Hi,

We are currently using the Iceberg Spark DataSource for ingestion, using
Structured Streaming on Google Cloud Storage (GCS). This presents a couple
of challenges:

1) Implementing an atomic rename for GCS
2) Keeping the FileAppenders open, since we cannot sort within partitions
on a streaming DataFrame

We have managed to do both, but the solution is ugly, because the current
Iceberg Spark DataSource has private members where it could have protected
members, which makes it hard to extend.

Without going into details here, we wanted to get a feel for the likelihood
of re-engineering the shape of DataSource so that StreamingPartitionWriter
could be a custom implementation, as could Table (we have GCSTable,
GCSTableOperations, etc.).
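
As a rough illustration of the kind of extension point we mean, here is a
minimal sketch. All class and method names below (PartitionWriter,
SortedPartitionWriter, BaseDataSource, newWriter, writeAll) are hypothetical
stand-ins, not the actual Iceberg classes, and the "files" are just strings.
It only shows a protected factory method letting a subclass swap in a writer
that keeps one appender open per partition.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-partition writer; not an Iceberg class.
interface PartitionWriter {
    void write(String partition, String row);
    List<String> close(); // returns one entry per file written
}

// Toy default: assumes rows arrive grouped by partition and starts a
// new file whenever the partition key changes.
class SortedPartitionWriter implements PartitionWriter {
    private final List<String> files = new ArrayList<>();
    private String current = null;

    @Override
    public void write(String partition, String row) {
        if (!partition.equals(current)) {
            current = partition;
            files.add(partition + "-" + files.size());
        }
    }

    @Override
    public List<String> close() {
        return files;
    }
}

// Toy streaming writer: keeps one open appender (here, a list) per
// partition, so unsorted input still yields one file per partition.
class StreamingPartitionWriter implements PartitionWriter {
    private final Map<String, List<String>> open = new LinkedHashMap<>();

    @Override
    public void write(String partition, String row) {
        open.computeIfAbsent(partition, p -> new ArrayList<>()).add(row);
    }

    @Override
    public List<String> close() {
        List<String> files = new ArrayList<>();
        for (String partition : open.keySet()) {
            files.add(partition + "-" + files.size());
        }
        return files;
    }
}

// Hypothetical data source. The factory method is protected rather than
// private, so a subclass can supply its own writer.
class BaseDataSource {
    protected PartitionWriter newWriter() {
        return new SortedPartitionWriter();
    }

    // Each row is a {partition, value} pair.
    public final List<String> writeAll(List<String[]> rows) {
        PartitionWriter writer = newWriter();
        for (String[] row : rows) {
            writer.write(row[0], row[1]);
        }
        return writer.close();
    }
}

class StreamingDataSource extends BaseDataSource {
    @Override
    protected PartitionWriter newWriter() {
        return new StreamingPartitionWriter();
    }
}
```

With unsorted rows for partitions a, b, a, the toy BaseDataSource produces
three files while StreamingDataSource produces two, one per partition.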

We would love to contribute this to Iceberg as a PR. Before we jump in,
does this sound doable?

Thanks!

dave sugden
