Hi, We are currently using the Iceberg Spark datasource for ingestion, using Structured Streaming on Google Cloud Storage (GCS). This presents a couple of challenges:
1) Implementing an atomic rename for GCS
2) Keeping the FileAppenders open, since we cannot sort within partitions on a streaming DataFrame

We have managed to do both, but the solution is ugly, because the current Iceberg Spark DataSource has private members where it could have protected members.

Without going into details here, we wanted to get a feel for the likelihood of reshaping DataSource so that StreamingPartitionWriter could be a custom implementation, as well as Table (we have GCSTable, GCSTableOperations, etc.).

We would love to contribute this to Iceberg as a PR. Before we jump in, does this sound doable?

Thanks!
dave sugden
