Hi Dave,

I'm sure we can get this working, but I'd like to understand what you're trying to do a bit better.

Why do you need atomic rename? Iceberg is set up to write data in place and not move or rename files. Committing those files to a table is an atomic operation instead. Everything should work with GCS without modification as far as I know, unless you don't want to use the Hadoop FileSystem APIs.

Keeping file appenders open using a write property or a table property sounds like a good idea to me. I wouldn't want this to be the default for batch writes, but I think it may make sense as an option for streaming writes. I'd prefer to add these features to the existing streaming writer instead of allowing users to use their own custom writer. Are there other reasons to replace the writer instead of making this behavior configurable?

As for having your own Table implementation, that's already possible but probably not what you want to do. The built-in Table implementation delegates everything to TableOperations and you can plug in your own TableOperations (see the new guide <http://iceberg.apache.org/custom-catalog/>). But like I said above, I don't think you will need to do this to work with GCS.

rb

On Fri, Sep 13, 2019 at 6:45 AM Dave Sugden <[email protected]> wrote:

> Hi,
>
> We are currently using the iceberg spark datasource for ingestion, using
> structure streaming on google cloud storage. This presents a couple
> challenges:
>
> 1) Implementing an atomic rename for gcs
> 2) Keep the FileAppenders open since we cannot sort within partitions on a
> streaming dataframe
>
> We have managed to do both, but the solution is ugly, as the current
> iceberg spark DataSource has private members where it could have protected
> members.
>
> Without going into details here, we wanted to get a feel for the
> likelihood of re-engineering the shape of DataSource, so
> StreamingPartitionWriter could be a custom implementation, as well as Table
> (we have GCSTable, GCSTableOperations, etc.) .
>
> We would love to PR in iceberg this... before we jump in, does this sound
> do-able?
>
> Thanks!
>
> dave sugden

--
Ryan Blue
Software Engineer
Netflix
