gallushi commented on issue #1655: URL: https://github.com/apache/iceberg/issues/1655#issuecomment-732855802
Hi @rdblue

> can you describe what you're suggesting in a bit more detail? How would you use atomic write instead?

Currently, when using Hadoop tables, `HadoopTableOperations` assumes atomic rename guarantees and, when committing, renames the temporary snapshot object from its temp name to the final snapshot object name. However, atomic write can be used instead: rather than going through a temp object, we can directly (and atomically) write the snapshot object, so that the write succeeds iff the object did not already exist.

> And how would you detect whether to use atomic write or atomic rename?

I think a config at the scheme level makes sense (that way one can control which filesystems use atomic write and which stay with atomic rename), since atomic write and atomic rename are filesystem-level guarantees.

> In general, I would not recommend using a file system for this guarantee. It's better to use a database transaction for the atomic update operation. That's why we want to have support for a variety of catalog plugins in addition to Hive, like JDBC, Nessie, and Glue.

Yes; however, adding support for atomic write is relatively low-hanging fruit, and storage systems such as IBM Cloud Object Storage could then be used even without an external catalog.

While we discuss this, I'll open a PR with the changes I have in mind.
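To make the commit semantics concrete, here is a minimal sketch (not Iceberg's actual code, and the class name, file name, and `commit` helper are hypothetical) of a commit based on atomic create-if-not-exists. It uses the local filesystem via `java.nio.file` with `CREATE_NEW`, which fails atomically if the target already exists, mirroring the "write succeeds iff the object did not exist" guarantee that an object store like IBM Cloud Object Storage would provide:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AtomicWriteCommit {

    // Commit succeeds iff no other writer has already created `target`.
    static boolean commit(Path target, byte[] metadata) throws IOException {
        try {
            // CREATE_NEW makes the write fail atomically if the file exists,
            // replacing the temp-file + atomic-rename commit path.
            Files.write(target, metadata, StandardOpenOption.CREATE_NEW);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Another committer won the race; the caller must refresh and retry.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("iceberg-demo");
        Path v2 = dir.resolve("v2.metadata.json");

        System.out.println(commit(v2, "snapshot-a".getBytes()));  // true: first writer wins
        System.out.println(commit(v2, "snapshot-b".getBytes()));  // false: commit conflict
    }
}
```

The key design point is that the conflict check and the write happen in a single filesystem operation, so no rename guarantee is needed; losing writers simply see the create fail and can retry against the next version.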
