vinothchandar commented on issue #2330: URL: https://github.com/apache/hudi/issues/2330#issuecomment-743266257
@borislitvak I actually think comparing this with the solution suggested for Iceberg is apples-to-oranges. Let me try to explain.

The answer above pertains to writing data to S3 *without* any external catalog like the Hive Metastore. Of course, if you use some kind of external server, you can acquire a lock through it. But what about writes/reads that don't go through the Hive catalog and hit S3 directly? As I mentioned, S3 itself does not support atomic renames, so it's hard to do this without such a server. Finally, you are comparing systems that support updates/deletes with one that does not.

Hudi also tracks the latest instant time in the Hive Metastore. It would be easy for us to support a conflict check + conditional update, which would give us the same semantics. But we have been mulling whether we can do better and support a CRDT-style model.

I would appreciate it if you could at least cross-link these caveats in your Iceberg post, since it can be misleading without them.
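To make "conflict check + conditional update" concrete, here is a minimal sketch, assuming an external catalog that can update a table's latest instant time atomically (the lock stands in for that server-side atomicity, which plain S3 does not provide). `HypotheticalCatalog`, `conditional_update`, and `CommitConflict` are hypothetical names for illustration, not Hudi or Hive Metastore APIs, and instant times are assumed to be fixed-width timestamp strings so that string comparison orders them.

```python
import threading
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when another writer committed after our base instant was read."""


@dataclass
class HypotheticalCatalog:
    # Hypothetical stand-in for an external metastore; the lock models the
    # atomic update the server can perform and that plain S3 cannot.
    latest_instant: str = "00000000000000"
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def read_latest(self) -> str:
        with self._lock:
            return self.latest_instant

    def conditional_update(self, expected: str, new_instant: str) -> None:
        # Conflict check and conditional update happen in one atomic step:
        # commit only if nobody else moved the table past our base instant.
        with self._lock:
            if self.latest_instant != expected:
                raise CommitConflict(
                    f"expected {expected}, found {self.latest_instant}")
            if new_instant <= expected:
                raise ValueError("instant times must increase")
            self.latest_instant = new_instant


if __name__ == "__main__":
    catalog = HypotheticalCatalog()
    base_a = catalog.read_latest()  # writer A starts from the current instant
    base_b = catalog.read_latest()  # writer B starts from the same instant
    catalog.conditional_update(base_a, "20201211103000")  # writer A commits first
    try:
        catalog.conditional_update(base_b, "20201211103005")  # B's base is stale
    except CommitConflict as err:
        print("writer B must retry:", err)
```

This is optimistic concurrency: writers do their work without holding a lock and only the final pointer swap is serialized, so a losing writer detects the conflict and retries instead of silently overwriting. It does nothing for readers or writers that bypass the catalog and go to S3 directly, which is the caveat raised above.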
