vinothchandar edited a comment on issue #2330: URL: https://github.com/apache/hudi/issues/2330#issuecomment-743059430
@borislitvak I don't think the S3 consistency announcement changes this. I can't speak for Delta, but IMO no system can provide the guarantees above without some atomic guarantees from the underlying storage. The approach we have taken in Hudi is to build the common cases that need concurrent writing into the concurrency model:

- removing older files (Hudi Cleaning)
- merging updates with a base file (Hudi Compaction)
- coalescing a bunch of small files into larger ones (Hudi Clustering, upcoming in the 0.7.0 release)

These can all happen concurrently with a single writer that is writing new data into the table, and this is supported on top of S3 or other storage. If you truly need support for writing "data" concurrently, then @n3nash is looking into a different design for concurrent writing in Hudi, one that is aimed at making it work on storage like S3 as well. So understanding your use case would be very valuable here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
