bhasudha commented on code in PR #9372:
URL: https://github.com/apache/hudi/pull/9372#discussion_r1307296531

##########
website/docs/concurrency_control.md:
##########
@@ -186,18 +221,32 @@ A Hudi Streamer job can then be triggered as follows:
   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
   --source-ordering-field impresssiontime \
   --target-base-path file:\/\/\/tmp/hudi-streamer-op \
-  --target-table uber.impressions \
+  --target-table taableName \
   --op BULK_INSERT
 ```
 
+## Early conflict Detection
+
+Multi writing using OCC allows multiple writers to concurrently write and atomically commit to the Hudi table if there is no overlapping data file to be written, to guarantee data consistency, integrity and correctness. Prior to the 0.13.0 release, such conflict detection of overlapping data files is performed before commit metadata and after the data writing is completed. If any conflict is detected in the final stage, it could have wasted compute resources because the data writing is finished already.

Review Comment:
   Thanks. Taking it.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
