bhasudha commented on code in PR #9372:
URL: https://github.com/apache/hudi/pull/9372#discussion_r1307298317

##########
website/docs/concurrency_control.md:
##########
@@ -186,18 +221,32 @@ A Hudi Streamer job can then be triggered as follows:
   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
   --source-ordering-field impresssiontime \
   --target-base-path file:\/\/\/tmp/hudi-streamer-op \
-  --target-table uber.impressions \
+  --target-table tableName \
   --op BULK_INSERT
 ```
 
+## Early conflict Detection
+
+Multi writing using OCC allows multiple writers to concurrently write and atomically commit to the Hudi table if there is no overlapping data file to be written, to guarantee data consistency, integrity and correctness. Prior to the 0.13.0 release, such conflict detection of overlapping data files was performed before committing metadata and after the data writing was completed. If any conflict was detected in that final stage, compute resources could already have been wasted because the data writing was finished.
+
+To improve the concurrency control, the 0.13.0 release introduced a new feature, early conflict detection in OCC, which uses Hudi's marker mechanism to detect a conflict during the data writing phase and abort the write as soon as the conflict is detected. Hudi can now stop a conflicting writer much earlier and release its computing resources back to the cluster, improving resource utilization.
+
+By default, this feature is turned off. To try it out, a user needs to set `hoodie.write.concurrency.early.conflict.detection.enable` to true when using OCC for concurrency control (refer to the [configs](https://hudi.apache.org/docs/next/configurations#Write-Configurations-advanced-configs) page for all relevant configs).

Review Comment:
   Done
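
For readers of this thread, the setting described in the added section could look roughly like the fragment below. This is a minimal sketch, not text from the PR: it assumes the writer is configured through a properties file, and it assumes OCC is selected with `hoodie.write.concurrency.mode`. A lock provider and the other multi-writer settings are still required and are deliberately left out here.

```properties
# Assumption: OCC is enabled for this writer. A lock provider must also be
# configured for multi-writer setups; it is omitted from this sketch.
hoodie.write.concurrency.mode=optimistic_concurrency_control

# Off by default. Enables early conflict detection (0.13.0 and later), so a
# conflicting writer aborts during data writing rather than at commit time.
hoodie.write.concurrency.early.conflict.detection.enable=true
```

The same keys can be passed as writer options instead of through a file; the configs page linked in the diff lists the related early conflict detection settings.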
