lw309637554 commented on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-751763119
@n3nash @satishkotha @leesf Can you help review this again? Thanks.

This PR resolves three issues:

1. Fix two bugs in the async clustering scenario:
   a. In the deltastreamer, DeltaSync.java does not support readFromSource for a replace commit, so replace commits are filtered out.
   b. AbstractHoodieWriteClient.rollbackPendingCommits() should not roll back a replace commit, so replace commits are filtered out there too.
2. Add HoodieClusteringJob.java for running clustering as an async job. It can be used like this:
   a. Schedule clustering:

      bin/spark-submit \
        --master local[4] \
        --class org.apache.hudi.utilities.HoodieClusteringJob \
        /Users/liwei/work-space/dla/opensource/incubator-hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.6.1-SNAPSHOT.jar \
        --schedule \
        --base-path /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/dest \
        --table-name hudi_table_with_small_filegroups2 \
        --instant-time 20201227161308 \
        --schema-file /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/config/schema.avsc \
        --props /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/config/clusteringjob.properties \
        --spark-memory 1g

   b. Run clustering (the same command without --schedule):

      bin/spark-submit \
        --master local[4] \
        --class org.apache.hudi.utilities.HoodieClusteringJob \
        /Users/liwei/work-space/dla/opensource/incubator-hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.6.1-SNAPSHOT.jar \
        --base-path /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/dest \
        --table-name hudi_table_with_small_filegroups2 \
        --instant-time 20201227161308 \
        --schema-file /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/config/schema.avsc \
        --props /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/config/clusteringjob.properties \
        --spark-memory 1g

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
