lw309637554 commented on pull request #2379:
URL: https://github.com/apache/hudi/pull/2379#issuecomment-751763119


   @n3nash @satishkotha @leesf can you help review this again? Thanks.
   This PR addresses three things (two bug fixes and one new job):
   1. Fix two bugs in the async clustering scenario:
   a. In deltastreamer, readFromSource in DeltaSync.java does not support a replace commit, so replace commits are now filtered out.
   b. In AbstractHoodieWriteClient, rollbackPendingCommits() should not roll back a replace commit, so replace commits are now filtered out.
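
   The filtering idea in 1b can be sketched as below. This is only a minimal illustration, not the code in this PR: the PendingInstantFilter class, its Instant type, and the rollbackCandidates helper are hypothetical stand-ins, the "replacecommit" action string is assumed to match the one Hudi uses on its timeline, and the two timestamps other than 20201227161308 are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PendingInstantFilter {
    // Assumption: the action name Hudi records for replace commits.
    static final String REPLACE_COMMIT_ACTION = "replacecommit";

    /** Hypothetical stand-in for a timeline instant: a timestamp plus an action name. */
    static final class Instant {
        final String timestamp;
        final String action;

        Instant(String timestamp, String action) {
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    /** Returns the timestamps of pending instants to roll back, skipping replace commits. */
    static List<String> rollbackCandidates(List<Instant> pending) {
        return pending.stream()
            // A pending replace commit may belong to an async clustering job, so leave it alone.
            .filter(instant -> !REPLACE_COMMIT_ACTION.equals(instant.action))
            .map(instant -> instant.timestamp)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instant> pending = Arrays.asList(
            new Instant("20201227161200", "commit"),
            new Instant("20201227161308", "replacecommit"),
            new Instant("20201227161400", "deltacommit"));
        // Prints [20201227161200, 20201227161400]
        System.out.println(rollbackCandidates(pending));
    }
}
```

   The same predicate would apply to 1a, where replace commits are skipped instead of being passed to readFromSource.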
   
   2. Add HoodieClusteringJob.java for running clustering as an async job. It can be used like this:
   a. Schedule clustering:
   bin/spark-submit \
   --master local[4] \
   --class org.apache.hudi.utilities.HoodieClusteringJob \
   /Users/liwei/work-space/dla/opensource/incubator-hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.6.1-SNAPSHOT.jar \
   --schedule \
   --base-path /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/dest \
   --table-name hudi_table_with_small_filegroups2 \
   --instant-time 20201227161308 \
   --schema-file /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/config/schema.avsc \
   --props /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/config/clusteringjob.properties \
   --spark-memory 1g
   b. Run clustering (same command without --schedule):
   bin/spark-submit \
   --master local[4] \
   --class org.apache.hudi.utilities.HoodieClusteringJob \
   /Users/liwei/work-space/dla/opensource/incubator-hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.6.1-SNAPSHOT.jar \
   --base-path /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/dest \
   --table-name hudi_table_with_small_filegroups2 \
   --instant-time 20201227161308 \
   --schema-file /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/config/schema.avsc \
   --props /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups2/config/clusteringjob.properties \
   --spark-memory 1g

