Re: [I] [SUPPORT]hudi insert is too slow [hudi]

via GitHub Mon, 20 Nov 2023 02:07:01 -0800


zyclove commented on issue #10131:
URL: https://github.com/apache/hudi/issues/10131#issuecomment-1818731721


   @ad1happy2go 
   Can bulk insert mode avoid generating small files, i.e. directly output 
128 MB result files, or merge them later?
   If hoodie.clustering.inline is turned on, will small files be merged 
automatically after the bulk insert completes? 
   Or must I start a follow-up job like the one below to do the merge?
   ```
   hoodie.clustering.inline=true
   
   spark-submit \
   --master yarn \
   --class org.apache.hudi.utilities.HoodieClusteringJob \
   hdfs://nameservice1/utility_jars/hudi-utilities-bundle_2.12-0.10.0.jar
   ``` 
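   To make the question concrete, the write options involved would look roughly like the sketch below. The keys are Hudi configuration names, but every value (128 MB target size, 100 MB small-file limit, clustering after every commit) and the helper function name are assumptions for illustration, not settings taken from the actual job.

```python
# Sketch only: Hudi write options for bulk_insert followed by inline clustering.
# The keys are Hudi configuration names; every value is an illustrative assumption.
MB = 1024 * 1024


def bulk_insert_with_inline_clustering(target_file_mb=128, small_file_mb=100):
    """Return writer options as a plain dict of strings (hypothetical helper)."""
    return {
        "hoodie.datasource.write.operation": "bulk_insert",
        # Schedule and run clustering as part of the write itself.
        "hoodie.clustering.inline": "true",
        # Assumed value: trigger clustering after every commit.
        "hoodie.clustering.inline.max.commits": "1",
        # Files smaller than this are candidates for clustering.
        "hoodie.clustering.plan.strategy.small.file.limit": str(small_file_mb * MB),
        # Upper bound on the size of files that clustering writes.
        "hoodie.clustering.plan.strategy.target.file.max.bytes": str(target_file_mb * MB),
    }


hudi_options = bulk_insert_with_inline_clustering()
for key, value in sorted(hudi_options.items()):
    print(f"{key}={value}")
```

   These options would be passed to the Spark writer together with the usual table name, record key, and path settings. Whether this removes the need for a separate HoodieClusteringJob run is exactly what is being asked here.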
   -----------------------
   If bulk mode is not used:
   Can the stage "Building workload profile:smart_datapoint_report_rw_clear_rt" 
be optimized in Hudi 1.0? This stage is simply too time-consuming.
   
![image](https://github.com/apache/hudi/assets/15028279/8e7cf46b-1691-4c58-817b-11dac0d950aa)
   


