[GitHub] [hudi] harsh1231 edited a comment on issue #4745: [SUPPORT] Bulk Insert into COW table slow

GitBox Tue, 08 Feb 2022 20:43:10 -0800


harsh1231 edited a comment on issue #4745:
URL: https://github.com/apache/hudi/issues/4745#issuecomment-1033341751



   @harishraju-govindaraju Can you check the file-size and write-amplification stats with the Hudi CLI? See
https://hudi.apache.org/docs/0.5.0/admin_guide/
   `stats filesizes - File Sizes. Display summary stats on sizes of files
   stats wa - Write Amplification. Ratio of how many records were upserted to 
how many records were actually written`
   
   Also, can you share stage-level Spark UI screenshots? 
   The performance of an upsert operation depends on how much the underlying 
dataset overlaps with the incoming dataset. 
   The overall job stats show 792 tasks, so check whether small files were 
created during the initial load of the data. 
   If you have a large number of small files, set 
`hoodie.copyonwrite.record.size.estimate=100` during the first load of data.
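
   A rough sketch of how that setting could be passed on the first load. The option keys are existing Hudi configs, but the table name, the helper names, and the 100-byte figure are only illustrative assumptions. My understanding is that on the first commit, when there is no earlier commit to measure record sizes from, Hudi divides the target file size by this estimate to decide how many records go into one file, so treat the arithmetic below as an approximation, not the exact internal logic.

```python
# Sketch only: helper names and the table name are hypothetical.
# Other required write options (record key, precombine field, etc.) are omitted.

# Default value of hoodie.parquet.max.file.size (120 MB).
DEFAULT_MAX_FILE_SIZE = 120 * 1024 * 1024


def first_load_options(table_name, record_size_bytes=100):
    """Build writer options for the initial load of a COW table."""
    return {
        "hoodie.table.name": table_name,
        # Estimated average record size in bytes, used for file sizing
        # when no previous commit exists to derive it from.
        "hoodie.copyonwrite.record.size.estimate": str(record_size_bytes),
        "hoodie.parquet.max.file.size": str(DEFAULT_MAX_FILE_SIZE),
    }


def approx_records_per_file(record_size_bytes, max_file_size=DEFAULT_MAX_FILE_SIZE):
    """Approximate records packed per file: target size / estimated record size."""
    return max_file_size // record_size_bytes


opts = first_load_options("my_table")
# With a Spark DataFrame `df` and a `base_path` (both assumed), roughly:
# df.write.format("hudi").options(**opts).mode("overwrite").save(base_path)
```

   If the estimate is much larger than the real record size, each file receives fewer records than it could hold, which is one way small files can appear on the initial load.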
   
   
   


