vinothchandar commented on issue #1728:
URL: https://github.com/apache/hudi/issues/1728#issuecomment-646429295


   I assume that’s the final write stage? That’s a large skew. My guess is 
that’s the insert partition. Can you paste the driver logs from a single run? We 
are interested in how the MergeOnReadUpsertPartitioner is dividing up the 
writes.
   
   You can also turn on the Hudi metrics to see how each of your commits is 
writing data.
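
   As a rough sketch of what turning metrics on could look like: the keys 
below are Hudi write configs as I understand them, while the helper name, host, 
port and prefix are placeholders. Please verify the key names against the docs 
for your Hudi version before relying on this.

```python
def hudi_metrics_options(host, port, prefix="hudi"):
    """Build write options that enable Hudi metrics with a Graphite reporter.

    host, port and prefix are placeholders for illustration only.
    """
    return {
        # Master switch for commit-level metrics.
        "hoodie.metrics.on": "true",
        # Reporter choice; GRAPHITE is assumed here, other reporters exist.
        "hoodie.metrics.reporter.type": "GRAPHITE",
        "hoodie.metrics.graphite.host": host,
        "hoodie.metrics.graphite.port": str(port),
        "hoodie.metrics.graphite.metric.prefix": prefix,
    }


# Hypothetical usage: pass these along with your existing write options,
# e.g. df.write.format("hudi").options(**metrics_opts)...
metrics_opts = hudi_metrics_options("graphite.example.com", 4756)
```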
   
   Also, is this the main issue you see as of now (meaning the others are 
resolved)? Hudi tries to pack records into as few files as possible (small 
files hurt query performance a lot). So if your goal here is to get this down 
to much smaller batches, at the cost of smaller files, we may also want to 
consider the trade-offs. If you help me understand your high-level use case 
better, I can suggest the right thing accordingly.
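
   To make the trade-off concrete, here is a sketch of the two sizing knobs 
that, as far as I know, control how aggressively records are packed into 
existing files. The helper name and the numbers are illustrative assumptions, 
not a recommendation for your workload.

```python
MB = 1024 * 1024


def hudi_file_sizing_options(small_file_limit_mb, max_file_size_mb):
    """Build write options for Hudi base file sizing (values in MB).

    Files below the small file limit are candidates for receiving new
    inserts; a lower limit means less packing and more, smaller files.
    """
    if small_file_limit_mb > max_file_size_mb:
        raise ValueError("small file limit should not exceed max file size")
    return {
        "hoodie.parquet.small.file.limit": str(small_file_limit_mb * MB),
        "hoodie.parquet.max.file.size": str(max_file_size_mb * MB),
    }


# Illustrative values only.
sizing_opts = hudi_file_sizing_options(100, 120)
```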


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

