vinothchandar commented on issue #1728: URL: https://github.com/apache/hudi/issues/1728#issuecomment-646429295
I assume that’s the final write stage? That’s a large skew. My guess is that’s the insert partition. Can you paste the driver logs from a single run? We are interested in how the MergeOnReadUpsertPartitioner is dividing up the writes. You can also turn on the Hudi metrics to see how each of your commits is writing data.

Also, is this the main issue you see as of now (meaning the others are resolved)?

Hudi tries to pack records into as few files as possible, since small files hurt query performance a lot. So if your goal here is to get this down to much smaller batches, at the cost of smaller files, we should also weigh the trade-offs. If you help me understand your high-level use case better, I can suggest the right thing accordingly.
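
For reference, here is a minimal sketch of the writer options that relate to the two knobs mentioned above: commit metrics and small-file packing. The helper name `hudi_tuning_options` and the default byte values are assumptions for illustration, not something taken from this thread. The keys `hoodie.metrics.on`, `hoodie.parquet.small.file.limit`, and `hoodie.parquet.max.file.size` should be checked against the configuration reference for your Hudi version.

```python
# Hypothetical helper (not part of Hudi): collects writer options in a plain dict.
def hudi_tuning_options(enable_metrics=True,
                        small_file_limit_bytes=100 * 1024 * 1024,
                        max_file_size_bytes=120 * 1024 * 1024):
    """Return Hudi writer options as strings, ready to pass to a Spark writer."""
    if small_file_limit_bytes > max_file_size_bytes:
        raise ValueError("small file limit should not exceed the max file size")
    return {
        # Emit per-commit metrics so each commit's write behavior is visible.
        "hoodie.metrics.on": str(enable_metrics).lower(),
        # Files below this size are candidates for receiving new inserts;
        # a lower value trades fewer rewrites for more small files.
        "hoodie.parquet.small.file.limit": str(small_file_limit_bytes),
        # Target upper bound for base file size.
        "hoodie.parquet.max.file.size": str(max_file_size_bytes),
    }


opts = hudi_tuning_options()
for key, value in sorted(opts.items()):
    print(f"{key}={value}")
```

Assuming a PySpark job, a dict like this would typically be passed along with the table's other required options, for example via `df.write.format("hudi").options(**opts)`. Whether lowering the small-file limit is the right call depends on the trade-off described above.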
