nsivabalan commented on issue #3077:
URL: https://github.com/apache/hudi/issues/3077#issuecomment-1245994450

   Sorry we dropped the ball on this. Let's try to make some progress.
   I re-read the entire thread.

   Here are my thoughts:
   - What kind of write are you using to write to Hudi: Spark datasource writes, DeltaStreamer, or Spark Structured Streaming?
   - MOR is probably not the right approach, since you seem to have hardly any updates (only about 1%).
   - With COW, we can disable small-file handling and instead enable clustering to batch any small files together at a regular cadence.
   - Also, can we try increasing the max file size to maybe 250 MB and see how that compares with what you are seeing now?
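Here is a minimal sketch of the COW setup suggested above. It assumes Spark datasource writes from PySpark. The config keys are standard Hudi write configs. The helper name `cow_write_options`, the table and key-field names, and the clustering cadence of 4 commits are hypothetical placeholders, not values taken from this thread. "250 MB" is interpreted as 250 × 1024 × 1024 bytes.

```python
# Hypothetical helper (not a Hudi API): builds Hudi datasource write options
# for a COW table with small-file handling off and inline clustering on.
# Values are illustrative placeholders, not tuned recommendations.

MB = 1024 * 1024  # bytes per MB, matching how Hudi expresses file sizes


def cow_write_options(table_name: str,
                      record_key: str,
                      precombine_field: str,
                      max_file_size_mb: int = 250,
                      clustering_every_n_commits: int = 4) -> dict:
    """Return string-valued Hudi options for the suggested COW setup."""
    max_bytes = str(max_file_size_mb * MB)
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Copy-on-write instead of merge-on-read, since updates are rare.
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        # A limit of 0 disables small-file handling: inserts go to new files
        # instead of being packed into existing small files.
        "hoodie.parquet.small.file.limit": "0",
        # Upper bound on the size of each parquet base file, in bytes.
        "hoodie.parquet.max.file.size": max_bytes,
        # Periodically rewrite (cluster) small files into larger ones.
        "hoodie.clustering.inline": "true",
        "hoodie.clustering.inline.max.commits": str(clustering_every_n_commits),
        # Have clustering target the same max file size.
        "hoodie.clustering.plan.strategy.target.file.max.bytes": max_bytes,
    }


if __name__ == "__main__":
    opts = cow_write_options("my_table", "id", "ts")
    for key, value in sorted(opts.items()):
        print(f"{key}={value}")
```

With PySpark, these options would be passed as `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `base_path` is a placeholder for the table location. Turning off small-file handling keeps each commit cheap. Clustering then pays the cost of merging small files periodically rather than on every write.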
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.