nsivabalan commented on issue #3077: URL: https://github.com/apache/hudi/issues/3077#issuecomment-1245994450
Sorry, we dropped the ball on this. Let's try to make some progress. I re-read the entire thread; here are my thoughts:

- What kind of write are you using to write to Hudi: Spark datasource writes, DeltaStreamer, or Spark Structured Streaming?
- MOR is probably not the right approach, since you have very few updates (only 1%).
- In COW, we can disable small file handling but enable clustering to batch up any small files at a regular cadence.
- Also, can we try increasing the file size to maybe 250 MB and see how it compares with what you already see?
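For reference, the COW suggestions above could be expressed roughly as the writer options below. This is only a sketch, assuming Spark datasource writes from PySpark: the helper name `cow_clustering_options` and the default of 4 commits between clustering runs are hypothetical, and applying the 250 MB target to the clustering output as well is an assumption rather than something stated in the thread. Please check the option keys against the Hudi version in use.

```python
def cow_clustering_options(max_file_mb=250, commits_between_clustering=4):
    """Build Hudi writer options: COW, no small file handling, inline clustering.

    The function name and default values are illustrative assumptions.
    """
    max_bytes = max_file_mb * 1024 * 1024
    return {
        # Copy-on-write table, since the workload has very few updates.
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        # A small file limit of 0 disables small file handling at write time.
        "hoodie.parquet.small.file.limit": "0",
        # Target maximum size for base parquet files (250 MB by default here).
        "hoodie.parquet.max.file.size": str(max_bytes),
        # Run clustering inline, once every N commits (N is an assumed value).
        "hoodie.clustering.inline": "true",
        "hoodie.clustering.inline.max.commits": str(commits_between_clustering),
        # Assumption: let clustering produce files of the same target size.
        "hoodie.clustering.plan.strategy.target.file.max.bytes": str(max_bytes),
    }


opts = cow_clustering_options()
# Hypothetical usage with an existing DataFrame `df` and table path `base_path`:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

Inline clustering adds latency to the commits on which it runs; whether that is acceptable, or whether async clustering fits better, depends on the write path asked about in the first question.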
