srsteinmetz commented on issue #1737:
URL: https://github.com/apache/hudi/issues/1737#issuecomment-653619184


   When I was originally load testing this table, I was sending almost 
exclusively inserts. According to this documentation, it appears expected that 
inserts end up in new parquet files: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=135860485. 
When I changed my load generator to start sending updates, I noticed that the 
parquet files were compacted as expected. Now, with only updates being sent, 
the .hoodie folder shows cleaning happening as expected.
   
   However, for our use case, some of our tables will be almost exclusively 
inserts, so I'm worried the current behavior will result in many parquet files 
and degraded performance. From reading this thread, this behavior seems 
related to 
https://hudi.apache.org/docs/configurations.html#logFileToParquetCompressionRatio, 
but from the description it's still not clear to me how this property should 
be configured to get the desired behavior.
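   For reference, a minimal sketch of the small-file-related write options I 
understand to be in play, assuming the standard Spark datasource write path. 
The property names come from the Hudi configuration docs; the values below are 
illustrative assumptions, not recommendations:

```python
# Hudi write options relevant to small-file handling on insert-heavy tables.
# Values are illustrative assumptions only.
hudi_options = {
    # Target size for base parquet files, in bytes (default ~120 MB).
    "hoodie.parquet.max.file.size": str(120 * 1024 * 1024),
    # Parquet files below this size are treated as "small" and become
    # candidates to receive new inserts instead of spawning fresh files.
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
    # Expected size ratio when log file data is compacted into parquet;
    # used to estimate the on-disk size of compacted data.
    "hoodie.logfile.to.parquet.compression.ratio": "0.35",
}

# Hypothetical usage with a Spark DataFrame `df` and a table path:
# df.write.format("hudi").options(**hudi_options).mode("append").save(table_path)
```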
   
   For some reason GitHub is failing to upload my .hoodie folder screenshot. 
I'll try again to upload it shortly.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
