srsteinmetz commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-653619184
When I was originally load testing this table I was sending almost exclusively inserts. According to this documentation, it seems expected that inserts end up in new parquet files: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=135860485.

When I changed my load generator to start sending updates, I noticed the parquet files were compacted as expected, and now with only updates being sent the .hoodie folder shows cleaning happening as expected. However, for our use case some of our tables will be almost exclusively inserts, so I'm worried the current behavior will result in many small parquet files and degraded performance.

From reading this thread, it seems this behavior might be related to https://hudi.apache.org/docs/configurations.html#logFileToParquetCompressionRatio, but from the description it's still not clear to me how this property should be configured to get the desired behavior.

For some reason GitHub is failing to upload my .hoodie folder screenshot. I'll try again to upload it in a bit.
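For context, a minimal sketch of the writer options that govern small-file handling in this situation. The config keys (`hoodie.parquet.small.file.limit`, `hoodie.parquet.max.file.size`, `hoodie.logfile.to.parquet.compression.ratio`) come from the Hudi configurations page linked above; the specific byte values below are illustrative defaults, not a recommendation:

```python
# Sketch: Hudi write options that influence small-file handling for an
# insert-heavy table. Keys are from the Hudi configuration docs; the
# values shown are the documented defaults, used here for illustration.
hudi_small_file_options = {
    # Inserts can be bin-packed into existing parquet files smaller than
    # this limit (in bytes) instead of always creating new files.
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),   # 100 MB
    # Target maximum size (bytes) for each parquet file a write produces.
    "hoodie.parquet.max.file.size": str(120 * 1024 * 1024),      # 120 MB
    # For merge-on-read tables: expected size ratio when log-file data is
    # compacted into parquet, used to estimate resulting file sizes.
    "hoodie.logfile.to.parquet.compression.ratio": "0.35",
}

# Hypothetical usage with the Spark DataFrame API (needs a SparkSession
# and a real `df`/`base_path`; not executed here):
# df.write.format("hudi") \
#   .options(**hudi_small_file_options) \
#   .mode("append") \
#   .save(base_path)
```

The intuition, as I understand it from the docs: with pure inserts, Hudi sizes files based on these limits, so if small files still accumulate it is worth checking whether the small-file limit is being honored for the chosen write operation.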
