srsteinmetz commented on issue #1737:
URL: https://github.com/apache/hudi/issues/1737#issuecomment-653235981


   Is there any update on this issue?
   
   I am having the same problem where each commit is creating a new parquet 
file and none of the old parquet files (which are not being touched again) are 
being cleaned. I am also using a MOR table and have tried setting 
"hoodie.cleaner.commits.retained" to 1. The behavior I am expecting is that 
every commit would clean the parquet files created by previous commits. 
However, the number of parquet files in each partition continuously grows in an 
unbounded manner, negatively affecting ingest performance. From my 
understanding it seems that cleaning is never being performed on my partitions. 
   
   I've attached a screenshot from one of the partitions, along with the code I 
am using to write the data.
   The writes happen through Spark Streaming on a 5 min batch interval.
   
   ![Too Much 
Parquet](https://user-images.githubusercontent.com/3799859/86412362-be687880-bc73-11ea-81c3-3b13b81dee9e.JPG)
   ![Write 
config](https://user-images.githubusercontent.com/3799859/86412371-c2949600-bc73-11ea-9117-b27e765871ef.JPG)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to