hussein-awala commented on PR #7041:
URL: https://github.com/apache/hudi/pull/7041#issuecomment-1301440409

   Perfect! I was going to suggest that, because this is useless when 
incremental cleaning mode is not activated. I will add a new config for the 
empty clean commits, and I will duplicate the tests I already fixed (the old 
version without empty clean commits and the new one with empty clean commits 
enabled).
   
   > for those who are running clean after every commit, it could keep 
producing empty clean commit files in the timeline which could impact the query 
latency for large scale datasets.
   
   If they are running clean after every commit with incremental cleaning, it's 
better to add an empty clean commit, so that the cleaner checks only the 
partitions changed since the last commit instead of checking all the partitions 
changed since the last non-empty clean. Based on our tests, I confirm that this 
improves query latency rather than degrading it. But I will create a new 
separate config for this patch and not activate it when incremental cleaning is 
enabled.
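
   To illustrate the argument, here is a minimal toy model of the timeline in 
Python. It is only a sketch and not Hudi's actual API: `Instant`, 
`partitions_to_scan` and `count_empty_cleans` are hypothetical names, and the 
assumption is that incremental cleaning scans the partitions written after the 
most recent clean instant it can use as a checkpoint.

```python
from typing import List, NamedTuple, Set


class Instant(NamedTuple):
    """Hypothetical, simplified timeline entry (not a Hudi class)."""
    ts: int
    action: str           # "commit" or "clean"
    partitions: Set[str]  # partitions written (commit) or cleaned (clean)


def partitions_to_scan(timeline: List[Instant], count_empty_cleans: bool) -> Set[str]:
    """Return the partitions written after the latest usable clean instant.

    When count_empty_cleans is False, a clean that removed nothing is
    ignored, which models a timeline where no empty clean commit is saved.
    """
    checkpoint = -1
    for inst in timeline:
        if inst.action == "clean" and (inst.partitions or count_empty_cleans):
            checkpoint = max(checkpoint, inst.ts)

    changed: Set[str] = set()
    for inst in timeline:
        if inst.action == "commit" and inst.ts > checkpoint:
            changed |= inst.partitions
    return changed


# Clean runs after every commit; the clean at ts=4 found nothing to delete.
timeline = [
    Instant(1, "commit", {"p1"}),
    Instant(2, "clean", {"p1"}),
    Instant(3, "commit", {"p2"}),
    Instant(4, "clean", set()),
    Instant(5, "commit", {"p3"}),
]

with_empty = partitions_to_scan(timeline, count_empty_cleans=True)
without_empty = partitions_to_scan(timeline, count_empty_cleans=False)
```

   In this toy timeline, the next clean only has to look at `p3` when the empty 
clean is recorded, but at both `p2` and `p3` when it is not. The gap grows with 
every clean that finds nothing to delete.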


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.