hudi-bot opened a new issue, #15408:
URL: https://github.com/apache/hudi/issues/15408

   We already have a partition-aware clustering strategy and a 
recent-partitions-based strategy for clustering. These work well when 
partitioning is based on dates. But what if partitioning is based on some 
other, arbitrary field? 
   
   So, we might need another clustering filtering strategy that considers only 
those file groups which were touched in the last N commits. 
   
   For example, if a user configures clustering to run every 5 commits, then 
every time clustering runs, it will consider only the file groups touched in 
the last 5 commits. This avoids repeatedly clustering file groups that are 
already clustered, and clustering will be very fast because only the delta 
file groups are considered. 
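
As a rough illustration of the proposed filter, here is a minimal, self-contained sketch. It does not use any Hudi API: `LastNCommitsFilter`, `CommitInfo`, and `fileGroupsTouchedInLastN` are hypothetical names made up for this example, and it assumes that instant times are timestamp strings that sort lexicographically in commit order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Hudi code: selects clustering candidates by
// looking only at the file groups written by the most recent N commits.
final class LastNCommitsFilter {

    // Hypothetical stand-in for one completed commit on the timeline.
    static final class CommitInfo {
        final String instantTime;               // assumed to sort lexicographically
        final Set<String> touchedFileGroupIds;  // file groups written by this commit

        CommitInfo(String instantTime, Set<String> touchedFileGroupIds) {
            this.instantTime = instantTime;
            this.touchedFileGroupIds = touchedFileGroupIds;
        }
    }

    // Returns the union of file group ids touched by the latest n commits.
    static Set<String> fileGroupsTouchedInLastN(List<CommitInfo> commits, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive, got " + n);
        }
        // Sort a copy by instant time so the caller's list is left untouched.
        List<CommitInfo> sorted = new ArrayList<>(commits);
        sorted.sort(Comparator.comparing((CommitInfo c) -> c.instantTime));

        // Keep only the tail of the timeline; if fewer than n commits exist, keep all.
        int from = Math.max(0, sorted.size() - n);
        Set<String> candidates = new LinkedHashSet<>();
        for (CommitInfo commit : sorted.subList(from, sorted.size())) {
            candidates.addAll(commit.touchedFileGroupIds);
        }
        return candidates;
    }
}
```

With clustering scheduled every 5 commits, calling `fileGroupsTouchedInLastN(commits, 5)` would yield only the file groups written since the previous clustering run, regardless of how the table is partitioned.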
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-4773
   - Type: Improvement


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

