hudi-bot opened a new issue, #16034:
URL: https://github.com/apache/hudi/issues/16034

   Create a new clustering strategy to execute parquet tools commands during 
clustering.
   
   When the goal is to prune columns to save storage, the current clustering 
approach iterates over every record and removes the unused columns, which is 
very time-consuming. By invoking ParquetTools directly from within the 
clustering strategy, we can achieve the same result by running a command.
   
   The logic creates marker files so that, in the event of a failure, the 
existing rollback mechanism can remove the inflight files and the parquet 
files.
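
   The marker-file flow described above could look roughly like the following. This is a minimal, language-neutral sketch in Python, not Hudi's actual marker or rollback API: `write_with_marker`, `rollback`, the `.marker` suffix, and the `rewrite` callable (standing in for the ParquetTools command) are all hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Callable, List

MARKER_SUFFIX = ".marker"  # hypothetical naming convention, not Hudi's


def write_with_marker(src: Path, out_dir: Path, marker_dir: Path,
                      rewrite: Callable[[Path, Path], None]) -> Path:
    """Record a marker for the output file, then run the file-level rewrite."""
    marker_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / src.name
    # The marker is created before any data is written, so a crash
    # mid-rewrite still leaves a record of the partial file.
    (marker_dir / (out.name + MARKER_SUFFIX)).touch()
    rewrite(src, out)  # stand-in for the external command; may raise
    return out


def rollback(out_dir: Path, marker_dir: Path) -> List[str]:
    """Delete every output file that has a marker, then delete the markers."""
    removed = []
    for marker in sorted(marker_dir.glob("*" + MARKER_SUFFIX)):
        target = out_dir / marker.name[: -len(MARKER_SUFFIX)]
        if target.exists():
            target.unlink()
            removed.append(target.name)
        marker.unlink()
    return removed
```

   In this sketch a failed rewrite leaves both the marker and the partial output in place, and a later `rollback` call uses the markers to find and delete the partial files. How markers are cleaned up after a successful commit is left out.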
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-6404
   - Type: New Feature


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

