hudi-bot opened a new issue, #16034: URL: https://github.com/apache/hudi/issues/16034
Create a new clustering strategy that executes parquet-tools commands during clustering.

Consider a use case where some columns need to be pruned to save storage space. The current clustering approach iterates over every record to remove the unused column, which is very time-consuming. By using ParquetTools directly, the same result can be achieved by running a command within the clustering strategy.

The logic creates marker files so that, in the event of a failure, the existing rollback mechanism can remove the inflight files and parquet files.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-6404
- Type: New Feature
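
The marker-then-rewrite flow described in this issue could be sketched roughly as follows. This is only an illustration and not Hudi's actual clustering or marker API: the helper names (`prune_command`, `rewrite_with_marker`, `rollback`), the `.marker` file layout, and the `parquet-tools prune <input> <output> <column>...` argument order are all assumptions, so check the help output of the tool version in use before relying on it.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def prune_command(src: Path, dst: Path, columns: Sequence[str]) -> List[str]:
    # Assumed argument layout: prune <input> <output> <column>...
    # Verify this against the parquet-tools version you actually run.
    return ["parquet-tools", "prune", str(src), str(dst), *columns]


def run_command(cmd: List[str]) -> None:
    # Raises CalledProcessError if the external tool exits non-zero.
    subprocess.run(cmd, check=True)


def rewrite_with_marker(
    src: Path,
    dst: Path,
    columns: Sequence[str],
    marker_dir: Path,
    run: Callable[[List[str]], None] = run_command,
) -> Path:
    # Record the output path BEFORE writing it, so a crash mid-rewrite
    # still leaves enough information behind to clean up.
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / (dst.name + ".marker")
    marker.write_text(str(dst))
    run(prune_command(src, dst, columns))
    return marker


def rollback(marker_dir: Path) -> List[Path]:
    # Delete every output file named by a marker, then the marker itself.
    removed = []
    for marker in sorted(marker_dir.glob("*.marker")):
        target = Path(marker.read_text())
        if target.exists():
            target.unlink()
            removed.append(target)
        marker.unlink()
    return removed
```

In this sketch the command runner is passed in as a parameter, so the failure path can be exercised without the external tool installed. A real strategy would presumably rely on Hudi's own marker and rollback machinery rather than hand-rolled files like these.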
