Surya Prasanna Yalla created HUDI-6404:
------------------------------------------
Summary: Create a new clustering strategy to execute parquet tools
commands during clustering
Key: HUDI-6404
URL: https://issues.apache.org/jira/browse/HUDI-6404
Project: Apache Hudi
Issue Type: New Feature
Reporter: Surya Prasanna Yalla
Create a new clustering strategy to execute parquet tools commands during
clustering.
If the use case is pruning some columns to save storage space, the current
clustering approach iterates over every record and removes the unused columns,
which is very time-consuming. By using ParquetTools directly, we can achieve
the same result by running a command within the clustering strategy.
The logic creates marker files first, so that in the event of a failure the
existing rollback mechanism can be used to remove the inflight files and the
parquet files.
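
The marker-then-rollback flow described above could look roughly like the
sketch below. It is only an illustration under assumptions, not Hudi's
implementation: rewrite_with_marker, rollback, commit, the ".marker" suffix and
the ".markers" directory are hypothetical names, and stand_in_command is a
placeholder that copies a file where the real strategy would invoke the parquet
tools command.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def rewrite_with_marker(dst: Path, marker_dir: Path, command: list[str]) -> bool:
    """Run an external rewrite command, guarded by a marker file.

    The marker is written before the command runs, so a failure at any later
    point still leaves a record of which output file must be cleaned up.
    """
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / (dst.name + ".marker")  # hypothetical naming scheme
    marker.write_text(str(dst))
    result = subprocess.run(command, capture_output=True)
    if result.returncode != 0:
        rollback(marker_dir)
        return False
    return True


def rollback(marker_dir: Path) -> None:
    """Delete every output file named by a marker, then the marker itself.

    Assumption: one marker directory corresponds to one inflight operation.
    """
    for marker in marker_dir.glob("*.marker"):
        Path(marker.read_text()).unlink(missing_ok=True)
        marker.unlink()


def commit(marker_dir: Path) -> None:
    """On success, drop the markers and keep the rewritten files."""
    for marker in marker_dir.glob("*.marker"):
        marker.unlink()


def stand_in_command(src: Path, dst: Path, fail: bool) -> list[str]:
    """Hypothetical stand-in for the real parquet tools invocation.

    It copies src to dst and then exits non-zero when fail is True, which
    mimics a command that dies after leaving an output file behind.
    """
    script = (
        "import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2]); "
        "sys.exit(int(sys.argv[3]))"
    )
    return [sys.executable, "-c", script, str(src), str(dst), "1" if fail else "0"]


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    src = base / "input.parquet"
    src.write_bytes(b"placeholder bytes, not a real parquet file")
    dst = base / "output.parquet"
    ok = rewrite_with_marker(dst, base / ".markers", stand_in_command(src, dst, fail=True))
    print("succeeded:", ok, "output left behind:", dst.exists())
```

Writing the marker before the command starts is the point of the design: the
cleanup path only needs the markers, not any knowledge of how far the external
command got.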