[
https://issues.apache.org/jira/browse/HUDI-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-6404:
---------------------------------
Labels: pull-request-available (was: )
> Create a new clustering strategy to execute parquet tools commands during
> clustering
> ------------------------------------------------------------------------------------
>
> Key: HUDI-6404
> URL: https://issues.apache.org/jira/browse/HUDI-6404
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Surya Prasanna Yalla
> Priority: Major
> Labels: pull-request-available
>
> Create a new clustering strategy to execute parquet tools commands during
> clustering.
> When some columns need to be pruned to save storage space, the current
> clustering approach iterates over every record and removes the unused
> column, which is very time consuming. By using ParquetTools directly, we can
> achieve the same result by running a command within the clustering strategy.
> The logic still creates marker files, so that in the event of a failure the
> existing rollback mechanism can be used to remove the inflight files and
> Parquet files.
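
The marker-and-rollback flow described above can be illustrated with a minimal sketch. This is not Hudi code and does not use Hudi's marker implementation. The directory name `.markers`, the `.marker` suffix, and the functions `rewrite_with_markers`, `rollback`, and `failing_rewrite` are all hypothetical, and the `rewrite` callback merely stands in for whatever Parquet tool command the strategy would run.

```python
from pathlib import Path
from typing import Callable, List

MARKER_DIR = ".markers"    # hypothetical marker directory name
MARKER_SUFFIX = ".marker"  # hypothetical marker file suffix


def rewrite_with_markers(table_dir: Path, src_name: str, dst_name: str,
                         rewrite: Callable[[Path, Path], None]) -> Path:
    """Record a marker for the output file, then run the file-level rewrite."""
    marker_dir = table_dir / MARKER_DIR
    marker_dir.mkdir(exist_ok=True)
    # The marker is created before any data is written, so a crash at any
    # later point still leaves a record of the file that may exist.
    (marker_dir / (dst_name + MARKER_SUFFIX)).touch()
    # Stand-in for the external command that rewrites the whole file
    # instead of iterating over individual records.
    rewrite(table_dir / src_name, table_dir / dst_name)
    return table_dir / dst_name


def rollback(table_dir: Path) -> List[str]:
    """Delete every file that has a marker, then delete the markers."""
    cleaned: List[str] = []
    marker_dir = table_dir / MARKER_DIR
    if not marker_dir.is_dir():
        return cleaned
    for marker in sorted(marker_dir.iterdir()):
        target = table_dir / marker.name[:-len(MARKER_SUFFIX)]
        if target.exists():
            target.unlink()  # remove the partially written output
        cleaned.append(target.name)
        marker.unlink()
    return cleaned


def failing_rewrite(src: Path, dst: Path) -> None:
    """Simulate a command that dies after writing partial output."""
    dst.write_bytes(b"partial")
    raise RuntimeError("simulated failure during rewrite")
```

In this sketch the source file is never modified, so after a failed rewrite a call to `rollback` leaves the table directory as it was before the attempt.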
--
This message was sent by Atlassian Jira
(v8.20.10#820010)