[ 
https://issues.apache.org/jira/browse/HUDI-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6404:
---------------------------------
    Labels: pull-request-available  (was: )

> Create a new clustering strategy to execute parquet tools commands during 
> clustering
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-6404
>                 URL: https://issues.apache.org/jira/browse/HUDI-6404
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Surya Prasanna Yalla
>            Priority: Major
>              Labels: pull-request-available
>
> Create a new clustering strategy to execute parquet tools commands during 
> clustering.
> If the use case is pruning some columns to save storage, the current 
> clustering approach iterates over every record and removes the unused 
> column, which is very time-consuming. By using ParquetTools directly, we can 
> achieve the same result by running a command within the clustering strategy.
> The logic also creates marker files, so that in the event of a failure the 
> existing rollback mechanism can remove the inflight files and parquet files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
