Surya Prasanna Yalla created HUDI-6404:
------------------------------------------

             Summary: Create a new clustering strategy to execute parquet tools 
commands during clustering
                 Key: HUDI-6404
                 URL: https://issues.apache.org/jira/browse/HUDI-6404
             Project: Apache Hudi
          Issue Type: New Feature
            Reporter: Surya Prasanna Yalla


Create a new clustering strategy to execute parquet tools commands during 
clustering.

If the use case is to prune some columns to save storage space, the current 
clustering approach iterates over every record to remove the unused columns, 
which is very time consuming. By running a ParquetTools command directly 
within the clustering strategy, we can achieve the same result.
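
As a rough illustration of running one command per file instead of iterating 
over records, the sketch below fills input and output paths into a command 
supplied by the caller and runs it as a subprocess. It is written in Python 
for brevity. The names build_command and rewrite_file and the {input} and 
{output} placeholders are hypothetical, and the actual ParquetTools command 
line is left to the caller rather than assumed here.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_command(template: Sequence[str], src: Path, dst: Path) -> List[str]:
    """Fill the {input} and {output} placeholders of a caller-supplied command."""
    return [
        arg.replace("{input}", str(src)).replace("{output}", str(dst))
        for arg in template
    ]


def rewrite_file(template: Sequence[str], src: Path, dst: Path) -> None:
    """Rewrite one file by running an external command.

    No records are read in this process; the external tool does the work.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    subprocess.run(build_command(template, src, dst), check=True)
```

A clustering strategy built this way would call rewrite_file once per input 
file, so the cost depends on what the external tool does rather than on a 
per-record loop in the strategy itself.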

The logic also creates marker files, so that in the event of a failure the 
existing rollback mechanism can be used to remove the inflight files and the 
parquet files.
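
The marker idea can be sketched as follows, under simplifying assumptions: 
a marker naming the intended output is written before any data is produced, 
so a later rollback can find and delete whatever a failed run left behind. 
The function names, the ".marker" suffix, and the directory layout are 
hypothetical and do not reflect Hudi's actual marker implementation.

```python
from pathlib import Path
from typing import Callable, List

MARKER_SUFFIX = ".marker"  # hypothetical naming, not Hudi's actual marker layout


def write_with_marker(output: Path, marker_dir: Path,
                      action: Callable[[Path], None]) -> None:
    """Record the intended output in a marker file, then produce the output."""
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / (output.name + MARKER_SUFFIX)
    marker.write_text(str(output))  # marker exists before any data is written
    action(output)                  # may raise and leave a partial file behind


def rollback(marker_dir: Path) -> List[Path]:
    """Delete every file named by a marker, then the markers themselves."""
    removed = []
    for marker in sorted(marker_dir.glob("*" + MARKER_SUFFIX)):
        target = Path(marker.read_text())
        if target.exists():
            target.unlink()
            removed.append(target)
        marker.unlink()
    return removed
```

Because the marker is written first, a crash at any point inside action 
still leaves enough information for rollback to clean up the partial file.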



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
