[ 
https://issues.apache.org/jira/browse/HUDI-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-8474:
--------------------------------------
    Description: 
We need a repartitioner for MDT where we take in HoodieData<HoodieRecords> and 
return 1 Spark task per file slice in MDT. 
For example, for FILES, it's typically 1 file slice. 
For col stats, RLI, etc., it's based on how the user has configured it.

We should sort within the partitioner as well, since with HFile we might have 
to sort the keys.
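
As a rough illustration of the idea above, the sketch below buckets record keys by target file group and sorts each bucket, so that one bucket corresponds to the input of one task writing one file slice. It is plain Java with no Spark or Hudi dependency. The class name `MdtRepartitionSketch`, the methods `fileGroupIndex` and `repartitionAndSort`, and the hash-modulo mapping are all assumptions made for illustration, not Hudi's actual partitioner or hashing scheme.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: not Hudi's actual partitioner or key-to-file-group hashing. */
final class MdtRepartitionSketch {

    /** Hypothetical mapping: non-negative hash of the key modulo the file group count. */
    static int fileGroupIndex(String recordKey, int numFileGroups) {
        return Math.floorMod(recordKey.hashCode(), numFileGroups);
    }

    /**
     * Buckets record keys by target file group and sorts each bucket.
     * Bucket i holds the keys that a single task would write to file slice i.
     */
    static List<List<String>> repartitionAndSort(List<String> recordKeys, int numFileGroups) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numFileGroups; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : recordKeys) {
            buckets.get(fileGroupIndex(key, numFileGroups)).add(key);
        }
        for (List<String> bucket : buckets) {
            // Sorted because the file writer may need keys in order.
            Collections.sort(bucket);
        }
        return buckets;
    }
}
```

With a file group count of 1 (the typical FILES case), every key lands in a single sorted bucket. On Spark, one possible way to get the same shape is `JavaPairRDD.repartitionAndSortWithinPartitions` with a custom `Partitioner`, though whether that fits the MDT write path is left open here.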

 

Except for the partition stats index, every other index should be 
straightforward. For partition stats record generation, we have a tracking 
ticket: https://issues.apache.org/jira/browse/HUDI-8476 

 

  was:
We need a repartitioner for MDT where we take in HoodieData<HoodieRecords> and 
return 1 Spark task per file slice in MDT. 
For example, for FILES, it's typically 1 file slice. 
For col stats, RLI, etc., it's based on how the user has configured it. 

We should sort within the partitioner as well, since with HFile we might have 
to sort the keys. 



> Design and Impl MDT repartitioner to assist with writing to MDT
> ---------------------------------------------------------------
>
>                 Key: HUDI-8474
>                 URL: https://issues.apache.org/jira/browse/HUDI-8474
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: metadata, writer-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 1.0.0
>
>
> We need a repartitioner for MDT where we take in HoodieData<HoodieRecords> and 
> return 1 Spark task per file slice in MDT. 
> For example, for FILES, it's typically 1 file slice. 
> For col stats, RLI, etc., it's based on how the user has configured it.
> We should sort within the partitioner as well, since with HFile we might 
> have to sort the keys.
>  
> Except for the partition stats index, every other index should be 
> straightforward. For partition stats record generation, we have a tracking 
> ticket: https://issues.apache.org/jira/browse/HUDI-8476 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
