[ https://issues.apache.org/jira/browse/HUDI-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-8474:
--------------------------------------
Remaining Estimate: 8h
Original Estimate: 8h
> Design and Impl MDT repartitioner to assist with writing to MDT
> ---------------------------------------------------------------
>
> Key: HUDI-8474
> URL: https://issues.apache.org/jira/browse/HUDI-8474
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: metadata, writer-core
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.0.2
>
> Original Estimate: 8h
> Remaining Estimate: 8h
>
> We need a repartitioner for MDT that takes in HoodieData<HoodieRecords> and
> returns exactly 1 Spark task per file slice in MDT.
> For example, FILES typically has 1 file slice;
> for col stats, RLI, etc., the count depends on how the user has configured it.
> We should also sort within the partitioner, since with HFile we might
> have to sort the keys.
>
> Except for the partition stats index, every other index should be straightforward.
> For partition stats record generation, we have a tracking ticket:
> https://issues.apache.org/jira/browse/HUDI-8476
>
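
The repartitioning described above can be sketched in plain Python, outside of Spark and Hudi. This is only an illustration under stated assumptions, not the actual implementation: the names `FILE_GROUP_COUNTS`, `file_group_index`, and `repartition` are hypothetical, the per-partition file group counts are made-up stand-ins for user configuration, and the MD5-based hash is a placeholder rather than the hash Hudi uses. Each bucket in the result stands for the work of one task writing to one MDT file slice, with its records sorted by key.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical file group counts per MDT partition. In practice these would
# come from user configuration (FILES typically has a single file slice).
FILE_GROUP_COUNTS = {"files": 1, "column_stats": 2, "record_index": 4}

Record = Tuple[str, str, str]   # (mdt_partition, record_key, payload)
SliceId = Tuple[str, int]       # (mdt_partition, file_group_index)


def file_group_index(record_key: str, num_file_groups: int) -> int:
    """Map a key to a file group with a stable hash (placeholder, not Hudi's)."""
    digest = hashlib.md5(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_file_groups


def repartition(records: Iterable[Record]) -> Dict[SliceId, List[Tuple[str, str]]]:
    """Group records so each bucket targets exactly one file slice,
    then sort every bucket by record key."""
    buckets: Dict[SliceId, List[Tuple[str, str]]] = defaultdict(list)
    for mdt_partition, key, payload in records:
        idx = file_group_index(key, FILE_GROUP_COUNTS[mdt_partition])
        buckets[(mdt_partition, idx)].append((key, payload))
    # Sorting within each bucket mirrors the "sort within partitioner" step,
    # which matters when the target file format expects keys in order.
    return {
        slice_id: sorted(rows, key=lambda row: row[0])
        for slice_id, rows in sorted(buckets.items())
    }
```

In Spark, the closest built-in primitive to this grouping plus per-partition sort is `repartitionAndSortWithinPartitions` on a pair RDD with a custom partitioner; whether the ticket's implementation uses it is not stated here.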