[ https://issues.apache.org/jira/browse/HUDI-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-8475:
--------------------------------------
Fix Version/s: 1.0.2
> Generate all required stats required for MDT within DT write handles
> --------------------------------------------------------------------
>
> Key: HUDI-8475
> URL: https://issues.apache.org/jira/browse/HUDI-8475
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: metadata, writer-core
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Critical
> Fix For: 1.0.1, 1.0.2
>
>
> As of now, some stats are sent to HoodieBackedTableMetadataWriter from the
> write handles, while others are generated on demand by reading the base
> files or log files within the HoodieBackedTableMetadataWriter.update() call.
> With the new DAG design, we want to populate all stats within the data
> table (DT) write handles only and send them back via WriteStatus.
> We do not plan to collect the entire WriteStatus in the driver, only the
> HoodieWriteStat, so we should be able to let WriteStatus hold all the
> required stats for all indexes in the metadata table (MDT).
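A minimal Java sketch of the idea above, assuming a simplified model: the write handle updates per-column min/max/null-count stats as each record is written, and the driver keeps only a lightweight per-file summary while heavier per-record data stays on the executor. All names here (WriteHandleStatsSketch, ColumnRange, FileStat, FileWriteResult, StatsTrackingWriteHandle, collectStatsOnly) are hypothetical stand-ins, not Hudi's WriteStatus or HoodieWriteStat APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL sketch: none of these types are Hudi classes or APIs.
public class WriteHandleStatsSketch {

    // Running min/max/null-count for one numeric column (hypothetical type).
    static final class ColumnRange {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long nullCount = 0;
        long valueCount = 0;

        void update(Long value) {
            valueCount++;
            if (value == null) {
                nullCount++;
                return;
            }
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
    }

    // Lightweight per-file summary, the only part the driver collects
    // (hypothetical stand-in for HoodieWriteStat).
    record FileStat(String fileId, long numWrites, Map<String, ColumnRange> columnRanges) {}

    // Executor-side result that may also carry heavier data, such as written keys,
    // which never reaches the driver (hypothetical stand-in for WriteStatus).
    record FileWriteResult(FileStat stat, List<String> writtenRecordKeys) {}

    // Toy write handle: updates column stats while writing each record,
    // so nothing has to re-read the file afterwards.
    static final class StatsTrackingWriteHandle {
        private final String fileId;
        private final Map<String, ColumnRange> ranges = new HashMap<>();
        private final List<String> keys = new ArrayList<>();

        StatsTrackingWriteHandle(String fileId) {
            this.fileId = fileId;
        }

        void write(String recordKey, Map<String, Long> columns) {
            // A real handle would also write the record to the data file here.
            keys.add(recordKey);
            // Columns missing from a record are simply not counted in this sketch.
            for (Map.Entry<String, Long> e : columns.entrySet()) {
                ranges.computeIfAbsent(e.getKey(), k -> new ColumnRange()).update(e.getValue());
            }
        }

        FileWriteResult close() {
            return new FileWriteResult(new FileStat(fileId, keys.size(), ranges), keys);
        }
    }

    // Driver side: keep only the lightweight stats, drop the heavier payload.
    static List<FileStat> collectStatsOnly(List<FileWriteResult> results) {
        List<FileStat> stats = new ArrayList<>();
        for (FileWriteResult r : results) {
            stats.add(r.stat());
        }
        return stats;
    }

    private static Map<String, Long> row(Long price) {
        Map<String, Long> m = new HashMap<>(); // HashMap allows a null value
        m.put("price", price);
        return m;
    }

    public static void main(String[] args) {
        StatsTrackingWriteHandle handle = new StatsTrackingWriteHandle("file-1");
        handle.write("k1", row(30L));
        handle.write("k2", row(10L));
        handle.write("k3", row(null));
        FileStat stat = collectStatsOnly(List.of(handle.close())).get(0);
        ColumnRange price = stat.columnRanges().get("price");
        System.out.println(price.min + " " + price.max + " " + price.nullCount);
    }
}
```

Running main prints 10 30 1 (min, max, and null count for price). Tracking the stats incrementally is what removes the need to re-read base files inside HoodieBackedTableMetadataWriter.update(); the cost is that every write handle has to carry the same stats logic.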
>
> FILES: no additional work required
> col stats: generate all stats within write handles including base data files.
> bloom index: better to ignore it here. We can read the base files on demand
> from within the MDTPartitioner.
> functional index stats: same as col stats. For bloom, we can defer.
> RLI: no additional work required.
> secondary index: let's generate all required stats from within all 3 write
> handles. For the append handle, we might have to read the entire file slice,
> including the current file being written, to generate the stats.
> partition stats: yet to be designed.
>
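The append-handle case for the secondary index is the subtle one: the log block being written does not by itself show which secondary values are still live for each record. Below is a minimal Java sketch, assuming a simplified model in which the base file, the older log blocks, and the block currently being written are each a list of record versions merged with last-write-wins. RecordVersion, mergeFileSlice, and secondaryIndex are hypothetical names, not Hudi APIs, and real merging may compare an ordering field instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// HYPOTHETICAL sketch: not Hudi code; names and merge rule are illustrative only.
public class AppendHandleSecondaryIndexSketch {

    // One version of a record in a file slice; deleted=true marks a delete.
    record RecordVersion(String recordKey, String secondaryValue, boolean deleted) {}

    // Merges sources in write order (base file, older log blocks, then the block
    // the append handle is writing now). Later versions of a key replace earlier
    // ones: plain last-write-wins, whereas real merging may compare an ordering field.
    static Map<String, String> mergeFileSlice(List<List<RecordVersion>> orderedSources) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (List<RecordVersion> source : orderedSources) {
            for (RecordVersion v : source) {
                if (v.deleted()) {
                    latest.remove(v.recordKey());
                } else {
                    latest.put(v.recordKey(), v.secondaryValue());
                }
            }
        }
        return latest;
    }

    // Builds secondary value -> sorted record keys for the merged slice.
    static Map<String, TreeSet<String>> secondaryIndex(Map<String, String> merged) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> e : merged.entrySet()) {
            if (e.getValue() == null) {
                continue; // this sketch does not index null secondary values
            }
            index.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return index;
    }

    public static void main(String[] args) {
        List<RecordVersion> baseFile = List.of(
            new RecordVersion("k1", "red", false),
            new RecordVersion("k2", "blue", false));
        List<RecordVersion> olderLogBlock = List.of(
            new RecordVersion("k2", "green", false));
        List<RecordVersion> currentLogBlock = List.of(
            new RecordVersion("k1", null, true),
            new RecordVersion("k3", "green", false));
        Map<String, TreeSet<String>> index =
            secondaryIndex(mergeFileSlice(List.of(baseFile, olderLogBlock, currentLogBlock)));
        System.out.println(index);
    }
}
```

With the sample slice, main prints {green=[k2, k3]}: k1 is dropped by the delete in the current block, and k2's stale value blue from the base file is not indexed. Computing entries from the current block alone would miss both effects, which is why the whole file slice has to be read.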
--
This message was sent by Atlassian Jira
(v8.20.10#820010)