[ 
https://issues.apache.org/jira/browse/HUDI-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-8475:
--------------------------------------
    Fix Version/s: 1.0.2

> Generate all required stats required for MDT within DT write handles
> --------------------------------------------------------------------
>
>                 Key: HUDI-8475
>                 URL: https://issues.apache.org/jira/browse/HUDI-8475
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: metadata, writer-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>             Fix For: 1.0.1, 1.0.2
>
>
> As of now, some stats are sent to HoodieBackedTableMetadataWriter from the 
> write handles, while others are generated on demand by reading the base files 
> or log files within the HoodieBackedTableMetadataWriter.update() call.
> With the new DAG design, we want to populate all stats within the DT write 
> handles only and send them back via WriteStatus.
> We do not plan to collect the entire WriteStatus in the driver, only the 
> HoodieWriteStat, so we should be able to manage with WriteStatus holding 
> all the required stats for all indexes in MDT.
>  
> FILES: no additional work required. 
> col stats: generate all stats within write handles, including base data files. 
> bloom index: better to ignore it here. We can do on-demand reads from base 
> files from within the MDTPartitioner. 
> functional index stats: same as col stats. For bloom, we can defer. 
> RLI: no additional work required. 
> secondary index: let's generate all required stats from within all 3 write 
> handles. For the Append handle, we might have to read the entire file slice, 
> including the current file being written, to generate the stats. 
> partition stats: yet to be designed. 
>  
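For illustration only, the idea of generating column stats inside the write
handle rather than re-reading files afterwards could be sketched roughly as
below. This is a minimal sketch under stated assumptions: ColumnRangeSketch and
StatsCollectingWriterSketch are hypothetical names invented for this example,
not Hudi classes, and records are simplified to a map of column name to Long.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical running stats for one column; not a Hudi class. */
final class ColumnRangeSketch<T extends Comparable<T>> {
    private T min;
    private T max;
    private long valueCount;
    private long nullCount;

    /** Folds one value into the running min, max, and counts. */
    void accept(T value) {
        valueCount++;
        if (value == null) {
            nullCount++;
            return;
        }
        if (min == null || value.compareTo(min) < 0) {
            min = value;
        }
        if (max == null || value.compareTo(max) > 0) {
            max = value;
        }
    }

    T min() { return min; }
    T max() { return max; }
    long valueCount() { return valueCount; }
    long nullCount() { return nullCount; }
}

/** Hypothetical stand-in for a write handle that collects stats as records pass through. */
final class StatsCollectingWriterSketch {
    private final Map<String, ColumnRangeSketch<Long>> statsByColumn = new TreeMap<>();

    /** A real handle would also write the record to the file here. */
    void write(Map<String, Long> record) {
        for (Map.Entry<String, Long> field : record.entrySet()) {
            statsByColumn
                .computeIfAbsent(field.getKey(), k -> new ColumnRangeSketch<>())
                .accept(field.getValue());
        }
    }

    /** Returns the stats gathered so far, to be shipped back with the write result. */
    Map<String, ColumnRangeSketch<Long>> close() {
        return Collections.unmodifiableMap(statsByColumn);
    }
}
```

In this sketch the stats are accumulated once per record at write time, so
nothing needs to be read back from the written file to compute column ranges.
How the result would actually be attached to WriteStatus or HoodieWriteStat is
left open here.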



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
