[jira] [Commented] (HUDI-3786) how to deduce what MDT partitions to update on the write path w/ async indeing

Sagar Sumit (Jira) Sun, 03 Apr 2022 19:27:07 -0700


    [ 
https://issues.apache.org/jira/browse/HUDI-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516606#comment-17516606
 ]


Sagar Sumit commented on HUDI-3786:
-----------------------------------

[~shivnarayan] The method you pointed to gets called in either the initial commit 
or the update path. For the initial commit, we do need to depend on write configs. 
For the update path, this method gets called while converting to metadata 
records. During the actual update (processAndCommit() method), we check the table 
config to determine which partitions need to be updated.

As long as we can ensure that table config and write config are in sync (HUDI-3782), 
this should not be an issue. I agree that we can unify the logic so that one 
method gets invoked everywhere.
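
To make the distinction concrete, here is a minimal sketch of what a single 
unified method could look like. It is not the actual Hudi code: the class, 
method names, and the use of plain string sets are hypothetical, and the 
partition names are only illustrative. It assumes the initial commit reads the 
partitions enabled in write config, while the update path reads the partitions 
recorded in table config.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Hudi codebase.
public class MdtPartitionSketch {

  // Picks the metadata table (MDT) partitions a write should touch.
  // Assumption: the initial commit trusts write config, every later
  // update trusts table config.
  static Set<String> partitionsToUpdate(boolean initialCommit,
                                        Set<String> enabledInWriteConfig,
                                        Set<String> recordedInTableConfig) {
    Set<String> source = initialCommit ? enabledInWriteConfig : recordedInTableConfig;
    // Copy into a sorted set so callers get a stable, independent result.
    return new TreeSet<>(source);
  }

  // True when both configs name exactly the same partitions.
  static boolean configsInSync(Set<String> enabledInWriteConfig,
                               Set<String> recordedInTableConfig) {
    return enabledInWriteConfig.equals(recordedInTableConfig);
  }

  public static void main(String[] args) {
    // Illustrative values: the writer enables two partitions, but the
    // table config so far records only one of them.
    Set<String> writeConfig = Set.of("files", "column_stats");
    Set<String> tableConfig = Set.of("files");

    System.out.println(partitionsToUpdate(true, writeConfig, tableConfig));
    System.out.println(partitionsToUpdate(false, writeConfig, tableConfig));
    System.out.println(configsInSync(writeConfig, tableConfig));
  }
}
```

With one method like this called from both paths, the only remaining question 
is whether the two configs agree, which is what HUDI-3782 is about.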

> how to deduce what MDT partitions to update on the write path w/ async indeing
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-3786
>                 URL: https://issues.apache.org/jira/browse/HUDI-3786
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> With async indexing, how do we deduce which MDT partitions to update on 
> the regular write path?
>  
> {code:java}
> private MetadataRecordsGenerationParams getRecordsGenerationParams() {
>   return new MetadataRecordsGenerationParams(
>       dataMetaClient, enabledPartitionTypes,
>       dataWriteConfig.getBloomFilterType(),
>       dataWriteConfig.getBloomIndexParallelism(),
>       dataWriteConfig.isMetadataColumnStatsIndexEnabled(),
>       dataWriteConfig.getColumnStatsIndexParallelism(),
>       StringUtils.toList(dataWriteConfig.getColumnsEnabledForColumnStatsIndex()),
>       StringUtils.toList(dataWriteConfig.getColumnsEnabledForBloomFilterIndex()));
> }
> {code}
> As of now, I see that the above code snippet is what decides that. But don't 
> we need to decide based on tableConfig?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
