[ 
https://issues.apache.org/jira/browse/HUDI-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-9614:
-----------------------------
    Status: Open  (was: In Progress)

> should only use metaclient to read index def file from spark driver
> -------------------------------------------------------------------
>
>                 Key: HUDI-9614
>                 URL: https://issues.apache.org/jira/browse/HUDI-9614
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: index
>            Reporter: Davis Zhang
>            Assignee: Vamshi Krishna Kyatham
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> {code:java}
> 2. Called on Spark executors (CRITICAL):
>   - HoodieWriteHandle.java:132 - Inside initSecondaryIndexStats()
>   secondaryIndexDefns = hoodieTable.getMetaClient().getIndexMetadata()
>       .map(indexMetadata -> indexMetadata.getIndexDefinitions().values())
>       .orElse(Collections.emptyList())
>   This is called from the HoodieWriteHandle constructor, which is invoked on executors through:
>   - BaseSparkCommitActionExecutor.java:288-294 - Distributed operation
>   return HoodieJavaRDD.of(partitionedRDD.map(Tuple2::_2).mapPartitionsWithIndex((partition, recordItr) -> {
>     if (WriteOperationType.isChangingRecords(operationType)) {
>       return handleUpsertPartition(instantTime, partition, recordItr, bucketInfoGetter);
>     } else {
>       return handleInsertPartition(instantTime, partition, recordItr, bucketInfoGetter);
>     }
>   }, true).flatMap(List::iterator));
>   The execution chain:
>   1. mapPartitionsWithIndex runs on executors
>   2. Calls handleUpsertPartition/handleInsertPartition
>   3. Creates HoodieWriteHandle instances via factories
>   4. HoodieWriteHandle constructor calls initSecondaryIndexStats()
>   5. initSecondaryIndexStats() calls getIndexMetadata()
>   This happens when secondary indexing is enabled and it's not a clustering/compaction operation.
>  
> {code}
> The metaclient method that reads the index definitions (getIndexMetadata()) should not be called on executors; it should be called only from the Spark driver.
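
The driver-only rule above can be illustrated with a minimal plain-Java sketch. It uses neither Spark nor Hudi, and it is not the fix from the linked pull request. FakeMetaClient, FakeWriteHandle, and runJob are hypothetical stand-ins invented for illustration. The idea is to read the index definitions once on the "driver" side and hand the resulting list to each per-partition handle, so no handle constructor goes back to the metaclient.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class DriverSideIndexDefs {

    // Hypothetical stand-in for the metaclient; counts how often the
    // index definition file would be read.
    static final class FakeMetaClient {
        final AtomicInteger reads = new AtomicInteger();

        List<String> getIndexDefinitions() {
            reads.incrementAndGet();
            return Collections.singletonList("secondary_index_city");
        }
    }

    // Hypothetical stand-in for a write handle: it receives the definitions
    // as a constructor argument instead of fetching them itself.
    static final class FakeWriteHandle {
        final List<String> secondaryIndexDefns;

        FakeWriteHandle(List<String> secondaryIndexDefns) {
            this.secondaryIndexDefns = secondaryIndexDefns;
        }
    }

    // The "driver" reads the definitions exactly once; the loop simulates
    // per-partition tasks that only reuse the captured, read-only list.
    static List<FakeWriteHandle> runJob(FakeMetaClient metaClient, int numPartitions) {
        final List<String> defs = Collections.unmodifiableList(
                new ArrayList<>(metaClient.getIndexDefinitions()));
        List<FakeWriteHandle> handles = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            handles.add(new FakeWriteHandle(defs));
        }
        return handles;
    }

    public static void main(String[] args) {
        FakeMetaClient metaClient = new FakeMetaClient();
        List<FakeWriteHandle> handles = runJob(metaClient, 4);
        // prints "handles=4, metaclient reads=1"
        System.out.println("handles=" + handles.size()
                + ", metaclient reads=" + metaClient.reads.get());
    }
}
```

In a real Spark job the captured list would have to be serializable (or broadcast) to reach the executors; that detail is outside this sketch.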



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
