Davis Zhang created HUDI-9614:
---------------------------------

             Summary: should only use metaclient to read index def file from Spark driver
                 Key: HUDI-9614
                 URL: https://issues.apache.org/jira/browse/HUDI-9614
             Project: Apache Hudi
          Issue Type: Bug
          Components: index
            Reporter: Davis Zhang


{code:java}
2. Called on Spark executors (CRITICAL):
  - HoodieWriteHandle.java:132 - Inside initSecondaryIndexStats()
  secondaryIndexDefns = hoodieTable.getMetaClient().getIndexMetadata()
      .map(indexMetadata -> indexMetadata.getIndexDefinitions().values())
      .orElse(Collections.emptyList())
  This is called from the HoodieWriteHandle constructor, which is invoked on executors through:
  - BaseSparkCommitActionExecutor.java:288-294 - Distributed operation
  return HoodieJavaRDD.of(partitionedRDD.map(Tuple2::_2).mapPartitionsWithIndex((partition, recordItr) -> {
    if (WriteOperationType.isChangingRecords(operationType)) {
      return handleUpsertPartition(instantTime, partition, recordItr, bucketInfoGetter);
    } else {
      return handleInsertPartition(instantTime, partition, recordItr, bucketInfoGetter);
    }
  }, true).flatMap(List::iterator));
  The execution chain:
  1. mapPartitionsWithIndex runs on executors
  2. Calls handleUpsertPartition/handleInsertPartition
  3. Creates HoodieWriteHandle instances via factories
  4. HoodieWriteHandle constructor calls initSecondaryIndexStats()
  5. initSecondaryIndexStats() calls getIndexMetadata()
  This happens when secondary indexing is enabled and the operation is not clustering or compaction.
 
{code}
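To make the execution chain above concrete, here is a minimal simulation without Spark. FakeMetaClient, WriteHandle and runPartitions are hypothetical stand-ins, not Hudi classes. Each simulated partition constructs a handle whose constructor queries the meta client, mirroring initSecondaryIndexStats(). The number of index definition reads therefore grows with the number of handles, and in the real job every one of those reads would happen on an executor.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorSideReadSketch {

  // Hypothetical stand-in for the meta client (not a Hudi class). Each call to
  // getIndexMetadata() models one read of the index definition file.
  static class FakeMetaClient {
    final AtomicInteger reads = new AtomicInteger();

    Optional<Map<String, String>> getIndexMetadata() {
      reads.incrementAndGet();
      return Optional.of(Map.of("secondary_index_city", "city"));
    }
  }

  // Hypothetical handle whose constructor mirrors initSecondaryIndexStats():
  // it queries the meta client directly, wherever the handle is constructed.
  static class WriteHandle {
    final List<String> secondaryIndexDefns;

    WriteHandle(FakeMetaClient metaClient) {
      secondaryIndexDefns = metaClient.getIndexMetadata()
          .map(indexMetadata -> List.copyOf(indexMetadata.values()))
          .orElse(Collections.emptyList());
    }
  }

  // Simulates the mapPartitionsWithIndex body. For simplicity it assumes one
  // handle per partition; a real job may create several per partition.
  static List<WriteHandle> runPartitions(FakeMetaClient metaClient, int numPartitions) {
    List<WriteHandle> handles = new ArrayList<>();
    for (int partition = 0; partition < numPartitions; partition++) {
      handles.add(new WriteHandle(metaClient));
    }
    return handles;
  }

  public static void main(String[] args) {
    FakeMetaClient metaClient = new FakeMetaClient();
    runPartitions(metaClient, 8);
    // One read per handle: prints "index definition reads: 8".
    System.out.println("index definition reads: " + metaClient.reads.get());
  }
}
{code}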
The metaclient's index definition getter (getIndexMetadata()) should not be called on executors; the index definition file should only be read on the Spark driver.
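One possible shape for this, sketched under the assumption that the index definitions are small enough to ship with the task closure: resolve them once on the driver and pass the resulting list into each handle, instead of letting the handle query the meta client. DriverSideIndexDefs, resolveOnDriver and the WriteHandle(List<String>) constructor are hypothetical names for illustration, not the actual Hudi fix.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class DriverSideIndexDefs {

  // Hypothetical meta client stand-in (not a Hudi class); counts how often
  // the index definition file would be read.
  static class FakeMetaClient {
    final AtomicInteger reads = new AtomicInteger();

    Optional<Map<String, String>> getIndexMetadata() {
      reads.incrementAndGet();
      return Optional.of(Map.of("secondary_index_city", "city"));
    }
  }

  // Intended to run on the driver only: the single place the definitions are read.
  // List.copyOf returns an immutable list that is serializable when its elements
  // are, so it can be captured by a task closure.
  static List<String> resolveOnDriver(FakeMetaClient metaClient) {
    return metaClient.getIndexMetadata()
        .map(indexMetadata -> List.copyOf(indexMetadata.values()))
        .orElse(Collections.emptyList());
  }

  // Hypothetical executor-side handle: it receives precomputed definitions
  // and never touches the meta client.
  static class WriteHandle {
    final List<String> secondaryIndexDefns;

    WriteHandle(List<String> secondaryIndexDefns) {
      this.secondaryIndexDefns = secondaryIndexDefns;
    }
  }

  // Simulates the per-partition closure; only the resolved list is captured.
  static List<WriteHandle> runPartitions(List<String> secondaryIndexDefns, int numPartitions) {
    List<WriteHandle> handles = new ArrayList<>();
    for (int partition = 0; partition < numPartitions; partition++) {
      handles.add(new WriteHandle(secondaryIndexDefns));
    }
    return handles;
  }

  public static void main(String[] args) {
    FakeMetaClient metaClient = new FakeMetaClient();
    List<String> defns = resolveOnDriver(metaClient);
    runPartitions(defns, 8);
    // A single read, on the driver: prints "index definition reads: 1".
    System.out.println("index definition reads: " + metaClient.reads.get());
  }
}
{code}

In this sketch the trade-off is that every handle sees the snapshot of definitions taken when the driver resolved them, in exchange for a single file read per job instead of one per handle.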



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
