Davis Zhang created HUDI-9614:
---------------------------------
Summary: should only use metaclient to read index def file from
spark driver
Key: HUDI-9614
URL: https://issues.apache.org/jira/browse/HUDI-9614
Project: Apache Hudi
Issue Type: Bug
Components: index
Reporter: Davis Zhang
{code:java}
2. Called on Spark executors (CRITICAL):
- HoodieWriteHandle.java:132 - Inside initSecondaryIndexStats()
    secondaryIndexDefns = hoodieTable.getMetaClient().getIndexMetadata()
        .map(indexMetadata -> indexMetadata.getIndexDefinitions().values())
        .orElse(Collections.emptyList())
This is called from the HoodieWriteHandle constructor, which is invoked on
executors through:
- BaseSparkCommitActionExecutor.java:288-294 - Distributed operation
    return HoodieJavaRDD.of(partitionedRDD.map(Tuple2::_2).mapPartitionsWithIndex((partition, recordItr) -> {
      if (WriteOperationType.isChangingRecords(operationType)) {
        return handleUpsertPartition(instantTime, partition, recordItr, bucketInfoGetter);
      } else {
        return handleInsertPartition(instantTime, partition, recordItr, bucketInfoGetter);
      }
    }, true).flatMap(List::iterator));
The execution chain:
1. mapPartitionsWithIndex runs on executors
2. Calls handleUpsertPartition/handleInsertPartition
3. Creates HoodieWriteHandle instances via factories
4. HoodieWriteHandle constructor calls initSecondaryIndexStats()
5. initSecondaryIndexStats() calls getIndexMetadata()
This happens when secondary indexing is enabled and it's not a
clustering/compaction operation.
{code}
The metaclient call that reads the index definitions, getIndexMetadata(), should not be called on executors; it should be called only on the Spark driver.
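
One possible direction, sketched in plain Java without Spark or Hudi on the classpath: read the index definitions once on the driver and hand the resulting list to each write handle through its constructor, so that no per-partition task needs to go through the metaclient. Everything named here (IndexDef, FakeMetaClient, FakeWriteHandle, runJob) is a hypothetical stand-in invented for illustration, not a Hudi class or the actual fix; the loop over partitions only simulates what would be executor tasks.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class DriverSideIndexDefs {

    // Hypothetical stand-in for an index definition (not Hudi's class).
    // Serializable so it could travel inside a task closure.
    static final class IndexDef implements Serializable {
        final String name;
        IndexDef(String name) { this.name = name; }
    }

    // Hypothetical stand-in for the metaclient; counts how often the
    // index definition "file" is read.
    static final class FakeMetaClient {
        final AtomicInteger reads = new AtomicInteger();
        List<IndexDef> readIndexDefs() {
            reads.incrementAndGet();
            return Arrays.asList(new IndexDef("secondary_index_a"),
                                 new IndexDef("secondary_index_b"));
        }
    }

    // Stand-in for a write handle: it receives the definitions through its
    // constructor instead of asking the metaclient for them.
    static final class FakeWriteHandle {
        final List<IndexDef> secondaryIndexDefns;
        FakeWriteHandle(List<IndexDef> defs) { this.secondaryIndexDefns = defs; }
    }

    // "Driver" side: read once, then pass the same read-only list to every
    // simulated partition task.
    static List<FakeWriteHandle> runJob(FakeMetaClient metaClient, int numPartitions) {
        List<IndexDef> defs =
            Collections.unmodifiableList(new ArrayList<>(metaClient.readIndexDefs()));
        List<FakeWriteHandle> handles = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            handles.add(new FakeWriteHandle(defs)); // no metaclient access here
        }
        return handles;
    }

    public static void main(String[] args) {
        FakeMetaClient mc = new FakeMetaClient();
        List<FakeWriteHandle> handles = runJob(mc, 4);
        System.out.println("handles=" + handles.size() + " reads=" + mc.reads.get());
    }
}
```

With four simulated partitions the metaclient stand-in is read exactly once, whatever the partition count. In a real Spark job the list would have to be serializable (or broadcast) to reach the executors; how that should be wired into HoodieWriteHandle is left open here.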