nsivabalan commented on code in PR #12662:
URL: https://github.com/apache/hudi/pull/12662#discussion_r1927945408
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -1064,12 +1069,21 @@ private boolean shouldDeleteMetadataPartition(MetadataPartitionType partitionTyp
case RECORD_INDEX:
metadataIndexDisabled = !config.isRecordIndexEnabled();
break;
+ // PARTITION_STATS should have same behavior as COLUMN_STATS
+ case PARTITION_STATS:
+ metadataIndexDisabled = !config.isPartitionStatsIndexEnabled();
Review Comment:
:) I also had to fix this in my partition stats patch.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1097,9 +1097,9 @@ engineContext, dataWriteConfig, commitMetadata, instantTime, dataMetaClient, get
// Updates for record index are created by parsing the WriteStatus which is a hudi-client object. Hence, we cannot yet move this code
// to the HoodieTableMetadataUtil class in hudi-common.
- if (dataWriteConfig.isRecordIndexEnabled()) {
- HoodieData<HoodieRecord> additionalUpdates = getRecordIndexAdditionalUpserts(partitionToRecordMap.get(MetadataPartitionType.RECORD_INDEX.getPartitionPath()), commitMetadata);
- partitionToRecordMap.put(RECORD_INDEX.getPartitionPath(), partitionToRecordMap.get(MetadataPartitionType.RECORD_INDEX.getPartitionPath()).union(additionalUpdates));
+ if (dataWriteConfig.isRecordIndexEnabled() && RECORD_INDEX.isMetadataPartitionAvailable(dataMetaClient)) {
Review Comment:
I don't think this matches the original design.
The regular ingestion writer should log updates both for partitions that are
fully available and for partitions that are inflight.
Otherwise, the async indexer has to do a full catch-up. The catch-up phase is
only meant to cover a race condition, so that we do not miss any updates; in
most cases it should not have any work to do.
Let's discuss if our understanding is different.
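
To make the point above concrete, here is a minimal sketch of the two gating rules being compared. Everything in it is hypothetical: `IndexUpdateGate`, `PartitionState`, and both methods are illustrative stand-ins and not Hudi classes or APIs. It only models the decision "should the ingestion writer log index updates for this metadata partition?" under the assumption that a partition is either absent, inflight, or fully available.

```java
// Hypothetical model of the gating decision; none of these names exist in Hudi.
class IndexUpdateGate {

  // Assumed lifecycle of a metadata partition for the purpose of this sketch.
  enum PartitionState { ABSENT, INFLIGHT, AVAILABLE }

  // Rule described in the review comment: log updates when the index is
  // enabled and the partition is either fully available or inflight.
  static boolean shouldLogUpdates(boolean indexEnabled, PartitionState state) {
    return indexEnabled && state != PartitionState.ABSENT;
  }

  // Stricter rule, roughly what an "available only" check would amount to:
  // updates for an inflight partition are skipped and left to the catch-up.
  static boolean shouldLogUpdatesAvailableOnly(boolean indexEnabled, PartitionState state) {
    return indexEnabled && state == PartitionState.AVAILABLE;
  }

  public static void main(String[] args) {
    for (PartitionState state : PartitionState.values()) {
      System.out.println(state
          + ": availableOrInflight=" + shouldLogUpdates(true, state)
          + ", availableOnly=" + shouldLogUpdatesAvailableOnly(true, state));
    }
  }
}
```

The two rules differ only for the inflight state. Under the stricter rule, every commit that lands while the partition is being built would have to be picked up by the catch-up phase instead of by the regular writer.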