nsivabalan commented on code in PR #13402:
URL: https://github.com/apache/hudi/pull/13402#discussion_r2136891309
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1096,6 +1119,126 @@ public void buildMetadataPartitions(HoodieEngineContext engineContext, List<Hood
     initializeFromFilesystem(instantTime, partitionTypes, Option.empty());
   }
+  public void startCommit(String instantTime) {
+    ValidationUtils.checkState(streamingWritesEnabled, "Streaming writes should be enabled for startCommit API");
+
+    if (!metadataMetaClient.getActiveTimeline().getCommitsTimeline().containsInstant(instantTime)) {
+      // if this is a new commit being applied to metadata for the first time
+      LOG.info("New commit at {} being applied to MDT.", instantTime);
+    } else {
+      throw new HoodieMetadataException("Starting the same commit in Metadata table more than once w/o rolling back : " + instantTime);
+    }
+
+    // this is where we might instantiate the write client to metadata table for the first time.
+    getWriteClient().startCommitForMetadataTable(metadataMetaClient, instantTime, HoodieTimeline.DELTA_COMMIT_ACTION);
+  }
+
+  @Override
+  public HoodieData<WriteStatus> streamWriteToMetadataPartitions(HoodieData<WriteStatus> writeStatus, String instantTime) {
+    List<MetadataPartitionType> mdtPartitionsToTag = new ArrayList<>(enabledPartitionTypes);
+    mdtPartitionsToTag.remove(FILES);
+    mdtPartitionsToTag.retainAll(STREAMING_WRITES_SUPPORTED_PARTITIONS);
Review Comment:
Note to reviewer:
In this patch we are adding only the RLI partition to the streaming flow. All
other partitions are handled by the second write to the metadata table, i.e.
the non-streaming write.
Taken together, the two writes to the MDT ensure that all enabled partitions
are written.
In the future, we will move partitions one at a time from the non-streaming
write to the streaming write flow. It is designed this way so that we don't
need to hold up this patch until every partition supports the streaming way of
writing to the MDT.
Also, from an inconsistency standpoint, only RLI and the secondary index are
critical enough that they definitely need to move to the streaming flow. All
other partitions rely solely on HoodieCommitMetadata, so it does not matter
much whether they are written the streaming or the non-streaming way.
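
To make the split concrete, here is a minimal, standalone sketch of how the enabled partitions could be divided between the two writes. It is not the code in this PR: `MdtPartitionSplit`, the nested `PartitionType` enum, and `STREAMING_SUPPORTED` are hypothetical stand-ins for `MetadataPartitionType` and `STREAMING_WRITES_SUPPORTED_PARTITIONS`, and it assumes that FILES stays with the second (non-streaming) write.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Hudi.
class MdtPartitionSplit {
    // Stand-in for Hudi's MetadataPartitionType (values chosen for illustration).
    enum PartitionType { FILES, COLUMN_STATS, BLOOM_FILTERS, RECORD_INDEX, SECONDARY_INDEX }

    // Assumption: for now only the record-level index (RLI) supports streaming writes.
    static final Set<PartitionType> STREAMING_SUPPORTED = EnumSet.of(PartitionType.RECORD_INDEX);

    // Partitions written by the streaming flow: enabled, not FILES, and streaming-capable.
    static List<PartitionType> streamingPartitions(List<PartitionType> enabled) {
        List<PartitionType> result = new ArrayList<>(enabled);
        result.remove(PartitionType.FILES);
        result.retainAll(STREAMING_SUPPORTED);
        return result;
    }

    // Everything else is left to the second, non-streaming write
    // (assumed here to include FILES).
    static List<PartitionType> nonStreamingPartitions(List<PartitionType> enabled) {
        List<PartitionType> result = new ArrayList<>(enabled);
        result.removeAll(streamingPartitions(enabled));
        return result;
    }
}
```

Under these assumptions, every enabled partition lands in exactly one of the two lists, and moving a partition to the streaming flow later only means adding it to `STREAMING_SUPPORTED`.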