nsivabalan commented on code in PR #5274:
URL: https://github.com/apache/hudi/pull/5274#discussion_r846685721
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -770,18 +774,6 @@ private MetadataRecordsGenerationParams
getRecordsGenerationParams() {
}
}
- private Set<String> getMetadataPartitionsToUpdate() {
Review Comment:
I will try to explain to the best of my understanding, but will let @codope
chime in as well.
Case 1:
An existing MDT from 0.10.0 gets upgraded to 0.11 without enabling any new
partitions.
On the first commit, after detecting that the FILES partition is already
initialized, we will update the table config with "FILES" as a completed MDT
partition.
Case 2:
An existing MDT from 0.10.0 gets upgraded to 0.11 with all partitions enabled
(synchronous flow).
On the first commit, we will detect that 2 new partitions (col stats and bloom
filter) have been enabled and will initialize them. At the end of that, we will
update the table config to list all 3 partitions as completed MDT partitions.
Case 3:
For a fresh table, the user wishes to enable async indexing for col stats and
bloom filter. With the regular writer, async indexing has to be enabled for
these 2 partitions. Then, in a separate process altogether, the user is
expected to schedule and execute the index building. During scheduling, both
partitions (col stats and bloom filter) are added to the table config under the
list of MDT partitions being built. Once this is updated, a data table commit
applied to the MDT by the regular writer will update all 3 partitions in the
MDT (FILES as part of the completed MDT partitions, and the other 2 as part of
the MDT partitions being built). This is the case where we need
getMetadataPartitionsToUpdate(), so that writers know which partitions to
update.
I listed Case 1 and Case 2 just for completeness, but Case 3 is where we
might need the partitions being built.
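
To make the three cases concrete, here is a rough sketch of the idea as I
understand it. This is not the actual Hudi code: the class names, the field
names, and the partition labels ("FILES", "COLUMN_STATS", "BLOOM_FILTERS") are
all hypothetical stand-ins. It only assumes that the table config tracks two
lists (completed MDT partitions and MDT partitions being built) and that a
regular writer should update the union of the two.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch only; none of these names are Hudi's real API.
final class MdtPartitionsSketch {

  // Stand-in for the two partition lists kept in the table config.
  static final class TableConfigSketch {
    // MDT partitions that are fully initialized.
    final Set<String> completed = new LinkedHashSet<>();
    // MDT partitions an async indexer has scheduled but not finished.
    final Set<String> beingBuilt = new LinkedHashSet<>();
  }

  // Partitions a regular writer should update on each data table commit:
  // everything completed, plus everything still being built (Case 3).
  static Set<String> partitionsToUpdate(TableConfigSketch config) {
    Set<String> result = new LinkedHashSet<>(config.completed);
    result.addAll(config.beingBuilt);
    return Collections.unmodifiableSet(result);
  }
}
```

In Case 1 and Case 2 the "being built" list is empty, so the result is just the
completed partitions. Only in Case 3 does the union differ from the completed
list, which is why the writer needs to look at both.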