nsivabalan commented on a change in pull request #3590:
URL: https://github.com/apache/hudi/pull/3590#discussion_r716251124
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -401,64 +394,83 @@ private boolean bootstrapFromFilesystem(HoodieEngineContext engineContext, Hoodi
 }
 /**
- * Sync the Metadata Table from the instants created on the dataset.
+ * Initialize file groups for a partition. For file listing, we just have one file group.
  *
- * @param datasetMetaClient {@code HoodieTableMetaClient} for the dataset
+ * All file groups for a given metadata partition have a fixed prefix, as per {@link MetadataPartitionType#getFileIdPrefix()}.
+ * Each file group is suffixed with increments of 1, starting with 1.
+ *
+ * For instance, for FILES there is only one file group, named "files-1".
+ * Let's say we configure 10 file groups for the record-level index, with the prefix "record-index-bucket-".
+ * File groups will then be named:
+ *   record-index-bucket-01
+ *   record-index-bucket-02
+ *   ...
+ *   record-index-bucket-10
  */
-  private void syncFromInstants(HoodieTableMetaClient datasetMetaClient) {
-    ValidationUtils.checkState(enabled, "Metadata table cannot be synced as it is not enabled");
-    // (re) init the metadata for reading.
-    initTableMetadata();
-    try {
-      List<HoodieInstant> instantsToSync = metadata.findInstantsToSyncForWriter();
-      if (instantsToSync.isEmpty()) {
-        return;
-      }
-
-      LOG.info("Syncing " + instantsToSync.size() + " instants to metadata table: " + instantsToSync);
-
-      // Read each instant in order and sync it to metadata table
-      for (HoodieInstant instant : instantsToSync) {
-        LOG.info("Syncing instant " + instant + " to metadata table");
-
-        Option<List<HoodieRecord>> records = HoodieTableMetadataUtil.convertInstantToMetaRecords(datasetMetaClient,
-            metaClient.getActiveTimeline(), instant, metadata.getUpdateTime());
-        if (records.isPresent()) {
-          commit(records.get(), MetadataPartitionType.FILES.partitionPath(), instant.getTimestamp());
-        }
+  private void initializeFileGroups(HoodieTableMetaClient datasetMetaClient, MetadataPartitionType metadataPartition, String instantTime,
+      int fileGroupCount) throws IOException {
+
+    final HashMap<HeaderMetadataType, String> blockHeader = new HashMap<>();
+    blockHeader.put(HeaderMetadataType.INSTANT_TIME, instantTime);
+    // Archival of the data table has a dependency on compaction (base files) in the metadata table.
+    // It is assumed that, as of the time Tx of the base instant (/compaction time) in the metadata table,
+    // all commits in the data table are in sync with the metadata table. So we always start with a log file for any file group.
+    final HoodieDeleteBlock block = new HoodieDeleteBlock(new HoodieKey[0], blockHeader);
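The naming scheme described in the Javadoc above can be sketched as follows. This is a minimal, hypothetical helper to illustrate the fixed-prefix-plus-counter convention; the two-digit zero padding and the standalone class are assumptions for illustration, not the actual Hudi implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class FileGroupNaming {

  // Hypothetical helper: build file group IDs from a fixed per-partition prefix
  // plus a 1-based counter, zero-padded to two digits as in the Javadoc example.
  static List<String> fileGroupIds(String fileIdPrefix, int fileGroupCount) {
    List<String> ids = new ArrayList<>();
    for (int i = 1; i <= fileGroupCount; i++) {
      ids.add(String.format("%s%02d", fileIdPrefix, i));
    }
    return ids;
  }

  public static void main(String[] args) {
    // 10 buckets for a record-level index:
    // record-index-bucket-01 ... record-index-bucket-10
    System.out.println(fileGroupIds("record-index-bucket-", 10));
  }
}
```

(For the FILES partition the Javadoc shows a single, unpadded group, "files-1", so the padding above applies only to the multi-bucket example.)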
Review comment:
I feel it may not be easy to relax this; we can discuss it async as we close out this patch.
Here are the two dependencies, of which one could be relaxed:
1. During rollback, we check whether the commit being rolled back has already been synced. If it is earlier than the last compacted time, we assume it is already synced. We could relax this if need be: we could always assume that a commit which is not part of the metadata table's active timeline has not been synced, and go ahead with the rollback. The only difference is that some additional files might be added to the delete list which were never synced to the metadata table at all.
2. Archival of the dataset is dependent on compaction in the metadata table. This might need more thought.
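Point 1 above could be sketched roughly as follows. This is an illustration only, with hypothetical method and parameter names (not the actual Hudi code); it relies on the fact that Hudi instant timestamps are ordered strings, so lexicographic comparison orders them in time:

```java
import java.util.Set;

public class RollbackSyncCheck {

  // Current dependency (point 1): a commit earlier than the last compacted time
  // in the metadata table is assumed to already be synced; otherwise it counts
  // as synced only if it appears in the metadata table's active timeline.
  static boolean isSyncedCurrent(String commitTime, Set<String> metadataActiveInstants,
                                 String lastCompactedTime) {
    return commitTime.compareTo(lastCompactedTime) < 0
        || metadataActiveInstants.contains(commitTime);
  }

  // Relaxed variant: only commits present in the metadata table's active timeline
  // are considered synced; everything else is rolled back unconditionally. The
  // cost is that the delete list may include files never synced to the metadata table.
  static boolean isSyncedRelaxed(String commitTime, Set<String> metadataActiveInstants) {
    return metadataActiveInstants.contains(commitTime);
  }
}
```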
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]