nsivabalan commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1252401000
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -116,11 +134,10 @@ public static HoodieWriteConfig createMetadataWriteConfig(
// Below config is only used if isLogCompactionEnabled is set.
.withLogCompactionBlocksThreshold(writeConfig.getMetadataLogCompactBlocksThreshold())
.build())
- .withParallelism(parallelism, parallelism)
- .withDeleteParallelism(parallelism)
- .withRollbackParallelism(parallelism)
- .withFinalizeWriteParallelism(parallelism)
- .withAllowMultiWriteOnSameInstant(true)
+ .withStorageConfig(HoodieStorageConfig.newBuilder().hfileMaxFileSize(maxHFileSizeBytes)
+ .logFileMaxSize(maxLogFileSizeBytes).logFileDataBlockMaxSize(maxLogBlockSizeBytes).build())
+ .withRollbackParallelism(defaultParallelism)
+ .withFinalizeWriteParallelism(defaultParallelism)
Review Comment:
Did you remove `.withAllowMultiWriteOnSameInstant(true)` intentionally?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -91,8 +108,9 @@ public static HoodieWriteConfig createMetadataWriteConfig(
.withCleanConfig(HoodieCleanConfig.newBuilder()
.withAsyncClean(DEFAULT_METADATA_ASYNC_CLEAN)
.withAutoClean(false)
- .withCleanerParallelism(parallelism)
- .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
+ .withCleanerParallelism(defaultParallelism)
+ .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS)
+ .retainFileVersions(2)
Review Comment:
I understand it could be a larger change, but file versions make sense in
general. If Uber has been running with file versions for 6+ months, we should do
a round of testing on our end and can possibly proceed.
But incremental cleaning may not kick in, so for large MDTs I'm wondering
whether there will be any latency hit.
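A toy sketch (plain Java, not Hudi code; `retainLatestVersions` is a hypothetical stand-in) of what `KEEP_LATEST_FILE_VERSIONS` with `retainFileVersions(2)` means per file group: only the two newest file versions survive a clean, regardless of how many commits they span.

```java
import java.util.List;

// Illustrates the per-file-group retention rule of KEEP_LATEST_FILE_VERSIONS
// with retainFileVersions(2). Not Hudi code; a self-contained sketch.
public class FileVersionCleaning {

  // versions: commit timestamps of one file group's slices, oldest first.
  // Returns the versions the cleaner would retain.
  static List<Integer> retainLatestVersions(List<Integer> versions, int retain) {
    int n = versions.size();
    return versions.subList(Math.max(0, n - retain), n);
  }

  public static void main(String[] args) {
    // A file group with 5 versions; the cleaner keeps only the newest 2.
    System.out.println(retainLatestVersions(List.of(1, 2, 3, 4, 5), 2)); // [4, 5]
  }
}
```

Unlike `KEEP_LATEST_COMMITS`, this bound does not depend on the commit-retention window, which is why it caps MDT file-slice growth but also why incremental cleaning semantics differ.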
##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -341,7 +341,11 @@ private void ensurePartitionsLoadedCorrectly(List<String> partitionList) {
long beginTs = System.currentTimeMillis();
// Not loaded yet
try {
- LOG.info("Building file system view for partitions " + partitionSet);
+ if (partitionSet.size() < 100) {
+ LOG.info("Building file system view for partitions: " + partitionSet);
Review Comment:
Yes, maybe we should reconsider the frequency of logging here. For example, log
once every 100 partitions or so; I'm not sure we gain much by logging this
for every partition.
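The throttling suggested above could look like this plain-Java sketch (`logPartitionLoad`, the threshold, and the message strings are illustrative, not the actual `AbstractTableFileSystemView` code): emit a one-line summary up front and progress only at every 100th partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch of throttled partition-load logging: one summary line, then a
// progress line every LOG_EVERY_N partitions (and one final line).
public class PartitionLoadLogging {
  static final int LOG_EVERY_N = 100;

  // Returns the messages that would be logged, so the behavior is testable.
  static List<String> logPartitionLoad(List<String> partitions) {
    List<String> messages = new ArrayList<>();
    messages.add("Building file system view for " + partitions.size() + " partitions");
    for (int i = 0; i < partitions.size(); i++) {
      if ((i + 1) % LOG_EVERY_N == 0 || i == partitions.size() - 1) {
        messages.add("Loaded " + (i + 1) + "/" + partitions.size() + " partitions");
      }
    }
    return messages;
  }

  public static void main(String[] args) {
    List<String> partitions = IntStream.range(0, 250)
        .mapToObj(i -> "p" + i).collect(Collectors.toList());
    // 250 partitions produce 4 log lines instead of 250.
    logPartitionLoad(partitions).forEach(System.out::println);
  }
}
```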
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -537,7 +538,8 @@ public HoodieTableMetaClient getMetadataMetaClient() {
}
public Map<String, String> stats() {
- return metrics.map(m -> m.getStats(true, metadataMetaClient, this)).orElse(new HashMap<>());
+ Set<String> allMetadataPartitionPaths = Arrays.stream(MetadataPartitionType.values()).map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+ return metrics.map(m -> m.getStats(true, metadataMetaClient, this, allMetadataPartitionPaths)).orElse(new HashMap<>());
Review Comment:
`HoodieMetadataMetrics.getStats(boolean detailed, HoodieTableMetaClient metaClient, HoodieTableMetadata metadata)`
reloads the timeline.
Can we hoist the reload out to the caller so that we don't reload for
every MDT partition's stats?
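The suggested hoisting could look like this minimal sketch (plain-Java stand-ins, not Hudi's actual `getStats` signature; `reloadTimeline`, `statsFor`, and `allStats` are hypothetical): the caller reloads once and passes the timeline down, so N partitions cost one reload instead of N.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of moving an expensive reload out of a per-partition stats call.
public class HoistedReload {
  static final AtomicInteger reloadCount = new AtomicInteger();

  // Stand-in for the expensive timeline reload; counts invocations.
  static List<String> reloadTimeline() {
    reloadCount.incrementAndGet();
    return List.of("instant-1", "instant-2");
  }

  // After the refactor: the timeline is a parameter, not reloaded internally.
  static Map<String, String> statsFor(String partition, List<String> timeline) {
    Map<String, String> stats = new HashMap<>();
    stats.put(partition + ".instants", String.valueOf(timeline.size()));
    return stats;
  }

  static Map<String, String> allStats(List<String> partitions) {
    List<String> timeline = reloadTimeline(); // reload once, up front
    Map<String, String> merged = new HashMap<>();
    for (String p : partitions) {
      merged.putAll(statsFor(p, timeline));
    }
    return merged;
  }

  public static void main(String[] args) {
    System.out.println(allStats(List.of("files", "column_stats", "bloom_filters")));
    System.out.println("reloads=" + reloadCount.get()); // 1, not 3
  }
}
```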
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -176,7 +176,7 @@ private void initMetadataReader() {
}
try {
- this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath());
+ this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath(), true);
Review Comment:
The rationale is that the metadata writer itself is short-lived, used just for
committing one instant, so we should be fine enabling re-use here?
Do we even expect to see any improvement here, since this is meant for just
one write to the MDT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]