xushiyan commented on a change in pull request #4811:
URL: https://github.com/apache/hudi/pull/4811#discussion_r810700908
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##########
@@ -243,7 +243,7 @@
public static final ConfigProperty<Boolean> PRESERVE_COMMIT_METADATA =
ConfigProperty
.key("hoodie.compaction.preserve.commit.metadata")
- .defaultValue(false)
+ .defaultValue(true)
.sinceVersion("0.11.0")
.withDocumentation("When rewriting data, preserves existing
hoodie_commit_time");
Review comment:
```suggestion
.withDocumentation("When running compaction, preserves existing
`_hoodie_commit_time`");
```
a side question: this won't affect clustering, will it?
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -294,6 +294,8 @@ protected boolean writeRecord(HoodieRecord<T> hoodieRecord,
Option<IndexedRecord
// Convert GenericRecord to GenericRecord with hoodie commit metadata
in schema
IndexedRecord recordWithMetadataInSchema =
rewriteRecord((GenericRecord) indexedRecord.get());
if (preserveMetadata) {
+ // do not preserve FILENAME_METADATA_FIELD
+
recordWithMetadataInSchema.put(HoodieRecord.HOODIE_META_COLUMNS_NAME_TO_POS.get(HoodieRecord.FILENAME_METADATA_FIELD),
newFilePath.getName());
Review comment:
another matter which may fix separately:
using `HOODIE_META_COLUMNS_NAME_TO_POS` to retrieve column index actually
affects performance given it's doing map look up for every record, also it's
pretty hard to read. Why not just use a group of int constants? like this
```
public static final int HOODIE_COMMIT_TIME_POS = 0;
```
i'd static import constants to avoid lengthy code most of the time.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -294,6 +294,8 @@ protected boolean writeRecord(HoodieRecord<T> hoodieRecord,
Option<IndexedRecord
// Convert GenericRecord to GenericRecord with hoodie commit metadata
in schema
IndexedRecord recordWithMetadataInSchema =
rewriteRecord((GenericRecord) indexedRecord.get());
if (preserveMetadata) {
+ // do not preserve FILENAME_METADATA_FIELD
Review comment:
is it always the case where we never preserve FILENAME ? since when
compact we always create new files, right? then it's some extra rule we have
to remember for this config, which sounds like we preserve all meta cols.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]