prashantwason commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r778656822
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -507,6 +519,255 @@ public void testMetadataTableWithPendingCompaction(boolean simulateFailedCompact
}
}
+ /**
+ * Test arguments - Table type, populate meta fields, exclude key from payload.
+ */
+ public static List<Arguments> testMetadataRecordKeyExcludeFromPayloadArgs() {
+ return asList(
+ Arguments.of(COPY_ON_WRITE, true),
+ Arguments.of(COPY_ON_WRITE, false),
+ Arguments.of(MERGE_ON_READ, true),
+ Arguments.of(MERGE_ON_READ, false)
+ );
+ }
+
+ /**
Review comment:
These tests are not specific to the metadata table. Populating meta fields should be tested via virtual key support, and the key field should be tested via HFileWriter / HFileBlock tests. Metadata tests are time consuming, so maybe we can reduce these.
##########
File path:
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -151,15 +154,15 @@ public BloomFilter readBloomFilter() {
}
public List<Pair<String, R>> readAllRecords(Schema writerSchema, Schema readerSchema) throws IOException {
+ final Option<Schema.Field> keySchemaField = Option.ofNullable(readerSchema.getField(keyField));
List<Pair<String, R>> recordList = new LinkedList<>();
try {
final HFileScanner scanner = reader.getScanner(false, false);
if (scanner.seekTo()) {
do {
Cell c = scanner.getKeyValue();
- byte[] keyBytes = Arrays.copyOfRange(c.getRowArray(), c.getRowOffset(), c.getRowOffset() + c.getRowLength());
- R record = getRecordFromCell(c, writerSchema, readerSchema);
- recordList.add(new Pair<>(new String(keyBytes), record));
+ final Pair<String, R> keyAndRecordPair = getRecordFromCell(c, writerSchema, readerSchema, keySchemaField);
+ recordList.add(new Pair<>(keyAndRecordPair.getFirst(), keyAndRecordPair.getSecond()));
Review comment:
Instead of allocating a new Pair, can't we simply add keyAndRecordPair to recordList?
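The suggestion above can be sketched as follows. This is a minimal, self-contained illustration using `java.util.AbstractMap.SimpleEntry` as a stand-in for Hudi's `Pair`, with a hypothetical `getRecordFromCell` that already returns the key and record together; it is not the real `HoodieHFileReader` code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class PairReuseSketch {

    // Hypothetical stand-in for getRecordFromCell: it already returns the
    // key and the record together, so callers need not unpack it.
    static Map.Entry<String, String> getRecordFromCell(String key, String record) {
        return new SimpleEntry<>(key, record);
    }

    static List<Map.Entry<String, String>> readAllRecords() {
        List<Map.Entry<String, String>> recordList = new LinkedList<>();
        // Before: unpack the pair and re-wrap it, allocating a second pair
        // per record:
        //   Map.Entry<String, String> p = getRecordFromCell("k1", "r1");
        //   recordList.add(new SimpleEntry<>(p.getKey(), p.getValue()));
        // After: add the returned pair directly, avoiding the extra allocation.
        recordList.add(getRecordFromCell("k1", "r1"));
        return recordList;
    }
}
```

Since the pair returned by the lookup is already of the element type the list holds, the intermediate allocation buys nothing.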
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileWriter.java
##########
@@ -77,6 +81,8 @@ public HoodieHFileWriter(String instantTime, Path file, HoodieHFileConfig hfileC
this.file = HoodieWrapperFileSystem.convertToHoodiePath(file, conf);
this.fs = (HoodieWrapperFileSystem) this.file.getFileSystem(conf);
this.hfileConfig = hfileConfig;
+ this.schema = schema;
+ this.schemaRecordKeyField = Option.ofNullable(schema.getField(HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY));
Review comment:
It would be even simpler to migrate HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY to HoodieHFileWriter. In other words, for now let's require HFile schemas to have a field called "key". In the future, if the need arises to use HFile as a real data file format for HUDI datasets outside of the metadata table, we can customize this to allow configuring the key field.
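One way to read this suggestion: make the writer itself own the required field name. Below is a minimal sketch with a hypothetical simplified writer (not the real `HoodieHFileWriter`), where the Avro schema lookup is reduced to a plain list of field names for illustration.

```java
import java.util.List;

public class KeyFieldWriterSketch {

    // Constant owned by the writer rather than by HoodieMetadataPayload:
    // for now, every HFile schema must carry a field named "key".
    public static final String KEY_FIELD_NAME = "key";

    private final List<String> schemaFieldNames;

    public KeyFieldWriterSketch(List<String> schemaFieldNames) {
        // Fail fast at construction time if the schema lacks the key field,
        // instead of discovering it per-record while writing.
        if (!schemaFieldNames.contains(KEY_FIELD_NAME)) {
            throw new IllegalArgumentException(
                "HFile schema must contain a field named '" + KEY_FIELD_NAME + "'");
        }
        this.schemaFieldNames = schemaFieldNames;
    }

    public String keyFieldName() {
        return KEY_FIELD_NAME;
    }
}
```

If HFile later becomes a general data file format outside the metadata table, the hardcoded constant could be replaced by a constructor parameter without changing callers that accept the default.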
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]