prashantwason commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r788345996
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
##########
@@ -83,6 +84,12 @@
.withDocumentation("Lower values increase the size of metadata tracked
within HFile, but can offer potentially "
+ "faster lookup times.");
+ public static final ConfigProperty<String> HFILE_SCHEMA_KEY_FIELD_NAME =
ConfigProperty
Review comment:
This setting is broken because the HFileReader does not have a way to
use it. Assume I specify this setting to be "someotherkey". The HFileReader
will still use the hardcoded "key".
I suggest you remove this setting and all associated code and defer this for
a later PR which will plug in this setting to the reader.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileWriter.java
##########
@@ -122,7 +128,13 @@ public boolean canWrite() {
@Override
public void writeAvro(String recordKey, IndexedRecord object) throws
IOException {
- byte[] value = HoodieAvroUtils.avroToBytes((GenericRecord)object);
+ byte[] value = HoodieAvroUtils.avroToBytes((GenericRecord) object);
+ if (schemaRecordKeyField.isPresent()) {
+ GenericRecord recordKeyExcludedRecord =
HoodieAvroUtils.bytesToAvro(value, this.schema);
Review comment:
This will reduce performance as you are converting the record to bytes
in the line above and then immediately parsing it back to the GenericRecord
again.
If may be better to check first before creating the bytes.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileConfig.java
##########
@@ -43,9 +43,10 @@
private final Configuration hadoopConf;
private final BloomFilter bloomFilter;
private final KeyValue.KVComparator hfileComparator;
+ private final String schemaKeyFieldId;
Review comment:
Why is this an Id and not name? schemaKeyFieldName
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]