[GitHub] [hudi] prashantwason commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

GitBox Wed, 19 Jan 2022 20:43:37 -0800


prashantwason commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r788345996




##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
##########
@@ -83,6 +84,12 @@
       .withDocumentation("Lower values increase the size of metadata tracked within HFile, but can offer potentially "
           + "faster lookup times.");
 
+  public static final ConfigProperty<String> HFILE_SCHEMA_KEY_FIELD_NAME = ConfigProperty

Review comment:
       This setting is broken because the HFileReader does not have a way to 
use it. Assume I specify this setting to be "someotherkey". The HFileReader 
will still use the hardcoded "key".
   
   I suggest you remove this setting and all associated code and defer this for 
a later PR which will plug in this setting to the reader.
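
   To illustrate the concern, here is a minimal sketch that uses plain maps as stand-ins for records. The class, the `write`/`read` helpers, and the assumption that the writer blanks the configured field while the reader restores the key into the hardcoded field are all hypothetical simplifications, not the actual Hudi code.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyFieldMismatch {
  // Hypothetical reader-side constant, standing in for the hardcoded "key".
  static final String READER_KEY_FIELD = "key";

  // Writer side (assumed behaviour): blank whichever field the setting names.
  static Map<String, String> write(Map<String, String> record, String configuredKeyField) {
    Map<String, String> stored = new HashMap<>(record);
    stored.put(configuredKeyField, "");
    return stored;
  }

  // Reader side (assumed behaviour): restore the record key, always into the hardcoded field.
  static Map<String, String> read(Map<String, String> stored, String recordKey) {
    Map<String, String> restored = new HashMap<>(stored);
    restored.put(READER_KEY_FIELD, recordKey);
    return restored;
  }
}
```

   With the setting left at "key" the record round-trips. With "someotherkey" the writer blanks a field the reader never restores, so that field comes back empty.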
   
   

##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileWriter.java
##########
@@ -122,7 +128,13 @@ public boolean canWrite() {
 
   @Override
   public void writeAvro(String recordKey, IndexedRecord object) throws IOException {
-    byte[] value = HoodieAvroUtils.avroToBytes((GenericRecord)object);
+    byte[] value = HoodieAvroUtils.avroToBytes((GenericRecord) object);
+    if (schemaRecordKeyField.isPresent()) {
+      GenericRecord recordKeyExcludedRecord = HoodieAvroUtils.bytesToAvro(value, this.schema);

Review comment:
       This will reduce performance, because you convert the record to bytes 
in the line above and then immediately parse those bytes back into a 
GenericRecord. 
   
   It may be better to check for the key field first, before creating the bytes.
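
   A minimal sketch of that reordering, using a `Map` as a stand-in for the GenericRecord. The `serialize` helper is a hypothetical placeholder for `HoodieAvroUtils.avroToBytes`, and blanking the key field with an empty string is an assumption about the intended behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class KeyFieldBeforeSerialize {
  // Counts serialization calls so the single-pass behaviour is observable.
  static int serializeCalls = 0;

  // Hypothetical stand-in for HoodieAvroUtils.avroToBytes.
  static byte[] serialize(Map<String, String> record) {
    serializeCalls++;
    return record.toString().getBytes(StandardCharsets.UTF_8);
  }

  // Check for the key field first, blank it on a copy, then serialize exactly once.
  static byte[] toValueBytes(Map<String, String> record, Optional<String> keyField) {
    Map<String, String> toWrite = record;
    if (keyField.isPresent() && record.containsKey(keyField.get())) {
      toWrite = new LinkedHashMap<>(record); // leave the caller's record untouched
      toWrite.put(keyField.get(), "");       // assumed: key field is blanked, not removed
    }
    return serialize(toWrite);
  }
}
```

   The record is serialized once on both paths, and no bytes are parsed back into a record.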

##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileConfig.java
##########
@@ -43,9 +43,10 @@
   private final Configuration hadoopConf;
   private final BloomFilter bloomFilter;
   private final KeyValue.KVComparator hfileComparator;
+  private final String schemaKeyFieldId;

Review comment:
       Why is this an Id and not a name? I suggest schemaKeyFieldName.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
