[GitHub] [hudi] nsivabalan commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

GitBox Wed, 27 Jan 2021 07:26:25 -0800


nsivabalan commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r565386805




##########
File path: 
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -223,6 +237,10 @@ public Option getRecordByKey(String key, Schema 
readerSchema) throws IOException
 
   private R getRecordFromCell(Cell c, Schema writerSchema, Schema 
readerSchema) throws IOException {
     byte[] value = Arrays.copyOfRange(c.getValueArray(), c.getValueOffset(), 
c.getValueOffset() + c.getValueLength());
+    return getRecordFromByteValue(value, writerSchema, readerSchema);
+  }
+
+  private R getRecordFromByteValue(byte[] value, Schema writerSchema, Schema 
readerSchema) throws IOException {

Review comment:
       do you think one line code warrants a separate method? not sure if we 
really need this. just curious. 

##########
File path: 
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -209,12 +212,23 @@ public R next() {
 
   @Override
   public Option getRecordByKey(String key, Schema readerSchema) throws 
IOException {
-    HFileScanner scanner = reader.getScanner(false, true);
+    byte[] value = null;
     KeyValue kv = new KeyValue(key.getBytes(), null, null, null);
-    if (scanner.seekTo(kv) == 0) {
-      Cell c = scanner.getKeyValue();
-      byte[] keyBytes = Arrays.copyOfRange(c.getRowArray(), c.getRowOffset(), 
c.getRowOffset() + c.getRowLength());
-      R record = getRecordFromCell(c, getSchema(), readerSchema);
+
+    synchronized (this) {

Review comment:
       can you help me understand how getRecordByKey() will be called 
concurrently? in other words, can a single reader be called by multiple callers 
concurrently. from my understanding, a single reader won't trigger concurrent 
getRecordByKey() calls. 

##########
File path: 
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -209,12 +212,23 @@ public R next() {
 
   @Override
   public Option getRecordByKey(String key, Schema readerSchema) throws 
IOException {
-    HFileScanner scanner = reader.getScanner(false, true);
+    byte[] value = null;
     KeyValue kv = new KeyValue(key.getBytes(), null, null, null);
-    if (scanner.seekTo(kv) == 0) {
-      Cell c = scanner.getKeyValue();
-      byte[] keyBytes = Arrays.copyOfRange(c.getRowArray(), c.getRowOffset(), 
c.getRowOffset() + c.getRowLength());
-      R record = getRecordFromCell(c, getSchema(), readerSchema);
+
+    synchronized (this) {
+      if (keyScanner == null) {
+        keyScanner = reader.getScanner(true, true);

Review comment:
       is it an overhead to initialize a scanner. if not, why not we initialize 
in constructor only? 

##########
File path: 
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -236,6 +254,9 @@ public void close() {
     try {
       reader.close();
       reader = null;
+      if (keyScanner != null) {

Review comment:
       again, not sure of code paths. so, looks like close() does not require 
handling thread safety is it. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] nsivabalan commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

Reply via email to