nsivabalan commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r565386805
##########
File path:
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -223,6 +237,10 @@ public Option getRecordByKey(String key, Schema
readerSchema) throws IOException
private R getRecordFromCell(Cell c, Schema writerSchema, Schema
readerSchema) throws IOException {
byte[] value = Arrays.copyOfRange(c.getValueArray(), c.getValueOffset(),
c.getValueOffset() + c.getValueLength());
+ return getRecordFromByteValue(value, writerSchema, readerSchema);
+ }
+
+ private R getRecordFromByteValue(byte[] value, Schema writerSchema, Schema
readerSchema) throws IOException {
Review comment:
do you think one line code warrants a separate method? not sure if we
really need this. just curious.
##########
File path:
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -209,12 +212,23 @@ public R next() {
@Override
public Option getRecordByKey(String key, Schema readerSchema) throws
IOException {
- HFileScanner scanner = reader.getScanner(false, true);
+ byte[] value = null;
KeyValue kv = new KeyValue(key.getBytes(), null, null, null);
- if (scanner.seekTo(kv) == 0) {
- Cell c = scanner.getKeyValue();
- byte[] keyBytes = Arrays.copyOfRange(c.getRowArray(), c.getRowOffset(),
c.getRowOffset() + c.getRowLength());
- R record = getRecordFromCell(c, getSchema(), readerSchema);
+
+ synchronized (this) {
Review comment:
can you help me understand how getRecordByKey() will be called
concurrently? in other words, can a single reader be called by multiple callers
concurrently. from my understanding, a single reader won't trigger concurrent
getRecordByKey() calls.
##########
File path:
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -209,12 +212,23 @@ public R next() {
@Override
public Option getRecordByKey(String key, Schema readerSchema) throws
IOException {
- HFileScanner scanner = reader.getScanner(false, true);
+ byte[] value = null;
KeyValue kv = new KeyValue(key.getBytes(), null, null, null);
- if (scanner.seekTo(kv) == 0) {
- Cell c = scanner.getKeyValue();
- byte[] keyBytes = Arrays.copyOfRange(c.getRowArray(), c.getRowOffset(),
c.getRowOffset() + c.getRowLength());
- R record = getRecordFromCell(c, getSchema(), readerSchema);
+
+ synchronized (this) {
+ if (keyScanner == null) {
+ keyScanner = reader.getScanner(true, true);
Review comment:
is it an overhead to initialize a scanner. if not, why not we initialize
in constructor only?
##########
File path:
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -236,6 +254,9 @@ public void close() {
try {
reader.close();
reader = null;
+ if (keyScanner != null) {
Review comment:
again, not sure of code paths. so, looks like close() does not require
handling thread safety is it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]