prashantwason commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r569801020



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##########
@@ -188,41 +196,51 @@ private synchronized void openFileSliceIfNeeded() throws IOException {
 
     // Load the schema
     Schema schema = HoodieAvroUtils.addMetadataFields(HoodieMetadataRecord.getClassSchema());
-    logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(), metadataBasePath,
-            logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
+    HoodieMetadataMergedLogRecordScanner logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(),
+            metadataBasePath, logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
             spillableMapDirectory, null);
 
     LOG.info("Opened metadata log files from " + logFilePaths + " at instant " + latestInstantTime
         + "(dataset instant=" + latestInstantTime + ", metadata instant=" + latestMetaInstantTimestamp + ")");
 
     metrics.ifPresent(metrics -> metrics.updateMetrics(HoodieMetadataMetrics.SCAN_STR, timer.endTimer()));
+
+    if (metadataConfig.enableReuse()) {
+      // cache for later reuse
+      cachedBaseFileReader = baseFileReader;
+      cachedLogRecordScanner = logRecordScanner;
+    }
+
+    return Pair.of(baseFileReader, logRecordScanner);
   }
 
-  private void closeIfNeeded() {
+  private void closeIfNeeded(Pair<HoodieFileReader, HoodieMetadataMergedLogRecordScanner> readers) {
     try {
       if (!metadataConfig.enableReuse()) {
-        close();
+        readers.getKey().close();
Review comment:
       The reason is that we do not lock the entire getRecordByKeyFromMetadata() function. With a single shared set of reader variables, one thread may be reading a key while another thread calls close() on the same readers.
   
   The getRecordByKeyFromMetadata() function does the following:
   1. Get the correct readers (open new readers if reuse=false)
   2. Read the key from the baseFileReader (reads key from HFile)
   3. Bytes to HoodieRecordPayload conversion (bytes read in above step from 
HFile)
   4. Read the key from the logRecordScanner (in-memory lookup)
   5. Merge the two payloads to get the final value  
   6. Close the readers (if reuse=false)
   
   We should only lock during Step 2, as the HFile KeyScanner is not thread-safe. The rest of the steps can run in parallel across multiple threads for maximum performance.
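   The locking discipline above can be sketched as follows. This is a hypothetical, simplified model: the maps stand in for the HFile-backed base file reader and the in-memory log record scanner, and `MetadataLookupSketch`, `put`, and `getRecordByKey` are illustrative names, not the real Hudi APIs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "lock only Step 2": the HFile key lookup runs under a lock
// because the HFile scanner is not thread-safe; payload conversion, the
// in-memory log lookup, and the merge all run outside the lock.
class MetadataLookupSketch {
  // Stand-in for the HFile-backed base file reader (treated as not thread-safe).
  private final Map<String, byte[]> baseFile = new ConcurrentHashMap<>();
  // Stand-in for the log record scanner (in-memory, safe for concurrent reads).
  private final Map<String, String> logRecords = new ConcurrentHashMap<>();
  private final Object hfileLock = new Object();

  void put(String key, String baseValue, String logValue) {
    baseFile.put(key, baseValue.getBytes());
    if (logValue != null) {
      logRecords.put(key, logValue);
    }
  }

  Optional<String> getRecordByKey(String key) {
    // Step 2: read raw bytes from the base file under the lock.
    byte[] raw;
    synchronized (hfileLock) {
      raw = baseFile.get(key);
    }
    // Step 3: bytes-to-payload conversion, outside the lock.
    String basePayload = (raw == null) ? null : new String(raw);
    // Step 4: in-memory log lookup, no lock needed.
    String logPayload = logRecords.get(key);
    // Step 5: merge the two payloads; here the log record simply wins.
    String merged = (logPayload != null) ? logPayload : basePayload;
    return Optional.ofNullable(merged);
  }
}
```

   Keeping the critical section down to the single HFile read is what lets Steps 3-5 overlap across threads.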




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

