snvijaya commented on a change in pull request #2307:
URL: https://github.com/apache/hadoop/pull/2307#discussion_r489461371



##########
File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
##########
@@ -180,9 +205,13 @@ private int readOneBlock(final byte[] b, final int off, final int len) throws IO
 
       // Enable readAhead when reading sequentially
       if (-1 == fCursorAfterLastRead || fCursorAfterLastRead == fCursor || b.length >= bufferSize) {
+        LOG.debug("Sequential read with read ahead size of {}", bufferSize);
         bytesRead = readInternal(fCursor, buffer, 0, bufferSize, false);
       } else {
-        bytesRead = readInternal(fCursor, buffer, 0, b.length, true);
+        // Enabling read ahead for random reads as well to reduce number of remote calls.
+        int lengthWithReadAhead = Math.min(b.length + readAheadRange, bufferSize);
+        LOG.debug("Random read with read ahead size of {}", lengthWithReadAhead);
+        bytesRead = readInternal(fCursor, buffer, 0, lengthWithReadAhead, true);

Review comment:
       With Parquet and ORC we have seen read patterns move from sequential to random and vice versa. That being the case, would it not be better to always read ahead to bufferSize? Providing an option to read a smaller amount, such as 64 KB, can actually lead to more IOPs. One thing we all agreed on in yesterday's meeting was that fewer IOPs are better, and that reading more per call beats reading less.
   So let's remove the readAheadRange config and instead always read ahead by whatever is configured as bufferSize, as sketched below.
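
   (For illustration only, not a patch: a minimal sketch of what this suggestion might look like for the branch above, assuming the same readOneBlock surroundings as in the hunk, i.e. readInternal, buffer, fCursor, fCursorAfterLastRead, bufferSize, and LOG as defined in AbfsInputStream; the readAheadRange field and its config would be dropped entirely.)

       // Enable readAhead when reading sequentially
       if (-1 == fCursorAfterLastRead || fCursorAfterLastRead == fCursor || b.length >= bufferSize) {
         LOG.debug("Sequential read with read ahead size of {}", bufferSize);
         bytesRead = readInternal(fCursor, buffer, 0, bufferSize, false);
       } else {
         // Random reads also request a full bufferSize: one larger remote call
         // instead of several smaller ones keeps the IOP count down.
         LOG.debug("Random read with read ahead size of {}", bufferSize);
         bytesRead = readInternal(fCursor, buffer, 0, bufferSize, true);
       }

   Both branches then request bufferSize, so only the final boolean argument and the log message differ; with a 4 MB bufferSize, for example, scanning 4 MB of column data costs one remote call rather than the 64 calls a 64 KB read ahead would take.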




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


