pranavsaxena-microsoft commented on code in PR #5109: URL: https://github.com/apache/hadoop/pull/5109#discussion_r1021070832
########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java: ##########
@@ -507,14 +507,22 @@ private int readInternal(final long position, final byte[] b, final int offset,
       }
 
       // got nothing from read-ahead, do our own read now
-      receivedBytes = readRemote(position, b, offset, length, new TracingContext(tracingContext));
-      return receivedBytes;
+      return explicitReadRemoteCall(position, b, offset, length);
     } else {
       LOG.debug("read ahead disabled, reading remote");
-      return readRemote(position, b, offset, length, new TracingContext(tracingContext));
+      return explicitReadRemoteCall(position, b, offset, length);
     }
   }
 
+  private int explicitReadRemoteCall(final long position,
+      final byte[] b,
+      final int offset,
+      final int length) throws IOException {
+    final Long requiredLen = Math.min(length, contentLength - position);

Review Comment:
The same is done in https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L482.

Also, AbfsInputStream decides how to keep looping for data based on how many bytes it received in one run. For example, if it has to read 4 MB of data but the server returns only 1 MB, it will call again for the remaining 3 MB. How?
* It increments fCursor (a field in AbfsInputStream that tracks the current position in the file) by bytesRead (here 1 MB): https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L334
* It checks whether it needs to loop further: https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L270-L273

If we did not have this line checking requiredLen and the requested length were greater than (contentLength - position), the read API on the backend would still return only (contentLength - position) bytes of data. Since the response would be the same either way, it is better to send only (contentLength - position) in the request parameters, because the request parameters are used to update the throttling metrics.
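Below is a minimal, self-contained sketch (not the actual AbfsInputStream code) of the two points above: clamping the requested length to the bytes that remain in the file, and looping when the server returns fewer bytes than asked for. The class, the remoteRead stand-in, and the 1 MB per-call cap are hypothetical; only the Math.min(length, contentLength - position) clamp and the fCursor/bytesRead bookkeeping mirror the discussion.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only: a simplified stand-in for the partial-read loop
 * discussed above, not the real AbfsInputStream implementation.
 */
public class PartialReadLoopSketch {

  private final long contentLength; // total size of the remote file
  private long fCursor;             // current position of the cursor within the file

  public PartialReadLoopSketch(long contentLength) {
    this.contentLength = contentLength;
  }

  /** Reads up to {@code length} bytes starting at {@code position}, looping on short reads. */
  public int read(long position, byte[] b, int offset, int length) {
    // Clamp to what the file can still supply: asking for more would return the same
    // data anyway, but would inflate the request size fed into the throttling metrics.
    long requiredLen = Math.min(length, contentLength - position);
    if (requiredLen <= 0) {
      return -1; // nothing left to read
    }

    int totalRead = 0;
    while (totalRead < requiredLen) {
      // A single remote call may return fewer bytes than requested (e.g. 1 MB of 4 MB).
      int bytesRead = remoteRead(position + totalRead, b, offset + totalRead,
          (int) (requiredLen - totalRead));
      if (bytesRead <= 0) {
        break; // server signalled end of data; stop looping
      }
      totalRead += bytesRead;
      fCursor = position + totalRead; // advance the cursor by what was actually received
    }
    return totalRead;
  }

  /** Hypothetical remote read that caps every response at 1 MB to force the loop. */
  private int remoteRead(long position, byte[] b, int offset, int length) {
    int served = (int) Math.min(length, 1024 * 1024);
    Arrays.fill(b, offset, offset + served, (byte) 1);
    return served;
  }

  public static void main(String[] args) {
    PartialReadLoopSketch in = new PartialReadLoopSketch(4L * 1024 * 1024);
    byte[] buf = new byte[4 * 1024 * 1024];
    // Four remote calls of 1 MB each end up satisfying the 4 MB request.
    System.out.println("total bytes read = " + in.read(0, buf, 0, buf.length));
  }
}
```

In the actual code the looping and the single remote call live in separate methods (the linked lines above); the sketch folds them into one method only to keep the example short.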