pranavsaxena-microsoft commented on code in PR #5109: URL: https://github.com/apache/hadoop/pull/5109#discussion_r1021070832
########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java: ##########
@@ -507,14 +507,22 @@ private int readInternal(final long position, final byte[] b, final int offset,
       }
 
       // got nothing from read-ahead, do our own read now
-      receivedBytes = readRemote(position, b, offset, length, new TracingContext(tracingContext));
-      return receivedBytes;
+      return explicitReadRemoteCall(position, b, offset, length);
     } else {
       LOG.debug("read ahead disabled, reading remote");
-      return readRemote(position, b, offset, length, new TracingContext(tracingContext));
+      return explicitReadRemoteCall(position, b, offset, length);
     }
   }
 
+  private int explicitReadRemoteCall(final long position,
+      final byte[] b,
+      final int offset,
+      final int length) throws IOException {
+    final Long requiredLen = Math.min(length, contentLength - position);

Review Comment:
The same is done in https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L482.

Also, AbfsInputStream decides how to keep looping for data based on how many bytes it received in one run. For example, if it has to read 4 MB of data but the server returns only 1 MB, it will call again for the remaining 3 MB. How?
* It increments fCursor (a field in AbfsInputStream that tracks the current position in the file) by bytesRead (here 1 MB): https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L334
* It checks whether it needs to loop further: https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L270-L273

If we did not have this line checking requiredLen and the requested length were greater than (contentLength - position), the read API on the backend would still return only (contentLength - position) bytes of data. Since the response would be the same either way, it is better to send only (contentLength - position) in the request parameters, because the request parameters are used to update the throttling metrics.
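Below is a minimal, self-contained sketch (not the actual AbfsInputStream code) of the two points above: clamping the requested length to the bytes that remain in the file, and looping when the server returns fewer bytes than asked for. The class, the remoteRead stand-in, and the 1 MB per-call cap are hypothetical; only the Math.min(length, contentLength - position) clamp and the fCursor/bytesRead bookkeeping mirror the discussion.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only: a simplified stand-in for the partial-read loop
 * discussed above, not the real AbfsInputStream implementation.
 */
public class PartialReadLoopSketch {

  private final long contentLength; // total size of the remote file
  private long fCursor;             // current position of the cursor within the file

  public PartialReadLoopSketch(long contentLength) {
    this.contentLength = contentLength;
  }

  /** Reads up to {@code length} bytes starting at {@code position}, looping on short reads. */
  public int read(long position, byte[] b, int offset, int length) {
    // Clamp to what the file can still supply: asking for more would return the same
    // data anyway, but would inflate the request size fed into the throttling metrics.
    long requiredLen = Math.min(length, contentLength - position);
    if (requiredLen <= 0) {
      return -1; // nothing left to read
    }

    int totalRead = 0;
    while (totalRead < requiredLen) {
      // A single remote call may return fewer bytes than requested (e.g. 1 MB of 4 MB).
      int bytesRead = remoteRead(position + totalRead, b, offset + totalRead,
          (int) (requiredLen - totalRead));
      if (bytesRead <= 0) {
        break; // server signalled end of data; stop looping
      }
      totalRead += bytesRead;
      fCursor = position + totalRead; // advance the cursor by what was actually received
    }
    return totalRead;
  }

  /** Hypothetical remote read that caps every response at 1 MB to force the loop. */
  private int remoteRead(long position, byte[] b, int offset, int length) {
    int served = (int) Math.min(length, 1024 * 1024);
    Arrays.fill(b, offset, offset + served, (byte) 1);
    return served;
  }

  public static void main(String[] args) {
    PartialReadLoopSketch in = new PartialReadLoopSketch(4L * 1024 * 1024);
    byte[] buf = new byte[4 * 1024 * 1024];
    // Four remote calls of 1 MB each end up satisfying the 4 MB request.
    System.out.println("total bytes read = " + in.read(0, buf, 0, buf.length));
  }
}
```

In the actual code the looping and the single remote call live in separate methods (the linked lines above); the sketch folds them into one method only to keep the example short.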