[
https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633498#comment-17633498
]
ASF GitHub Bot commented on HADOOP-18501:
-----------------------------------------
pranavsaxena-microsoft commented on code in PR #5109:
URL: https://github.com/apache/hadoop/pull/5109#discussion_r1021070832
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##########
@@ -507,14 +507,22 @@ private int readInternal(final long position, final byte[] b, final int offset,
}
// got nothing from read-ahead, do our own read now
- receivedBytes = readRemote(position, b, offset, length, new TracingContext(tracingContext));
- return receivedBytes;
+ return explicitReadRemoteCall(position, b, offset, length);
} else {
LOG.debug("read ahead disabled, reading remote");
- return readRemote(position, b, offset, length, new TracingContext(tracingContext));
+ return explicitReadRemoteCall(position, b, offset, length);
}
}
+ private int explicitReadRemoteCall(final long position,
+ final byte[] b,
+ final int offset,
+ final int length) throws IOException {
+ final Long requiredLen = Math.min(length, contentLength - position);
Review Comment:
The same is done in
https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L482.
Also, AbfsInputStream decides how to loop for the remaining data based on
the data it received in the previous run. For example, if it had to read 4 MB
of data but the server returned only 1 MB, it would then request the next 3 MB.
How?
* It increments fCursor (a field in AbfsInputStream that tracks the current
position in the file) by bytesRead (here 1 MB):
https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L334
* It checks whether it needs to loop further:
https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L270-L273
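To make the looping behaviour described above concrete, here is a minimal,
self-contained sketch. It is not the AbfsInputStream code: the class
PartialReadLoopSketch, the RemoteReader interface and the readFully method are
hypothetical names introduced only for illustration. It assumes a remote read
that may return fewer bytes than requested, caps the request at the end of the
file, and re-issues the read for the remaining bytes after advancing a cursor.

```java
import java.io.IOException;

public class PartialReadLoopSketch {

  /** Hypothetical stand-in for a remote read that may return a partial result. */
  interface RemoteReader {
    int read(long position, byte[] b, int offset, int length) throws IOException;
  }

  /**
   * Reads up to length bytes starting at position. Whenever the reader returns
   * fewer bytes than requested, the cursor is advanced by the bytes received
   * and only the remaining bytes are requested again.
   */
  static int readFully(RemoteReader reader, long contentLength, long position,
      byte[] b, int offset, int length) throws IOException {
    // Never ask for bytes beyond the end of the file.
    int required = (int) Math.min(length, contentLength - position);
    long cursor = position; // plays the role of the stream's file cursor
    int totalRead = 0;
    while (totalRead < required) {
      int n = reader.read(cursor, b, offset + totalRead, required - totalRead);
      if (n <= 0) {
        break; // nothing more available; return what was read so far
      }
      cursor += n;    // step 1: move the cursor by the bytes received
      totalRead += n; // step 2: the loop condition decides whether to read again
    }
    return totalRead;
  }

  public static void main(String[] args) throws IOException {
    byte[] file = new byte[4096];
    // Simulated server that returns at most 1024 bytes per call.
    RemoteReader capped = (pos, b, off, len) -> {
      int n = Math.min(len, 1024);
      System.arraycopy(file, (int) pos, b, off, n);
      return n;
    };
    byte[] out = new byte[4096];
    System.out.println(readFully(capped, file.length, 0, out, 0, out.length));
  }
}
```

In this sketch a 4096-byte request against a reader capped at 1024 bytes per
call is satisfied in four calls, each asking only for what is still missing.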
> [ABFS]: Partial Read should add to throttling metric
> ----------------------------------------------------
>
> Key: HADOOP-18501
> URL: https://issues.apache.org/jira/browse/HADOOP-18501
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Affects Versions: 3.3.4
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Minor
> Labels: pull-request-available
>
> Error Description:
> For a partial read (due to account backend throttling), the ABFS driver retries,
> but the retry is not counted in the throttling metrics.
> In case of a partial read with a connection-reset exception, the ABFS driver
> retries the full request, and this is not counted in the throttling metrics either.
> Mitigation:
> In case of a partial read, the ABFS driver should retry only for the remaining
> bytes, and the partial read should be added to the throttling metrics.