[jira] [Commented] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric

ASF GitHub Bot (Jira) Sun, 13 Nov 2022 20:49:06 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633498#comment-17633498
 ]


ASF GitHub Bot commented on HADOOP-18501:
-----------------------------------------

pranavsaxena-microsoft commented on code in PR #5109:
URL: https://github.com/apache/hadoop/pull/5109#discussion_r1021070832


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##########
@@ -507,14 +507,22 @@ private int readInternal(final long position, final byte[] b, final int offset,
       }
 
       // got nothing from read-ahead, do our own read now
-      receivedBytes = readRemote(position, b, offset, length, new TracingContext(tracingContext));
-      return receivedBytes;
+      return explicitReadRemoteCall(position, b, offset, length);
     } else {
       LOG.debug("read ahead disabled, reading remote");
-      return readRemote(position, b, offset, length, new TracingContext(tracingContext));
+      return explicitReadRemoteCall(position, b, offset, length);
     }
   }
 
+  private int explicitReadRemoteCall(final long position,
+      final byte[] b,
+      final int offset,
+      final int length) throws IOException {
+    final Long requiredLen = Math.min(length, contentLength - position);

Review Comment:
   The same is done in 
https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L482.
   
   Also, AbfsInputStream decides how to keep looping for data based on how many 
bytes it received in the previous call. For example, if it had to read 4 MB of 
data but the server returned only 1 MB, it would then request the remaining 3 MB.
   How?
   * It increments fCursor (an instance field in AbfsInputStream that tracks the 
current position in the file) by bytesRead (here, 1 MB): 
https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L334
   * It checks whether it needs to loop further: 
https://github.com/pranavsaxena-microsoft/hadoop/blob/partialReadThrottle2/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java#L270-L273
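
   The looping behaviour described above can be sketched roughly as follows. 
This is a simplified, standalone illustration and not the actual AbfsInputStream 
code: the class PartialReadLoopSketch, the RemoteReader interface, and the 
readFully helper are hypothetical names made up for this example, and the 
"server" is a lambda that never returns more than 1 MB per call.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialReadLoopSketch {

    /** Hypothetical stand-in for a remote read that may return fewer bytes than requested. */
    interface RemoteReader {
        int read(long position, byte[] buffer, int offset, int length);
    }

    /**
     * Reads up to {@code length} bytes starting at {@code startPosition}.
     * After a short read, only the bytes still missing are requested again.
     * Every request size is recorded in {@code requestSizes} for inspection.
     */
    static int readFully(RemoteReader remote, long contentLength, long startPosition,
                         byte[] buffer, int offset, int length, List<Integer> requestSizes) {
        long fCursor = startPosition; // next file offset to request (named after the field mentioned above)
        int totalRead = 0;
        while (totalRead < length && fCursor < contentLength) {
            // Never ask for more than the caller still needs or the file still has.
            int remaining = (int) Math.min(length - totalRead, contentLength - fCursor);
            requestSizes.add(remaining);
            int bytesRead = remote.read(fCursor, buffer, offset + totalRead, remaining);
            if (bytesRead <= 0) {
                break; // nothing more available
            }
            fCursor += bytesRead;   // advance the cursor by what was actually received
            totalRead += bytesRead; // loop again only if bytes are still missing
        }
        return totalRead;
    }

    public static void main(String[] args) {
        final int mb = 1024 * 1024;
        // Assumed behaviour for the demo: the server caps every response at 1 MB.
        RemoteReader throttled = (pos, buf, off, len) -> Math.min(len, mb);
        List<Integer> requestSizes = new ArrayList<>();
        int total = readFully(throttled, 4L * mb, 0L, new byte[4 * mb], 0, 4 * mb, requestSizes);
        System.out.println("total bytes read = " + total);
        System.out.println("request sizes    = " + requestSizes);
    }
}
```

   With a 4 MB request and a server that returns 1 MB at a time, the second 
request in this sketch asks for 3 MB, matching the example above.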





> [ABFS]: Partial Read should add to throttling metric
> ----------------------------------------------------
>
>                 Key: HADOOP-18501
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18501
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 3.3.4
>            Reporter: Pranav Saxena
>            Assignee: Pranav Saxena
>            Priority: Minor
>              Labels: pull-request-available
>
> Error Description:
> For a partial read (due to account backend throttling), the ABFS driver 
> retries but does not add the event to the throttling metrics.
> In case of a partial read with a connection-reset exception, the ABFS driver 
> retries the full request and does not add the event to the throttling metrics.
> Mitigation:
> In case of a partial read, the ABFS driver should retry for the remaining 
> bytes, and the event should be added to the throttling metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
