[jira] [Updated] (HADOOP-18872) ABFS: Misreporting Retry Count for Sub-sequential and Parallel Operations

Anuj Modi (Jira) Sun, 03 Sep 2023 21:40:04 -0700


     [ https://issues.apache.org/jira/browse/HADOOP-18872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Anuj Modi updated HADOOP-18872:
-------------------------------
    Description: 
A bug was identified where the retry count in the client correlation ID was 
wrongly reported for subsequent and parallel operations triggered by a 
single file system call. The cause was reuse of the same tracing context for 
all such calls.
We create a new tracing context (TC) as soon as an HDFS call comes in, and we 
keep passing that same TC to all the client calls.

For instance, when we get a createFile call, we first perform metadata 
operations. If those metadata operations succeed only after a few retries, the 
tracing context will hold that retry count. When the actual create call is 
then made, the same retry count is used to construct the headers 
(clientCorrelationId). Although the create operation never failed, we still 
see the retry count from the previous request.

The fix is to use a new tracing context object for each network call made. All 
the subsequent and parallel operations will share the same primary request ID 
to correlate them, yet each will track its own retry count.
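
The behaviour described above can be illustrated with a minimal sketch. It 
does not use the real ABFS classes: TracingContextSketch, recordRetry(), 
header(), execute() and the "id:retryCount" header layout are all hypothetical 
names and formats chosen for illustration. It only shows why a shared context 
leaks a retry count into the next call, and how a per-call copy that keeps the 
primary request ID avoids that.

```java
// Hypothetical, simplified stand-in for a tracing context; NOT the ABFS class.
final class TracingContextSketch {
    private final String primaryRequestId;
    private int retryCount;

    TracingContextSketch(String primaryRequestId) {
        this.primaryRequestId = primaryRequestId;
        this.retryCount = 0;
    }

    // Copy constructor: keeps the primary request ID, starts retries at zero.
    TracingContextSketch(TracingContextSketch other) {
        this(other.primaryRequestId);
    }

    void recordRetry() {
        retryCount++;
    }

    // Assumed header layout for this sketch only: "<primaryRequestId>:<retryCount>".
    String header() {
        return primaryRequestId + ":" + retryCount;
    }

    // Simulates one network call that fails a number of times before it
    // succeeds, and returns the header sent with the successful attempt.
    static String execute(TracingContextSketch tc, int failuresBeforeSuccess) {
        for (int i = 0; i < failuresBeforeSuccess; i++) {
            tc.recordRetry();
        }
        return tc.header();
    }

    public static void main(String[] args) {
        // Buggy pattern: one context shared by every call of the operation.
        TracingContextSketch shared = new TracingContextSketch("req-1");
        System.out.println(execute(shared, 2)); // metadata call: req-1:2
        System.out.println(execute(shared, 0)); // create call:   req-1:2 (stale count)

        // Fixed pattern: a fresh copy per network call, same primary request ID.
        TracingContextSketch base = new TracingContextSketch("req-1");
        System.out.println(execute(new TracingContextSketch(base), 2)); // req-1:2
        System.out.println(execute(new TracingContextSketch(base), 0)); // req-1:0
    }
}
```

In the fixed pattern the base context is never mutated, so every call starts 
from a retry count of zero while still carrying the shared "req-1" identifier.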

  was:Will create a new tracing context for each new ABFS REST operation, which 
will prevent the same retry count from being reflected in parallel and 
subsequent operations.


> ABFS: Misreporting Retry Count for Sub-sequential and Parallel Operations
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-18872
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18872
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build
>    Affects Versions: 3.3.6
>            Reporter: Anmol Asrani
>            Assignee: Anuj Modi
>            Priority: Major
>              Labels: Bug
>             Fix For: 3.3.6
>
>
> A bug was identified where the retry count in the client correlation ID was 
> wrongly reported for subsequent and parallel operations triggered by a 
> single file system call. The cause was reuse of the same tracing context for 
> all such calls.
> We create a new tracing context (TC) as soon as an HDFS call comes in, and we 
> keep passing that same TC to all the client calls.
> For instance, when we get a createFile call, we first perform metadata 
> operations. If those metadata operations succeed only after a few retries, 
> the tracing context will hold that retry count. When the actual create call 
> is then made, the same retry count is used to construct the headers 
> (clientCorrelationId). Although the create operation never failed, we still 
> see the retry count from the previous request.
> The fix is to use a new tracing context object for each network call made. 
> All the subsequent and parallel operations will share the same primary 
> request ID to correlate them, yet each will track its own retry count.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)