[ 
https://issues.apache.org/jira/browse/HDFS-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351312#comment-17351312
 ] 

Simbarashe Dzinamarira commented on HDFS-16040:
-----------------------------------------------

In Server.java we have
{noformat}
public void run() {
  while (running) {
    ...
    boolean connDropped = true;
  
    try {
      call = callQueue.take(); // pop the queue; maybe blocked here
      ...
      if (alignmentContext != null && call.isCallCoordinated() &&
          call.getClientStateId() > alignmentContext.getLastSeenStateId()) {
        ...
        requeueCall(call);
        continue;
      }
      ...
    } catch (InterruptedException e) {
      ...
    } catch (Exception e) {
      ...
    } finally {
      ...
      if (call != null) {
        updateMetrics(call, startTimeNanos, connDropped);
        ...
      }
    }
  }
}

void updateMetrics(Call call, long startTime, boolean connDropped) {
  ...
  rpcMetrics.addRpcQueueTime(queueTime);

  if (call.isResponseDeferred() || connDropped) {
    // call was skipped; don't include it in processing metrics
    return;
  }
  ...
  rpcMetrics.addRpcProcessingTime(processingTime);
  ...
}{noformat}
*connDropped* is still true when updateMetrics is called due after the 
*continue* statement so processing time is not updated while RpcQueueTime is.

A simple fix is to set *call = null* after the re-queue. We'll need a unit test 
for this as well.

> RpcQueueTime metric counts requeued calls as unique events.
> -----------------------------------------------------------
>
>                 Key: HDFS-16040
>                 URL: https://issues.apache.org/jira/browse/HDFS-16040
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.10.0, 3.3.0
>            Reporter: Simbarashe Dzinamarira
>            Assignee: Simbarashe Dzinamarira
>            Priority: Major
>
> The RpcQueueTime metric is updated every time a call is re-queued while 
> waiting for the server state to reach the call's client's state ID. This is 
> in contrast to RpcProcessingTime which is only updated when the call if 
> finally processed.
> On the Observer NameNode this can result in RpcQueueTimeNumOps being much 
> larger than RpcProcessingTimeNumOps. The re-queueing is an internal 
> optimization to avoid blocking and shouldn't result in an inflated metric.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to