[ 
https://issues.apache.org/jira/browse/HDFS-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852254#comment-16852254
 ] 

Ahmed Hussein commented on HDFS-14524:
--------------------------------------

*Analysis:*

The design of the buckets introduces race conditions and does not seem to 
achieve its optimization purpose.
 # The bucket index is calculated using {{(CurrentTime % WindowLens)}}. As a 
result, all of the most recent updates fall into the same bucket, causing 
contention on every single update.
 # For long window lengths, the whole calculation is redone each time a report 
is generated (once a minute). This is not efficient.
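
To illustrate the first point, here is a minimal sketch of a time-based bucket 
index. It is not the NNTop source: the class name {{BucketIndexSketch}}, the 
helper {{bucketIndex}}, the 60000 ms bucket size and the bucket count of 5 are 
all assumptions chosen for illustration. It only shows that timestamps close 
together map to the same slot, and that the slot is reused one window later.

```java
final class BucketIndexSketch {
    // Hypothetical helper, not the NNTop source: maps a timestamp to a bucket
    // purely from wall-clock time, in the spirit of (CurrentTime % WindowLens).
    static int bucketIndex(long timeMs, long bucketSizeMs, int numBuckets) {
        return (int) ((timeMs / bucketSizeMs) % numBuckets);
    }

    public static void main(String[] args) {
        long bucketSizeMs = 60_000L;   // assumed: a 300000 ms window split into 5 buckets
        int numBuckets = 5;
        long now = 1_547_251_200_000L; // arbitrary example timestamp
        // Every update within the same 60 s span lands in one bucket, and the
        // same index comes back after a full window, when it must be reset.
        for (long t = now; t <= now + 300_000L; t += 30_000L) {
            System.out.println((t - now) + " ms -> bucket "
                + bucketIndex(t, bucketSizeMs, numBuckets));
        }
    }
}
```

Because the index depends only on the clock, every writer active in the same 
span targets one bucket, and that is also the bucket a reader needs for the 
most recent data.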

*Consequences:*

Writes and reads at a given point in time use the same bucket. Because 
{{opType='*'}} is updated each time an operation count is incremented, 
{{opType='*'}} may reset the counters of a bucket before they are consumed by 
the report generator. The larger the set of users and operations, the larger 
the gap between the total and the sum of the operation types.

*Suggested solution:*
 * Keep an extra bucket to act as a buffer, so that incoming writes always 
lead the bucket index. New increments only reset a bucket that lies outside 
the {{windowLength}}.
 * Introduce {{LongAdder}} to reduce the overhead of contention between the 
threads updating the same bucket simultaneously.
 * To maintain backward compatibility, add a configuration option to switch 
between the two algorithms.
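
The first two suggestions could look roughly like the sketch below. It is an 
illustration under stated assumptions, not the proposed patch: the class 
{{BufferedRollingWindowSketch}} and its methods are hypothetical, and the 
reset path is simplified (an increment that races with a reset of the same 
bucket can still be lost). The window is covered by {{windowBuckets}} buckets 
plus one spare, so the bucket a writer recycles always holds data that is 
already outside the window being read.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;

final class BufferedRollingWindowSketch {
    private final long bucketSizeMs;
    private final int windowBuckets;
    private final LongAdder[] counts;    // windowBuckets + 1 entries (one spare)
    private final AtomicLongArray slots; // time slot currently stored in each bucket

    BufferedRollingWindowSketch(long windowLenMs, int windowBuckets) {
        this.bucketSizeMs = windowLenMs / windowBuckets;
        this.windowBuckets = windowBuckets;
        this.counts = new LongAdder[windowBuckets + 1];
        this.slots = new AtomicLongArray(windowBuckets + 1);
        for (int i = 0; i < counts.length; i++) {
            counts[i] = new LongAdder();
            slots.set(i, -1L); // -1 marks a bucket that has never been written
        }
    }

    void increment(long timeMs, long delta) {
        long slot = timeMs / bucketSizeMs;
        int idx = (int) (slot % counts.length);
        long seen = slots.get(idx);
        // Recycle the bucket only when it still holds an older slot. With one
        // spare bucket, that older slot is already outside the window.
        if (seen != slot && slots.compareAndSet(idx, seen, slot)) {
            counts[idx].reset(); // simplified: not atomic with concurrent add()
        }
        counts[idx].add(delta);  // LongAdder spreads contended updates over cells
    }

    long sum(long timeMs) {
        long newest = timeMs / bucketSizeMs;
        long oldest = newest - windowBuckets + 1;
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            long slot = slots.get(i);
            if (slot >= oldest && slot <= newest) {
                total += counts[i].sum();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        BufferedRollingWindowSketch w = new BufferedRollingWindowSketch(300_000L, 5);
        w.increment(0L, 3);
        w.increment(60_000L, 2);
        w.increment(300_000L, 1); // lands in the spare bucket, nothing is reset
        System.out.println(w.sum(299_999L)); // window ending just before the new write
        System.out.println(w.sum(300_000L)); // window that includes the new write
    }
}
```

In this sketch a report computed for the previous window still sees its oldest 
bucket intact, because the newest write went to the spare bucket instead of 
overwriting it.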

> NNTop total counts does not add up as expected
> ----------------------------------------------
>
>                 Key: HDFS-14524
>                 URL: https://issues.apache.org/jira/browse/HDFS-14524
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>
> {{opType='*'}} is sometimes smaller than the sum of the individual operation 
> types.
> {code:java}
> {
>   "windows": [
>     {
>       "windowLenMs": 300000,
>       "ops": [
>         {
>           "totalCount": 24158,
>           "opType": "rpc.complete",
>           "topUsers": [{ "count": 2944, "user": "user1" }]
>         },
>         {
>           "totalCount": 15921,
>           "opType": "rpc.rename",
>           "topUsers": [{ "count": 2891, "user": "user1" }]
>         },
>         {
>           "totalCount": 3015834,
>           "opType": "*",
>           "topUsers": [{ "count": 66652, "user": "user1" }]
>         },
>         {
>           "totalCount": 2086,
>           "opType": "rpc.abandonBlock",
>           "topUsers": [{ "count": 603, "user": "user1" }]
>         },
>         {
>           "totalCount": 30258,
>           "opType": "rpc.addBlock",
>           "topUsers": [{ "count": 3182, "user": "user1" }]
>         },
>         {
>           "totalCount": 101440,
>           "opType": "rpc.getServerDefaults",
>           "topUsers": [{ "count": 3521, "user": "user1" }]
>         },
>         {
>           "totalCount": 25258,
>           "opType": "rpc.create",
>           "topUsers": [{ "count": 1864, "user": "user1" }]
>         },
>         {
>           "totalCount": 1377563,
>           "opType": "rpc.getFileInfo",
>           "topUsers": [{ "count": 56541, "user": "user1" }]
>         },
>         {
>           "totalCount": 60836,
>           "opType": "rpc.renewLease",
>           "topUsers": [{ "count": 3783, "user": "user1" }]
>         },
>         {
>           "totalCount": 182212,
>           "opType": "rpc.getListing",
>           "topUsers": [{ "count": 1848, "user": "user1" }]
>         },
>         {
>           "totalCount": 380,
>           "opType": "rpc.updateBlockForPipeline",
>           "topUsers": [{ "count": 58, "user": "user1" }]
>         },
>         {
>           "totalCount": 215,
>           "opType": "rpc.updatePipeline",
>           "topUsers": [{ "count": 18, "user": "user1" }]
>         }
>       ]
>     }
>   ],
>   "timestamp": "2019-01-12"
> }
> {code}
>  
>  The {{opType='*'}} count for {{user1}} is {{66652}}, but the sum of the 
> counts for the other {{opType}} values for {{user1}} is larger: {{77253}}
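
The figures quoted above can be checked with a few lines of Java. The class 
name {{NnTopSumCheck}} is hypothetical; the numbers are the {{user1}} counts 
copied from the JSON report in the issue description.

```java
import java.util.stream.LongStream;

final class NnTopSumCheck {
    // user1 counts for each individual opType, in the order they appear in the report
    static final long[] USER1_PER_OP = {
        2944, 2891, 603, 3182, 3521, 1864, 56541, 3783, 1848, 58, 18
    };
    // user1 count reported for opType='*'
    static final long USER1_TOTAL = 66652;

    static long sumPerOp() {
        return LongStream.of(USER1_PER_OP).sum();
    }

    public static void main(String[] args) {
        long sum = sumPerOp();
        System.out.println("sum of individual opTypes: " + sum);
        System.out.println("reported opType='*':       " + USER1_TOTAL);
        System.out.println("gap:                       " + (sum - USER1_TOTAL));
    }
}
```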



