[ 
https://issues.apache.org/jira/browse/HDFS-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583802#comment-15583802
 ] 

Zhe Zhang commented on HDFS-10872:
----------------------------------

Thanks Erik for the updated patch. I think we are pretty close.

About the main change in {{FSNamesystemLock}}:
# If a thread lives shorter than the configured {{metricAggregationInterval}}, 
are we going to lose its locking metrics? In reality it's probably not a big 
concern since RPC handler threads are reused. But I'm putting a comment here 
for more thoughts.
# Selecting a _real_ default aggregation interval is not easy. Maybe we should 
document it in {{hdfs-default.xml}}. Alternatively we can have 2 config knobs, 
one binary and one integer.
# Agreed that the overhead with the current implementation is pretty small (say 
with a 10 sec interval). As a follow-on optimization (depending on experience 
from production deployment) maybe we can consider a combination of 
hard-deadline and opportunistic try-and-backoff. E.g. at a higher frequency 
than {{metricAggregationInterval}} we can try the lock for 
{{opHoldtimeMetrics}}; if the lock is free, dump the metrics, otherwise retry 
the lock a little later.
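
To make the last point concrete, here is a rough sketch of what I have in mind. It is not taken from the patch: {{SharedHoldTimes}}, {{OpportunisticRecorder}}, {{tryIntervalMs}} and {{hardDeadlineMs}} are hypothetical names, plain maps of summed nanoseconds stand in for the real {{MutableRate}} metrics, and the clock is passed in explicitly only to keep the example easy to test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the shared, lock-protected opHoldtimeMetrics. */
class SharedHoldTimes {
  final ReentrantLock lock = new ReentrantLock();
  private final Map<String, Long> totalNanos = new HashMap<>();

  /** Merges one thread's batch into the totals; caller must hold {@code lock}. */
  void mergeLocked(Map<String, Long> batch) {
    batch.forEach((op, nanos) -> totalNanos.merge(op, nanos, Long::sum));
  }

  long total(String op) {
    lock.lock();
    try {
      return totalNanos.getOrDefault(op, 0L);
    } finally {
      lock.unlock();
    }
  }
}

/** Per-thread buffer; one instance per thread, so it needs no locking itself. */
class OpportunisticRecorder {
  private final SharedHoldTimes shared;
  private final Map<String, Long> local = new HashMap<>();
  private final long tryIntervalMs;   // how often to attempt a non-blocking flush
  private final long hardDeadlineMs;  // plays the role of metricAggregationInterval
  private long lastFlushMs;
  private long lastTryMs;

  OpportunisticRecorder(SharedHoldTimes shared, long tryIntervalMs,
      long hardDeadlineMs, long nowMs) {
    this.shared = shared;
    this.tryIntervalMs = tryIntervalMs;
    this.hardDeadlineMs = hardDeadlineMs;
    this.lastFlushMs = nowMs;
    this.lastTryMs = nowMs;
  }

  /** Buffers one hold time; returns true if the buffer was flushed. */
  boolean record(String op, long heldNanos, long nowMs) {
    local.merge(op, heldNanos, Long::sum);
    if (nowMs - lastFlushMs >= hardDeadlineMs) {
      shared.lock.lock();               // hard deadline: wait for the lock
      return flushAndUnlock(nowMs);
    }
    if (nowMs - lastTryMs >= tryIntervalMs) {
      lastTryMs = nowMs;                // back off until the next try interval
      if (shared.lock.tryLock()) {      // opportunistic: only if the lock is free
        return flushAndUnlock(nowMs);
      }
    }
    return false;
  }

  /** Dumps the local buffer into the shared totals; caller holds the lock. */
  private boolean flushAndUnlock(long nowMs) {
    try {
      shared.mergeLocked(local);
      local.clear();
      lastFlushMs = nowMs;
      lastTryMs = nowMs;
      return true;
    } finally {
      shared.lock.unlock();
    }
  }
}
```

With something like this, most flushes would happen when the metrics lock is uncontended, and a thread only blocks on it once the hard deadline has passed. Whether the extra bookkeeping is worth it is something production numbers would have to tell us.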

A few minor points about naming:
# The below mapping merges the {{yield()}} locking time into the 
{{contentSummary}} category. It looks like a reasonable approximation to me. 
But more opinions would be helpful.
{code}
@@ -115,7 +115,7 @@ public boolean yield() {
     // unlock
     dir.readUnlock();
-    fsn.readUnlock();
+    fsn.readUnlock("contentSummary");
{code}
# {{getBlockLocations}} currently appears as {{open}} in audit logging. It 
could be an existing bug. But since it has been audit logged that way, it 
doesn't make sense to change it here. I'll create a separate JIRA to discuss.
# {{writeUnlock("clearCorruptLazyPersistFile");}} should be 
"clearCorruptLazyPersistFiles"
# Maybe the {{checkLease}} category should be {{leasesMonitor}}? There's a 
one-off {{checkLease()}} method in FSN
# For the one in {{NamenodeFsck#getBlockLocations}}, maybe we should use 
"fsckGetBlockLocations" to differentiate it from the regular 
{{getBlockLocations}} RPC call?

> Add MutableRate metrics for FSNamesystemLock operations
> -------------------------------------------------------
>
>                 Key: HDFS-10872
>                 URL: https://issues.apache.org/jira/browse/HDFS-10872
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>         Attachments: FSLockPerf.java, HDFS-10872.000.patch, 
> HDFS-10872.001.patch, HDFS-10872.002.patch
>
>
> Add metrics for FSNamesystemLock operations to see, overall, how long each 
> operation is holding the lock for. Use MutableRate metrics for now. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
