[ 
https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650070#comment-14650070
 ] 

Yi Liu commented on HDFS-6682:
------------------------------

Thanks Allen, Andrew and Akira for the discussion.

The original intention here is to solve a real issue, which is good, so thank 
you for working on it.  About the discussion itself, Andrew's suggestion is 
good, and another option is to record the latest time that 
{{UnderReplicatedBlocks#chooseUnderReplicatedBlocks}} ran. We already have the 
{{underReplicatedBlocksCount/pendingReplicationBlocksCount/scheduledReplicationBlocksCount}}
 metrics, so we can tell whether, and for how long, the under-replicated list 
has gone unprocessed since the last pass, if we really want to see that.   My 
point is that it is not worth recording the whole under-replicated list just 
for this metric.
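To illustrate the "record the latest time" option, here is a minimal sketch. The class name and methods are hypothetical (not Hadoop's actual metrics code); the point is only that a single timestamp, updated at the end of each {{chooseUnderReplicatedBlocks}} pass, is enough to answer "how long since the list was last handled":

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: track only the last time the under-replicated
// list was scanned, rather than a timestamp per block.
class ReplicationScanTracker {
    private final AtomicLong lastScanTimeMs = new AtomicLong(0);

    // Would be called at the end of each chooseUnderReplicatedBlocks() pass.
    void markScanComplete(long nowMs) {
        lastScanTimeMs.set(nowMs);
    }

    // Milliseconds since the under-replicated list was last processed,
    // or -1 if it has never been scanned.
    long msSinceLastScan(long nowMs) {
        long last = lastScanTimeMs.get();
        return last == 0 ? -1 : nowMs - last;
    }
}
```

Exposing {{msSinceLastScan}} as a gauge is O(1) in time and memory, versus tracking every under-replicated block.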

On the other hand, we have {{UnderReplicatedBlocks}} and 
{{PendingReplicationBlocks}}, right? The replication monitor thread 
periodically picks up some under-replicated blocks, so unless the NN stalls 
(e.g., a full GC), computing replication work will always get some CPU time. 
Of course it can be slow, since there may be many other things for the NN to 
handle (e.g., many requests), but if the NN is slow we already have many ways 
to know it.  Regarding Akira's comment that the metric is also about the 
health of the entire HDFS cluster: we are talking about DataNodes here, and I 
think the more correct thing to record is the number of timed-out pending 
replication blocks ({{PendingReplicationBlocks}}), which captures a very busy 
network or corrupted target DNs, if we want to judge cluster health from the 
replication point of view.   {{UnderReplicatedBlocks}} can't stand for that.
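For the timed-out-pending idea, a hypothetical sketch (again, not the actual {{PendingReplicationBlocks}} code; block IDs, the timeout value, and the periodic check are stand-ins for what the NN's pending-replication monitor already does):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: count replications that sit too long in the
// pending list, as a cluster-health signal (busy network, dead targets).
class PendingReplicationTracker {
    private final long timeoutMs;
    // blockId -> time the replication was scheduled
    private final Map<Long, Long> pendingSinceMs = new HashMap<>();
    private long timedOutCount = 0;

    PendingReplicationTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void addPending(long blockId, long nowMs) {
        pendingSinceMs.put(blockId, nowMs);
    }

    // Called when the DN confirms the replication finished.
    void remove(long blockId) {
        pendingSinceMs.remove(blockId);
    }

    // Periodic scan: anything pending longer than timeoutMs is counted
    // as timed out and dropped back for rescheduling.
    void checkTimeouts(long nowMs) {
        pendingSinceMs.entrySet().removeIf(e -> {
            if (nowMs - e.getValue() >= timeoutMs) {
                timedOutCount++;
                return true;
            }
            return false;
        });
    }

    long getTimedOutCount() {
        return timedOutCount;
    }
}
```

A monotonically increasing timeout counter like this is cheap to expose and points directly at replication-path problems, which a raw under-replicated count cannot do.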

So if we want some metrics about replication of blocks in the NN, let's find a 
lightweight way to get them, as suggested. Thanks.


> Add a metric to expose the timestamp of the oldest under-replicated block
> -------------------------------------------------------------------------
>
>                 Key: HDFS-6682
>                 URL: https://issues.apache.org/jira/browse/HDFS-6682
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Akira AJISAKA
>            Assignee: Akira AJISAKA
>              Labels: metrics
>         Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, 
> HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch
>
>
> In the following case, the data in HDFS is lost and a client needs to put 
> the same file again:
> # A client puts a file to HDFS
> # A DataNode crashes before replicating a block of the file to other DataNodes
> I propose a metric to expose the timestamp of the oldest 
> under-replicated/corrupt block. That way a client can know which file to 
> retain for the retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
