[jira] [Comment Edited] (HDFS-13347) RBF: Cache datanode reports

Rushabh S Shah (JIRA) Mon, 26 Mar 2018 09:01:41 -0700

    [ https://issues.apache.org/jira/browse/HDFS-13347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414027#comment-16414027 ]


Rushabh S Shah edited comment on HDFS-13347 at 3/26/18 3:54 PM:
----------------------------------------------------------------

+1 for this idea too. I wonder whether we should include it in the 
non-router-based service as well.
We had many outages in our production cluster where some datanodes' 
resources (disk, CPU, network) were pegged and some datanodes were not able to 
heartbeat to the namenode for minutes.
Since the resources are shared with YARN, sometimes the cause was an 
inefficient YARN job and sometimes it was locking within the datanode.
The NamenodeUI#datanode page would tell us, via the "last Contact time" 
column, which datanode would soon be considered dead by the namenode. We would 
then go to that datanode and take jstacks or investigate to find any bad jobs.
But if you cache the results for 10 minutes, you won't be able to get those 
details until it's too late.
I would suggest adding a query parameter to the URL which, if true, returns the 
cached results if the cache is not stale and, if false, queries the namenode 
for the most recent report.
This comment is purely a suggestion and by no means blocks the last patch.
Since this feature only impacts the router-based service, you can also ignore 
my comment.
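
To make the query-parameter suggestion concrete, here is a minimal sketch in 
Java. It is not taken from HDFS-13347.000.patch; the class name CachedReport, 
the useCache flag, and the injected clock are all hypothetical, and the loader 
merely stands in for the expensive call that fetches the datanode report.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical TTL cache with an explicit bypass flag.
 * Illustrates the suggestion only; it is not the HDFS-13347 implementation.
 */
class CachedReport<T> {
  private final Supplier<T> loader;  // stands in for the expensive namenode query
  private final long ttlMillis;      // how long a cached report counts as fresh
  private final LongSupplier clock;  // injected so staleness can be tested

  private T value;
  private long loadedAt;
  private boolean loaded = false;

  CachedReport(Supplier<T> loader, long ttlMillis, LongSupplier clock) {
    this.loader = loader;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /**
   * @param useCache if true, return the cached report unless it is stale;
   *                 if false, always query the source for a fresh report.
   */
  synchronized T get(boolean useCache) {
    long now = clock.getAsLong();
    boolean stale = !loaded || now - loadedAt >= ttlMillis;
    if (!useCache || stale) {
      // A forced refresh also updates the cache, so later cached reads
      // benefit from the newer report.
      value = loader.get();
      loadedAt = now;
      loaded = true;
    }
    return value;
  }
}
```

A UI handler could map an assumed URL parameter onto the flag, calling 
get(false) when an operator needs the current "last Contact time" values and 
get(true) for routine page loads and watchdogs.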


> RBF: Cache datanode reports
> ---------------------------
>
>                 Key: HDFS-13347
>                 URL: https://issues.apache.org/jira/browse/HDFS-13347
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Minor
>         Attachments: HDFS-13347.000.patch
>
>
> Getting the datanode reports is an expensive operation and can be executed 
> very frequently by the UI and watchdogs. We should cache this information.



