[
https://issues.apache.org/jira/browse/HDFS-13347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414027#comment-16414027
]
Rushabh S Shah edited comment on HDFS-13347 at 3/26/18 3:54 PM:
----------------------------------------------------------------
+1 for this idea too. I'm wondering whether we should include this in the
non-Router-based service as well.
We had many outages in our production cluster where some datanodes' resources
(disk, CPU, network) were pegged and some datanodes were unable to heartbeat to
the NameNode for minutes.
Since the resources are shared with YARN, the cause was sometimes an
inefficient YARN job and sometimes locking within the datanode.
The datanode page of the NameNode UI tells us, via the "Last Contact time"
column, which datanodes the NameNode will soon declare dead; we then go to
that datanode and take jstacks or investigate further to find any bad jobs.
But if you cache the results for 10 minutes, you won't be able to get those
details until it's too late.
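To put the 10-minute figure in context, here is a small sketch of the arithmetic. It assumes the stock defaults `dfs.heartbeat.interval` = 3 s and `dfs.namenode.heartbeat.recheck-interval` = 300000 ms, and the expiry formula that, as I understand it, the NameNode's DatanodeManager uses (2 x recheck interval + 10 x heartbeat interval). The class and method names are illustrative only.

```java
public class HeartbeatExpiry {
    // Assumed stock defaults from hdfs-default.xml; adjust for your cluster.
    static final long HEARTBEAT_INTERVAL_SECONDS = 3;      // dfs.heartbeat.interval
    static final long RECHECK_INTERVAL_MS = 5 * 60 * 1000; // dfs.namenode.heartbeat.recheck-interval

    /** Time without a heartbeat after which the NameNode marks a datanode dead. */
    static long heartbeatExpireIntervalMs(long recheckMs, long heartbeatSeconds) {
        return 2 * recheckMs + 10 * 1000 * heartbeatSeconds;
    }

    public static void main(String[] args) {
        long expireMs = heartbeatExpireIntervalMs(RECHECK_INTERVAL_MS, HEARTBEAT_INTERVAL_SECONDS);
        long cacheTtlMs = 10 * 60 * 1000; // the 10-minute cache under discussion
        System.out.println("datanode declared dead after " + expireMs + " ms");
        // Integer percentage of the warning window that one cache period can hide.
        System.out.println("cache TTL covers " + (cacheTtlMs * 100 / expireMs) + "% of that window");
    }
}
```

With these defaults the expiry is 630000 ms (10.5 minutes), so a 10-minute TTL can hide about 95% of the window in which an operator could still act.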
I would suggest adding a query parameter to the URL: if true, the request
returns the cached results as long as the cache is not stale; if false, it
queries the NameNode for the most recent report.
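A minimal sketch of that suggestion, assuming a simple time-to-live notion of staleness. `DatanodeReportCache`, `getReport` and the `useCache` flag are hypothetical names, not the Router's actual API; the flag stands in for the proposed URL query parameter (whose name is not specified here), and the loader stands in for whatever call actually fetches the report from the NameNode.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical time-bounded cache for an expensive report, with a per-request bypass. */
public class DatanodeReportCache<T> {
    private final Supplier<T> loader;   // expensive call, e.g. fetching the report from the NameNode
    private final long ttlMs;           // how long a cached report is considered fresh
    private final LongSupplier clockMs; // injectable clock so staleness can be tested
    private T cached;
    private long loadedAtMs;
    private boolean loaded;

    public DatanodeReportCache(Supplier<T> loader, long ttlMs, LongSupplier clockMs) {
        this.loader = loader;
        this.ttlMs = ttlMs;
        this.clockMs = clockMs;
    }

    /**
     * @param useCache true: serve the cached report unless it is stale;
     *                 false: always fetch a fresh report and refresh the cache.
     */
    public synchronized T getReport(boolean useCache) {
        long now = clockMs.getAsLong();
        boolean stale = !loaded || now - loadedAtMs >= ttlMs;
        if (!useCache || stale) {
            cached = loader.get();
            loadedAtMs = now;
            loaded = true;
        }
        return cached;
    }
}
```

A UI or watchdog polling frequently would pass `useCache = true` and hit the NameNode at most once per TTL, while an operator chasing a datanode whose last contact is growing would pass `false` to see current data. The whole method is `synchronized` for simplicity; a real implementation might prefer a dedicated caching library.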
This comment is purely a suggestion and by no means blocks the last patch.
Since this feature only impacts the Router-based service, you can also ignore
my comment.
> RBF: Cache datanode reports
> ---------------------------
>
> Key: HDFS-13347
> URL: https://issues.apache.org/jira/browse/HDFS-13347
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Íñigo Goiri
> Priority: Minor
> Attachments: HDFS-13347.000.patch
>
>
> Getting the datanode reports is an expensive operation and can be executed
> very frequently by the UI and watchdogs. We should cache this information.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)