[
https://issues.apache.org/jira/browse/HDFS-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666799#comment-13666799
]
Devaraj Das commented on HDFS-4754:
-----------------------------------
Some comments:
One overall question - do we really need the stale duration from the client?
Could we just mark the datanode stale immediately and take it out of the stale
state when we get a heartbeat?
Some other patch level comments:
0. Should markStale return an int signifying the max value allowed for the
stale duration (that way the client can adjust itself if needed)? Not a major
one, though.
1. The exception msg in DFSClient could be improved a bit :-)
2. On isStale(ConcurrentMap, long), you should update the javadoc to reflect
the changes in the method implementation. isStale should probably belong to
DatanodeManager. See if that makes sense to you.
3. May not be immediately required, but at some point we should probably lump
all the stuff to do with "stale" into a class and pass that around in the
methods (like in the BlockPlacement* classes). That would ease readability.
4. Just a note - setting the config dfs.namenode.stale.mark.max.duration to 0
effectively disables this API. Good.
5. Just wondering - if the NameNode's RPC queue is long, and getting to the RPC
for markStale takes a while, the DNs would be marked stale in a different
window of time than the one the client originally intended. We could fix this
by having synced times in the cluster and passing the client's view of the
current time to the namenode; the namenode could then correct the duration
just before marking the datanode stale.
6. You say that after the desired duration for remaining stale, the namenode
would again rely on its own view of whether a datanode is stale or not. I am
wondering if we should cap the max duration for the user-controlled stale
state at the namenode's configured interval for heartbeat-based staleness (and
not have the new configuration you introduced).
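To make comments 0, 4, 5 and 6 concrete, here is a rough sketch of the server-side bookkeeping being discussed. The class and method names (StaleTracker, isMarkedStale) are illustrative only, not the actual HDFS-4754 patch; it assumes the config dfs.namenode.stale.mark.max.duration caps the client-supplied duration, that markStale returns the duration actually applied, and that the client passes its own view of the current time so the namenode can shrink the window by RPC queueing delay:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the ideas in the comments above; not the real patch.
public class StaleTracker {
    // Corresponds to dfs.namenode.stale.mark.max.duration; 0 disables the API.
    private final long maxMarkDurationMs;
    // datanode key ("ip:port") -> wall-clock time (ms) until which it is stale
    private final ConcurrentMap<String, Long> staleUntil =
        new ConcurrentHashMap<String, Long>();

    public StaleTracker(long maxMarkDurationMs) {
        this.maxMarkDurationMs = maxMarkDurationMs;
    }

    /**
     * Marks a datanode stale for at most the requested duration. Returns the
     * duration actually applied, so the client can adjust itself if its
     * request was capped (comment 0). clientNowMs is the client's view of the
     * current time (comment 5): the window is shortened by however long the
     * request spent in flight or in the RPC queue.
     */
    public long markStale(String ip, int port, long durationMs,
                          long clientNowMs) {
        if (maxMarkDurationMs == 0) {
            return 0; // feature disabled, nothing takes effect (comment 4)
        }
        long now = System.currentTimeMillis();
        // Correct for queueing delay / clock skew: the client intended the
        // window to start at clientNowMs, not when we dequeued the RPC.
        long elapsed = Math.max(0, now - clientNowMs);
        long applied = Math.min(durationMs - elapsed, maxMarkDurationMs);
        if (applied <= 0) {
            return 0; // the requested window already expired in the queue
        }
        staleUntil.put(ip + ":" + port, now + applied);
        return applied;
    }

    /**
     * True only while the user-set window is open; once it expires, the
     * namenode falls back to heartbeat-based staleness (comment 6).
     */
    public boolean isMarkedStale(String ip, int port) {
        Long until = staleUntil.get(ip + ":" + port);
        return until != null && System.currentTimeMillis() < until;
    }
}
```

Capping at the heartbeat-based staleness interval, as suggested in comment 6, would just mean deriving maxMarkDurationMs from that existing setting instead of a new one.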
> Add an API in the namenode to mark a datanode as stale
> ------------------------------------------------------
>
> Key: HDFS-4754
> URL: https://issues.apache.org/jira/browse/HDFS-4754
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client, namenode
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Priority: Critical
> Attachments: 4754.v1.patch
>
>
> There is a detection of stale datanodes in HDFS since HDFS-3703, with a
> timeout defaulting to 30s.
> There are two reasons to add an API to mark a node as stale even if the
> timeout is not yet reached:
> 1) ZooKeeper can detect that a client is dead at any moment. So, for HBase,
> we sometimes start the recovery before a node is marked stale (even with
> reasonable settings such as: stale: 20s; HBase ZK timeout: 30s).
> 2) Some third parties could detect that a node is dead before the timeout,
> hence saving us the cost of retrying. An example of such hardware is Arista,
> presented here by [~tsuna]
> http://tsunanet.net/~tsuna/fsf-hbase-meetup-april13.pdf, and confirmed in
> HBASE-6290.
> As usual, even if the node is dead it can come back before the 10-minute
> limit. So I would propose to set a time bound. The API would be
> namenode.markStale(String ipAddress, int port, long durationInMs);
> After durationInMs, the namenode would again rely only on its heartbeat to
> decide.
> Thoughts?
> If there are no objections, and if nobody on the hdfs dev team has time to
> spend on it, I will give it a try for branch 2 & 3.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira