[
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977420#comment-16977420
]
Yiqun Lin commented on HDFS-14651:
----------------------------------
Thanks for addressing the comment, [~leosun08]. Some review comments:
{code}
private static long CONNECTION_TIMEOUT_MS = 20 * 1000; // 20ms
long DFS_CLIENT_DEAD_NODE_DETECTION_INTERVAL_DEFAULT = 60 * 1000; // 60ms
{code}
Here the comment should be 20s, 60s.
{code}
this.deadNodesProbeQueue = new ArrayBlockingQueue(maxDeadNodesProbeQueueLen);
{code}
Can we update this to new
ArrayBlockingQueue<DatanodeInfo>(maxDeadNodesProbeQueueLen);
I think we can simplify this
{code}
+ private void checkDeadNodes() {
+ ...
+ for (DatanodeInfo datanodeInfo : datanodeInfos) {
+ LOG.debug("Add dead node to check: {}.", datanodeInfo);
+ if (isDeadNodesProbeQueueFull()) {
+ break;
+ }
+ addToDeadNodesProbeQueue(datanodeInfo);
+ ...
+ }
{code}
to
{code}
for (DatanodeInfo datanodeInfo : datanodeInfos) {
LOG.debug("Add dead node to check: {}.", datanodeInfo);
if(!addToDeadNodesProbeQueue(datanodeInfo)) {
LOG.debug("Skip to add dead node {} to check since the proboe
queue is full.", datanodeInfo);
break;
}
}
{code}
And following three methods can be removed
* addToDeadNodesProbeQueue
* isDeadNodesProbeQueueFull
* pollFromDeadNodesProbeQueue
{code}
private void removeFromDead(DatanodeInfo datanodeInfo) {
deadNodes.remove(datanodeInfo.getDatanodeUuid());
}
{code}
When we remove one dead node, can we also remove the dead node from
dfsInputStreamNodes and DFSInputStream local dead nodes? DFSInputStream local
dead will hold stale dead node but actually we check that this is not dead node
now.
{code}
+ /**
+ * Start probe dead node thread.
+ */
+ private void startProbeScheduler() {
+ probeDeadNodesSchedulerThr =
+ new Thread(new ProbeScheduler(this, ProbeType.CHECK_DEAD));
+ probeDeadNodesSchedulerThr.setDaemon(true);
+ probeDeadNodesSchedulerThr.start();
+ }
{code}
We should have corresponding close operations of detector and stop these
threads.
I noticed that here we only have one type CHECK_DEAD not other probe type like
suspicious node type? Do we still need to define this type? Seems it's okay
that we don't need to add suspicious state.
> DeadNodeDetector checks dead node periodically
> ----------------------------------------------
>
> Key: HDFS-14651
> URL: https://issues.apache.org/jira/browse/HDFS-14651
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch,
> HDFS-14651.003.patch
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]