[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979453#comment-16979453
 ] 

Íñigo Goiri commented on HDFS-14997:
------------------------------------

We should add this to the metrics to be able to track if we are queueing too 
many commands, etc.
Other minor comments:
* We should add a javadoc to the methods in CommandProcessingThread, especially 
to processQueue(), even though it is private.
* For processQueue(), I prefer while instead of do/while.
* processQueue() should use {{numProcessCommands++}}. Let's also rename it to 
{{numProcessedCommands}}.
* Do we need to do take and then poll?
* In the interrupted case, we should log with debug in the other cases (also 
use the logger {} format). If it is interrupted, shouldn't shouldRun() return 
false, so there is no need to break?
* We should extend CommandProcessingThread#enqueue() to support 
{{List<DatanodeCommand>}} and {{DatanodeCommand}} as arguments so we don't need 
to do the transformations in the part where we add it.
* {{processCommand(DatanodeCommand[] cmds)}} is kind of repeated now. Should we 
merge the new and the old together?
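
To make the points above concrete, here is a rough sketch of the shape I have in mind. It is not the code from the patch: {{CommandQueueSketch}}, {{getQueueLength()}} and {{shutdown()}} are made-up names, {{String}} stands in for {{DatanodeCommand}}, and {{System.err}} stands in for the real logger. It assumes a single blocking take() is enough (no follow-up poll) and that interruption flips the flag checked by shouldRun(), so the loop ends without a break.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a command-processing thread; not the patch's
 * CommandProcessingThread. String stands in for DatanodeCommand.
 */
class CommandQueueSketch extends Thread {
  private final BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();
  private final AtomicLong numProcessedCommands = new AtomicLong();
  private final Consumer<String> handler;
  private volatile boolean running = true;

  CommandQueueSketch(Consumer<String> handler) {
    super("CommandQueueSketch");
    this.handler = handler;
    setDaemon(true);
  }

  /** Enqueue a single command, so callers do not have to wrap it. */
  void enqueue(String cmd) {
    if (cmd != null) {
      queue.add(List.of(cmd));
    }
  }

  /** Enqueue a batch of commands as one queue entry. */
  void enqueue(List<String> cmds) {
    if (cmds != null && !cmds.isEmpty()) {
      queue.add(List.copyOf(cmds));
    }
  }

  boolean shouldRun() {
    return running;
  }

  long getNumProcessedCommands() {
    return numProcessedCommands.get();
  }

  /** Number of batches still waiting; a candidate value to expose as a metric. */
  int getQueueLength() {
    return queue.size();
  }

  void shutdown() {
    running = false;
    interrupt();
  }

  @Override
  public void run() {
    processQueue();
  }

  /**
   * Process queued batches until shutdown. Uses a plain while loop and a
   * single blocking take(); on interruption the running flag is cleared,
   * so shouldRun() ends the loop and no break is needed.
   */
  private void processQueue() {
    while (shouldRun()) {
      try {
        List<String> cmds = queue.take();
        for (String cmd : cmds) {
          handler.accept(cmd);
          numProcessedCommands.incrementAndGet();
        }
      } catch (InterruptedException e) {
        running = false;
        Thread.currentThread().interrupt();
      } catch (RuntimeException e) {
        // Stand-in for a parameterized logger call; keep the thread alive.
        System.err.println("Failed to process command: " + e);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    CommandQueueSketch t =
        new CommandQueueSketch(c -> System.out.println("processing " + c));
    t.start();
    t.enqueue("DNA_INVALIDATE");
    t.enqueue(List.of("other-1", "other-2"));
    while (t.getNumProcessedCommands() < 3) {
      Thread.sleep(10);
    }
    t.shutdown();
    t.join();
    System.out.println("processed=" + t.getNumProcessedCommands());
  }
}
```

With both enqueue() overloads in place, the caller in the heartbeat path can hand over whatever it has (one command or a list) without converting it first, and the queue length getter gives us something to publish in the metrics.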

> BPServiceActor process command from NameNode asynchronously
> -----------------------------------------------------------
>
>                 Key: HDFS-14997
>                 URL: https://issues.apache.org/jira/browse/HDFS-14997
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Major
>         Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch
>
>
> The #BPServiceActor main process flow has two core functions: report 
> (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If 
> #processCommand takes a long time, it blocks the report flow. 
> #processCommand can take a long time (over 1000s in the worst case I have 
> seen) when the IO load of the DataNode is very high. Because some IO 
> operations run under #datasetLock, processing some commands (such as 
> #DNA_INVALIDATE) has to wait a long time to acquire #datasetLock. In that 
> case, #heartbeat is not sent to the NameNode in time, which can trigger 
> further failures.
> I propose making #processCommand asynchronous so that it does not block 
> #BPServiceActor from sending heartbeats to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are a separate issue; I think we should 
> solve it in another JIRA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

