[ 
https://issues.apache.org/jira/browse/HADOOP-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xudong Cao updated HADOOP-16677:
--------------------------------
    Summary: Recalculate the remaining timeout millis correctly while throwing 
an InterupptedException in SocketIOWithTimeout.  (was: The remaining timeout 
millis for constructing InterupptedException also need to be recalculated 
correctly when thread is interrupted while blocking in select().)

> Recalculate the remaining timeout millis correctly while throwing an 
> InterupptedException in SocketIOWithTimeout.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16677
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16677
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 3.1.3
>            Reporter: Xudong Cao
>            Assignee: Xudong Cao
>            Priority: Minor
>
> In SocketIOWithTimeout, when a thread was interrupted and exit from select(), 
> it proceed to throw an InterruptedIOException, in exception message the 
> remaining timeout mills should be recalcuated correctly rather than simply 
> give a total timeout millis,  otherwise it could be very misleading.
> For example, if an hdfs writer has not sent any packet to the pipeline more 
> than 60s (e.g. full gc or network issues),  then one of pipeline datanodes 
> may be timeout and close its sockets to other dns,  so its upstream DN's 
> PacketResponder will immediately meet an EOF and then interrupt its own 
> DataXceiver, finally its DataXeiver will print some logs like:
>  
> {code:java}
> 2019-10-24 09:22:58,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exception for 
> BP-753871533-10.215.131.216-1511957392115:blk_10646544613_95736750382019-10-24
>  09:22:58,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception 
> for 
> BP-753871533-10.215.131.216-1511957392115:blk_10646544613_9573675038java.io.InterruptedIOException:
>  Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/10.196.146.114:9003 
> remote=/10.215.153.105:38559]. 60000 millis timeout left. at 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:342)
>  at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) 
> ...
>  
> {code}
>  
> This log is very misleading because a 60000 mills timeout left implies that 
> the DataXceiver never blocks in select(), and this is unrealistic.  in fact, 
> the truly timeout mills left should be: 60000 - timeElapsedWhenSelect.
> Finally, a properly log should be like this:
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to