haiyang1987 commented on PR #6266:
URL: https://github.com/apache/hadoop/pull/6266#issuecomment-1818874038

   > `InterruptedIOException` is a subclass of `IOException`. The try-catch block in `getActiveNodeProxy` catches `IOException`, so an `InterruptedIOException` will be caught there.
   > 
   > https://docs.oracle.com/javase/8/docs/api/java/io/InterruptedIOException.html
   > 
   > What I think should be happening is the following.
   > 
   > ```
   > main thread: calls `triggerActiveLogRoll`, waits 60 secs, times out, cancels this task, and returns.
   > 
   > MultipleNameNodeProxy.call() thread: 
   >    -> getActiveNodeProxy()
   >         -> nnLookup.next = ob2 (down node)
   >         -> RPC.waitForProxy(ob2)
   >         -> after 60 secs, interrupted.  
   >         -> output "Failed to reach ob2", increment nnLoopCount. Ideally, we should just stop here, since we already timed out.
   >         -> nnLookup.next = n1 (live node). 
   >              then, it should succeed to connect to n1.
   > ```
   > 
   > This does not seem to be the case from the logs you shared. Thoughts?
   > 
   > A possible fix might be to explicitly catch `InterruptedIOException` in `getActiveNodeProxy` and just end this thread (assuming all `InterruptedIOException`s are caused by the `triggerActiveLogRoll` timeout). Subsequent `triggerActiveLogRoll` calls should be fine, since we will move nnLookup to the next node.
   
   Yeah, explicitly catching `InterruptedIOException` in `MultipleNameNodeProxy#call()` and then exiting is also a solution.
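
A rough sketch of that idea, not the actual patch: the lookup loop catches `InterruptedIOException` ahead of the general `IOException` handler and stops instead of moving on to the next node. `ProxyLookupSketch`, `Connector`, and the node names are hypothetical stand-ins for illustration; the real code would go through `RPC.waitForProxy` and `nnLookup`.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Arrays;
import java.util.List;

final class ProxyLookupSketch {

    /** Hypothetical stand-in for the per-node connect step. */
    interface Connector {
        String connect(String node) throws IOException;
    }

    /** Tries each node in order; stops immediately if the connect attempt was interrupted. */
    static String getActiveNodeProxy(List<String> nodes, Connector connector)
            throws IOException {
        for (String node : nodes) {
            try {
                return connector.connect(node);
            } catch (InterruptedIOException e) {
                // Subclass of IOException, so it must be caught first.
                // Assumption: an interrupt here means the caller already timed out
                // and cancelled this task, so do not try the next node.
                throw e;
            } catch (IOException e) {
                // Ordinary connection failure: log and fall through to the next node.
                System.out.println("Failed to reach " + node);
            }
        }
        throw new IOException("No reachable node in " + nodes);
    }

    public static void main(String[] args) throws IOException {
        // Simulates the scenario above: ob2 is down and the wait gets interrupted.
        Connector interruptedOnOb2 = node -> {
            if (node.equals("ob2")) {
                throw new InterruptedIOException("interrupted while connecting to " + node);
            }
            return "proxy-" + node;
        };
        try {
            getActiveNodeProxy(Arrays.asList("ob2", "n1"), interruptedOnOb2);
        } catch (InterruptedIOException e) {
            System.out.println("Stopped after interrupt: " + e.getMessage());
        }
    }
}
```

With this ordering, an interrupted attempt on ob2 ends the task without touching n1, while a plain `IOException` on ob2 still falls through to n1 as before.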


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

