[ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875426#action_12875426 ]
Todd Lipcon commented on HADOOP-6762:
-------------------------------------

re timeout: I'm a little nervous about such a change in the semantics of IPC at this point. The ping system ensures that the other side isn't completely dead, so some people use IPCs that are *supposed* to take a really long time, and rely on ping to know it's at least still connected. Maybe if you find it useful you could introduce a new parameter for the IPC timeout, and have it default to 0 (no timeout?)

I could also see a situation where we wait for the ping time, and then print a LOG.warn("IPC call Protocol.callName to <IP> still waiting after 60000ms") once every ping interval. This would help debugging without changing behavior. (I too have often wished for such a thing)

> exception while doing RPC I/O closes channel
> --------------------------------------------
>
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>            Assignee: sam rash
>         Attachments: hadoop-6762-1.txt, hadoop-6762-2.txt, hadoop-6762-3.txt,
> hadoop-6762-4.txt, hadoop-6762-6.txt
>
>
> If a single process creates two unique fileSystems to the same NN using
> FileSystem.newInstance(), and one of them issues a close(), the leasechecker
> thread is interrupted. This interrupt races with the rpc namenode.renew()
> and can cause a ClosedByInterruptException. This closes the underlying
> channel and the other filesystem, sharing the connection will get errors.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
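
For anyone who wants to see the channel behavior from the issue description in isolation, here is a minimal sketch using only java.nio. It is not the Hadoop IPC client code: the class InterruptClosesChannel, the Result holder and the demonstrate() method are made-up names, and a Pipe stands in for the shared NameNode connection. It only illustrates the documented InterruptibleChannel rule that an interrupted thread entering a blocking read closes the channel for every user of it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.Pipe;

// Hypothetical demo class; a Pipe stands in for the shared RPC connection.
class InterruptClosesChannel {

    // Simple holder for what the demo observed.
    static final class Result {
        boolean closedByInterrupt;   // first read threw ClosedByInterruptException
        boolean channelStillOpen;    // state of the shared channel afterwards
        boolean laterReadRejected;   // a later, un-interrupted read failed too
    }

    static Result demonstrate() {
        Result result = new Result();
        try {
            Pipe pipe = Pipe.open();
            Pipe.SourceChannel shared = pipe.source();

            // Put one byte in the pipe so data is actually available to read.
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));

            // Simulate the lease checker being interrupted just before its RPC.
            Thread.currentThread().interrupt();
            try {
                shared.read(ByteBuffer.allocate(1));
            } catch (ClosedByInterruptException e) {
                result.closedByInterrupt = true;
            } finally {
                // Clear the interrupt status so later calls are not affected.
                Thread.interrupted();
            }

            result.channelStillOpen = shared.isOpen();

            // A second user of the same channel, never interrupted itself.
            try {
                shared.read(ByteBuffer.allocate(1));
            } catch (ClosedChannelException e) {
                result.laterReadRejected = true;
            }

            pipe.sink().close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = demonstrate();
        System.out.println("closedByInterrupt=" + r.closedByInterrupt
                + " channelStillOpen=" + r.channelStillOpen
                + " laterReadRejected=" + r.laterReadRejected);
    }
}
```

The second read fails even though its caller was never interrupted, which is the same shape as the "other filesystem sharing the connection will get errors" symptom in the description.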