I think maybe the right tradeoff is to firstly figure out root cause/fix the problem and then later, as part of a general move towards JDK 5 re-structure to make use of "Java 5 timeout stuff".
My basic concern would be we would otherwise run the risk of introducing complications/more bugs whilst we try and sort the original. On 3 May 2011 09:12, Tom Hobbs <[email protected]> wrote: > Dan raises a good point, which is probably of higher importance than the one > I did. > > My only comment about the use of a timeout was questioning whether we could > start to use some Java 5 timeout stuff. You guys have been looking at that > bit of the code more than me, so if the answer is "no" then I can accept > that. > > On 3 May 2011 08:29, "Dan Creswell" <[email protected]> wrote: >> Mmm, agreed though I think to do it requires that Chris get to the >> root cause that is producing the need for a client timeout. >> >> On 2 May 2011 23:17, Gregg Wonderly <[email protected]> wrote: >>> Yes I think a timeout is what will work the best. I just want to make > sure we can release any associated server resources at the same time so that > this doesn't result in a resource leak in the server. >>> >>> Gregg >>> >>> Sent from my iPhone >>> >>> On May 2, 2011, at 11:27 AM, "Christopher Dolan" < > [email protected]> wrote: >>> >>>> I strongly agree with Gregg's comments about timeouts in general, but I >>>> think this might be a special case. In the Mux.start() data flow, the >>>> client sends an 8-byte handshake and expects the server to respond with >>>> a similar 8-byte handshake (or error) promptly. I'm seeing indefinite >>>> stalls in real-world cases, so I need to do something and a timeout is >>>> the only solution I can think of. >>>> >>>> Chris >>>> >>>> -----Original Message----- >>>> From: Gregg Wonderly [mailto:[email protected]] >>>> Sent: Saturday, April 30, 2011 10:03 AM >>>> To: [email protected] >>>> Cc: Mike McGrady >>>> Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start() >>>> >>>> In the history of original Java threading model, and the NIO development >>>> to use >>>> "select/poll" from your own thread, rather than registering call back >>>> methods >>>> (via an interface) kept a lot of development from using a model where >>>> threading >>>> was managed internally by the package or by the JVM. As a result, we >>>> have >>>> structures like today where notifications are less common. In this code >>>> though, >>>> I think the structure is internal enough that it's not necessary to >>>> really use >>>> Future or some other mechanisms. >>>> >>>> Timeouts are always a "hard way" to manage "loss of functionality" >>>> because you >>>> really don't know when things are "not working", only that something is >>>> taking >>>> longer than your timeout accounted for. Timeouts can make it possible >>>> for more >>>> pending work to pile up on the other end that might slow the results >>>> even more. >>>> E.g. you wait 30seconds and retry while the actual work on the other >>>> end is >>>> taking 35 seconds to get to and thus the queue rate exceeds the dequeue >>>> rate and >>>> things start piling up. >>>> >>>> If you are going to use a timeout, we need to have some sort of >>>> indication from >>>> both ends perspective that the attempt has been aborted, as early as >>>> possible. >>>> For cases where I/O traffic is written/read, that usually means closing >>>> the >>>> socket. I'm not sure of the ramifications of doing that, since I >>>> haven't looked >>>> too hard at this code. >>>> >>>> Gregg Wonderly >>>> >>>> On 4/29/2011 4:38 PM, Mike McGrady wrote: >>>>> Throwing my two cents in here just to state my opinion. This is an >>>> effort that >>>>> could pay dividends if it were done with a view toward the future - >>>> absolutely >>>>> no pun intended. I do not know the details of the code but if >>>> futures could be >>>>> useful here, they would be welcomed by myself. >>>>> >>>>> MG >>>>> >>>>> >>>>> On Apr 29, 2011, at 1:33 PM, Christopher Dolan wrote: >>>>> >>>>>> Thanks, Tom. >>>>>> >>>>>> I don't really understand your Future suggestion. Are you suggesting >>>>>> changing the async handshake to a Future? If so, that sounds like a >>>> very >>>>>> involved change, touching a lot of code in Mux and its subclasses. >>>>>> >>>>>> setDown() changes the value of the muxDown boolean to true, so it's a >>>>>> valid way out of the loop. >>>>>> >>>>>> Chris >>>>>> >>>>>> -----Original Message----- >>>>>> From: Tom Hobbs [mailto:[email protected]] >>>>>> Sent: Friday, April 29, 2011 2:27 PM >>>>>> To: [email protected] >>>>>> Subject: Re: client hang in >>>> com.sun.jini.jeri.internal.mux.Mux.start() >>>>>> >>>>>> The proposed code looks fine to me. Two points leap out, more >>>>>> discussion >>>>>> points than anything else. >>>>>> >>>>>> For some reason I've recently developed an aversion to writing my on >>>>>> timeout >>>>>> logic. did you consider using something like a Future here or might >>>>>> that be >>>>>> serious overkill (it wouldn't surprise me if it was)? >>>>>> >>>>>> Also is Setdown intended to break out of the while loop? Because I >>>>>> can't >>>>>> see a way to escape it. (I don't have the rest of the code in front >>>> of >>>>>> me) >>>>>> >>>>>> Thanks for keep raising these issues - especially because you usually >>>>>> supply >>>>>> fixes for them! >>>>>> >>>>>> Tom >>>>>> >>>>>> On 29 Apr 2011 19:41, "Christopher Dolan"<[email protected]> >>>>>> wrote: >>>>>>> I've experienced occasional cases where clients get stuck in the >>>>>>> following block of code in Mux.start. Has anyone experienced this >>>>>>> problem? I have a proposed solution below. Has anyone thought about >>>> a >>>>>>> similar solution already? >>>>>>> >>>>>>> -- Current code -- >>>>>>> 1 asyncSendClientConnectionHeader(); >>>>>>> 2 synchronized (muxLock) { >>>>>>> 3 while (!muxDown&& !clientConnectionReady) { >>>>>>> 4 try { >>>>>>> 5 muxLock.wait(); // REMIND: timeout? >>>>>>> 6 } catch (InterruptedException e) { >>>>>>> 7 ... >>>>>>> 8 } >>>>>>> 9 } >>>>>>> 10 if (muxDown) { >>>>>>> 11 IOException ioe = new IOException(muxDownMessage); >>>>>>> 12 ioe.initCause(muxDownCause); >>>>>>> 13 throw ioe; >>>>>>> 14 } >>>>>>> 15 } >>>>>>> >>>>>>> -- Explanation of the code -- >>>>>>> This code handles the initial client-server handshake that starts a >>>>>> JERI >>>>>>> connection. In line 1, the client sends its 8-byte greeting to the >>>>>>> server. Then in the loop on lines 3-9, it waits for the server's >>>>>>> response. If the reader thread gets a satisfactory response from the >>>>>>> server, it sets clientConnectionReady=true and calls >>>>>>> muxLock.notifyAll(). In all other cases (aborted connection, >>>>>> mismatched >>>>>>> protocol version, etc) the reader invokes Mux.setDown() which sets >>>>>>> muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it >>>> throws >>>>>> if >>>>>>> the handshake was a failure. >>>>>>> >>>>>>> In my scenario (which uses simple TCP sockets, nothing fancy), the >>>>>>> invoker thread sits on line 5 indefinitely. My problem hard to >>>>>>> reproduce, so I haven't found out what the server is doing in this >>>>>> case. >>>>>>> I hope to figure that out eventually, but presently I'm interested >>>> in >>>>>>> the "REMIND: timeout?" comment. >>>>>>> >>>>>>> -- Timeout solution -- >>>>>>> It seems obvious to me that there should be a timeout here. There >>>> are >>>>>>> lots of imaginable cases where the client could get stuck here: >>>>>>> server-side deadlock, abrupt server crash, logic error in client Mux >>>>>>> code. You'd expect that the server would either respond with its >>>>>> 8-byte >>>>>>> handshake very quickly or never, so a modest timeout (like 15 or 30 >>>>>>> seconds) should be good. If that timeout is triggered, I would >>>> expect >>>>>>> that the code above would call Mux.setDown() and throw an >>>> IOException. >>>>>>> That exception would either cause a retry or be thrown up to the >>>>>> invoker >>>>>>> as a RemoteException. >>>>>>> >>>>>>> -- Proposed code (untested) -- >>>>>>> 3 long now = System.currentTimeMillis(); >>>>>>> 4 long endTime = now + timeoutMillis; >>>>>>> 5 while (!muxDown&& !clientConnectionReady) { >>>>>>> 6 if (now>= endTime) { >>>>>>> 7 setDown("timeout waiting for server to respond >>>>>>> to handshake", null); >>>>>>> 8 } else { >>>>>>> 9 try { >>>>>>> 10 muxLock.wait(endTime - now); >>>>>>> 11 now = System.currentTimeMillis(); >>>>>>> 12 } catch (InterruptedException e) { >>>>>>> 13 setDown("interrupt waiting for connection >>>>>>> header", e); >>>>>>> 14 } >>>>>>> 15 } >>>>>>> 16 } >>>>>>> >>>>>>> This code assumes a configurable timeoutMillis parameter has been >>>> set >>>>>>> earlier. >>>>>>> >>>>>>> I can't think of any alternative solutions. Putting the timeout in >>>> the >>>>>>> Reader logic seems higher risk. There's incomplete code in JERI to >>>>>>> implement a ping packet (see Mux.asyncSendPing, never used), but >>>> that >>>>>>> would only be relevant after the initial handshake and wouldn't help >>>>>>> here. >>>>>>> >>>>>>> Thanks, >>>>>>> Chris >>>>>>> >>>>> >>>>> Michael McGrady >>>>> Chief Systems Architect >>>>> Topia Technology, Inc. >>>>> Cel 1.253.720.3365 >>>>> Work 1.253.572.9712 extension 2037 >>>>> Skype ID: michael.mcgrady5 >>>>> [email protected] >>>>> >>>>> >>>>> >>>>> >>>> >>> >
