Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start()

Dan Creswell Tue, 03 May 2011 00:29:36 -0700

Mmm, agreed though I think to do it requires that Chris get to the
root cause that is producing the need for a client timeout.


On 2 May 2011 23:17, Gregg Wonderly <[email protected]> wrote:
> Yes I think a timeout is what will work the best.  I just want to make sure 
> we can release any associated server resources at the same time so that this 
> doesn't result in a resource leak in the server.
>
> Gregg
>
> Sent from my iPhone
>
> On May 2, 2011, at 11:27 AM, "Christopher Dolan" <[email protected]> 
> wrote:
>
>> I strongly agree  with Gregg's comments about timeouts in general, but I
>> think this might be a special case. In the Mux.start() data flow, the
>> client sends an 8-byte handshake and expects the server to respond with
>> a similar 8-byte handshake (or error) promptly. I'm seeing indefinite
>> stalls in real-world cases, so I need to do something and a timeout is
>> the only solution I can think of.
>>
>> Chris
>>
>> -----Original Message-----
>> From: Gregg Wonderly [mailto:[email protected]]
>> Sent: Saturday, April 30, 2011 10:03 AM
>> To: [email protected]
>> Cc: Mike McGrady
>> Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start()
>>
>> In the history of original Java threading model, and the NIO development
>> to use
>> "select/poll" from your own thread, rather than registering call back
>> methods
>> (via an interface) kept a lot of development from using a model where
>> threading
>> was managed internally by the package or by the JVM.  As a result, we
>> have
>> structures like today where notifications are less common.  In this code
>> though,
>> I think the structure is internal enough that it's not necessary to
>> really use
>> Future or some other mechanisms.
>>
>> Timeouts are always a "hard way" to manage "loss of functionality"
>> because you
>> really don't know when things are "not working", only that something is
>> taking
>> longer than your timeout accounted for.   Timeouts can make it possible
>> for more
>> pending work to pile up on the other end that might slow the results
>> even more.
>>  E.g. you wait 30seconds and retry while the actual work on the other
>> end is
>> taking 35 seconds to get to and thus the queue rate exceeds the dequeue
>> rate and
>> things start piling up.
>>
>> If you are going to use a timeout, we need to have some sort of
>> indication from
>> both ends perspective that the attempt has been aborted, as early as
>> possible.
>> For cases where I/O traffic is written/read, that usually means closing
>> the
>> socket.  I'm not sure of the ramifications of doing that, since I
>> haven't looked
>> too hard at this code.
>>
>> Gregg Wonderly
>>
>> On 4/29/2011 4:38 PM, Mike McGrady wrote:
>>> Throwing my two cents in here just to state my opinion.  This is an
>> effort that
>>> could pay dividends if it were done with a view toward the future -
>> absolutely
>>> no pun intended.  I do not know the details of the code but if
>> futures could be
>>> useful here, they would be welcomed by myself.
>>>
>>> MG
>>>
>>>
>>> On Apr 29, 2011, at 1:33 PM, Christopher Dolan wrote:
>>>
>>>> Thanks, Tom.
>>>>
>>>> I don't really understand your Future suggestion.  Are you suggesting
>>>> changing the async handshake to a Future? If so, that sounds like a
>> very
>>>> involved change, touching a lot of code in Mux and its subclasses.
>>>>
>>>> setDown() changes the value of the muxDown boolean to true, so it's a
>>>> valid way out of the loop.
>>>>
>>>> Chris
>>>>
>>>> -----Original Message-----
>>>> From: Tom Hobbs [mailto:[email protected]]
>>>> Sent: Friday, April 29, 2011 2:27 PM
>>>> To: [email protected]
>>>> Subject: Re: client hang in
>> com.sun.jini.jeri.internal.mux.Mux.start()
>>>>
>>>> The proposed code looks fine to me.  Two points leap out, more
>>>> discussion
>>>> points than anything else.
>>>>
>>>> For some reason I've recently developed an aversion to writing my on
>>>> timeout
>>>> logic.  did you consider using something like a Future here or might
>>>> that be
>>>> serious overkill (it wouldn't surprise me if it was)?
>>>>
>>>> Also is Setdown intended to break out of the while loop?  Because I
>>>> can't
>>>> see a way to escape it.  (I don't have the rest of the code in front
>> of
>>>> me)
>>>>
>>>> Thanks for keep raising these issues - especially because you usually
>>>> supply
>>>> fixes for them!
>>>>
>>>> Tom
>>>>
>>>> On 29 Apr 2011 19:41, "Christopher Dolan"<[email protected]>
>>>> wrote:
>>>>> I've experienced occasional cases where clients get stuck in the
>>>>> following block of code in Mux.start. Has anyone experienced this
>>>>> problem? I have a proposed solution below. Has anyone thought about
>> a
>>>>> similar solution already?
>>>>>
>>>>> -- Current code --
>>>>> 1 asyncSendClientConnectionHeader();
>>>>> 2 synchronized (muxLock) {
>>>>> 3 while (!muxDown&&  !clientConnectionReady) {
>>>>> 4 try {
>>>>> 5 muxLock.wait(); // REMIND: timeout?
>>>>> 6 } catch (InterruptedException e) {
>>>>> 7 ...
>>>>> 8 }
>>>>> 9 }
>>>>> 10 if (muxDown) {
>>>>> 11 IOException ioe = new IOException(muxDownMessage);
>>>>> 12 ioe.initCause(muxDownCause);
>>>>> 13 throw ioe;
>>>>> 14 }
>>>>> 15 }
>>>>>
>>>>> -- Explanation of the code --
>>>>> This code handles the initial client-server handshake that starts a
>>>> JERI
>>>>> connection. In line 1, the client sends its 8-byte greeting to the
>>>>> server. Then in the loop on lines 3-9, it waits for the server's
>>>>> response. If the reader thread gets a satisfactory response from the
>>>>> server, it sets clientConnectionReady=true and calls
>>>>> muxLock.notifyAll(). In all other cases (aborted connection,
>>>> mismatched
>>>>> protocol version, etc) the reader invokes Mux.setDown() which sets
>>>>> muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it
>> throws
>>>> if
>>>>> the handshake was a failure.
>>>>>
>>>>> In my scenario (which uses simple TCP sockets, nothing fancy), the
>>>>> invoker thread sits on line 5 indefinitely. My problem hard to
>>>>> reproduce, so I haven't found out what the server is doing in this
>>>> case.
>>>>> I hope to figure that out eventually, but presently I'm interested
>> in
>>>>> the "REMIND: timeout?" comment.
>>>>>
>>>>> -- Timeout solution --
>>>>> It seems obvious to me that there should be a timeout here. There
>> are
>>>>> lots of imaginable cases where the client could get stuck here:
>>>>> server-side deadlock, abrupt server crash, logic error in client Mux
>>>>> code. You'd expect that the server would either respond with its
>>>> 8-byte
>>>>> handshake very quickly or never, so a modest timeout (like 15 or 30
>>>>> seconds) should be good. If that timeout is triggered, I would
>> expect
>>>>> that the code above would call Mux.setDown() and throw an
>> IOException.
>>>>> That exception would either cause a retry or be thrown up to the
>>>> invoker
>>>>> as a RemoteException.
>>>>>
>>>>> -- Proposed code (untested) --
>>>>> 3 long now = System.currentTimeMillis();
>>>>> 4 long endTime = now + timeoutMillis;
>>>>> 5 while (!muxDown&&  !clientConnectionReady) {
>>>>> 6 if (now>= endTime) {
>>>>> 7 setDown("timeout waiting for server to respond
>>>>> to handshake", null);
>>>>> 8 } else {
>>>>> 9 try {
>>>>> 10 muxLock.wait(endTime - now);
>>>>> 11 now = System.currentTimeMillis();
>>>>> 12 } catch (InterruptedException e) {
>>>>> 13 setDown("interrupt waiting for connection
>>>>> header", e);
>>>>> 14 }
>>>>> 15 }
>>>>> 16 }
>>>>>
>>>>> This code assumes a configurable timeoutMillis parameter has been
>> set
>>>>> earlier.
>>>>>
>>>>> I can't think of any alternative solutions. Putting the timeout in
>> the
>>>>> Reader logic seems higher risk. There's incomplete code in JERI to
>>>>> implement a ping packet (see Mux.asyncSendPing, never used), but
>> that
>>>>> would only be relevant after the initial handshake and wouldn't help
>>>>> here.
>>>>>
>>>>> Thanks,
>>>>> Chris
>>>>>
>>>
>>> Michael McGrady
>>> Chief Systems Architect
>>> Topia Technology, Inc.
>>> Cel 1.253.720.3365
>>> Work 1.253.572.9712 extension 2037
>>> Skype ID: michael.mcgrady5
>>> [email protected]
>>>
>>>
>>>
>>>
>>
>

Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start()

Reply via email to