Dan raises a good point, which is probably of higher importance than the one
I did.

My only comment about the use of a timeout was questioning whether we could
start to use some Java 5 timeout stuff.  You guys have been looking at that
bit of the code more than me, so if the answer is "no" then I can accept
that.

On 3 May 2011 08:29, "Dan Creswell" <[email protected]> wrote:
> Mmm, agreed though I think to do it requires that Chris get to the
> root cause that is producing the need for a client timeout.
>
> On 2 May 2011 23:17, Gregg Wonderly <[email protected]> wrote:
>> Yes I think a timeout is what will work the best.  I just want to make
sure we can release any associated server resources at the same time so that
this doesn't result in a resource leak in the server.
>>
>> Gregg
>>
>> Sent from my iPhone
>>
>> On May 2, 2011, at 11:27 AM, "Christopher Dolan" <
[email protected]> wrote:
>>
>>> I strongly agree  with Gregg's comments about timeouts in general, but I
>>> think this might be a special case. In the Mux.start() data flow, the
>>> client sends an 8-byte handshake and expects the server to respond with
>>> a similar 8-byte handshake (or error) promptly. I'm seeing indefinite
>>> stalls in real-world cases, so I need to do something and a timeout is
>>> the only solution I can think of.
>>>
>>> Chris
>>>
>>> -----Original Message-----
>>> From: Gregg Wonderly [mailto:[email protected]]
>>> Sent: Saturday, April 30, 2011 10:03 AM
>>> To: [email protected]
>>> Cc: Mike McGrady
>>> Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start()
>>>
>>> In the history of original Java threading model, and the NIO development
>>> to use
>>> "select/poll" from your own thread, rather than registering call back
>>> methods
>>> (via an interface) kept a lot of development from using a model where
>>> threading
>>> was managed internally by the package or by the JVM.  As a result, we
>>> have
>>> structures like today where notifications are less common.  In this code
>>> though,
>>> I think the structure is internal enough that it's not necessary to
>>> really use
>>> Future or some other mechanisms.
>>>
>>> Timeouts are always a "hard way" to manage "loss of functionality"
>>> because you
>>> really don't know when things are "not working", only that something is
>>> taking
>>> longer than your timeout accounted for.   Timeouts can make it possible
>>> for more
>>> pending work to pile up on the other end that might slow the results
>>> even more.
>>>  E.g. you wait 30seconds and retry while the actual work on the other
>>> end is
>>> taking 35 seconds to get to and thus the queue rate exceeds the dequeue
>>> rate and
>>> things start piling up.
>>>
>>> If you are going to use a timeout, we need to have some sort of
>>> indication from
>>> both ends perspective that the attempt has been aborted, as early as
>>> possible.
>>> For cases where I/O traffic is written/read, that usually means closing
>>> the
>>> socket.  I'm not sure of the ramifications of doing that, since I
>>> haven't looked
>>> too hard at this code.
>>>
>>> Gregg Wonderly
>>>
>>> On 4/29/2011 4:38 PM, Mike McGrady wrote:
>>>> Throwing my two cents in here just to state my opinion.  This is an
>>> effort that
>>>> could pay dividends if it were done with a view toward the future -
>>> absolutely
>>>> no pun intended.  I do not know the details of the code but if
>>> futures could be
>>>> useful here, they would be welcomed by myself.
>>>>
>>>> MG
>>>>
>>>>
>>>> On Apr 29, 2011, at 1:33 PM, Christopher Dolan wrote:
>>>>
>>>>> Thanks, Tom.
>>>>>
>>>>> I don't really understand your Future suggestion.  Are you suggesting
>>>>> changing the async handshake to a Future? If so, that sounds like a
>>> very
>>>>> involved change, touching a lot of code in Mux and its subclasses.
>>>>>
>>>>> setDown() changes the value of the muxDown boolean to true, so it's a
>>>>> valid way out of the loop.
>>>>>
>>>>> Chris
>>>>>
>>>>> -----Original Message-----
>>>>> From: Tom Hobbs [mailto:[email protected]]
>>>>> Sent: Friday, April 29, 2011 2:27 PM
>>>>> To: [email protected]
>>>>> Subject: Re: client hang in
>>> com.sun.jini.jeri.internal.mux.Mux.start()
>>>>>
>>>>> The proposed code looks fine to me.  Two points leap out, more
>>>>> discussion
>>>>> points than anything else.
>>>>>
>>>>> For some reason I've recently developed an aversion to writing my on
>>>>> timeout
>>>>> logic.  did you consider using something like a Future here or might
>>>>> that be
>>>>> serious overkill (it wouldn't surprise me if it was)?
>>>>>
>>>>> Also is Setdown intended to break out of the while loop?  Because I
>>>>> can't
>>>>> see a way to escape it.  (I don't have the rest of the code in front
>>> of
>>>>> me)
>>>>>
>>>>> Thanks for keep raising these issues - especially because you usually
>>>>> supply
>>>>> fixes for them!
>>>>>
>>>>> Tom
>>>>>
>>>>> On 29 Apr 2011 19:41, "Christopher Dolan"<[email protected]>
>>>>> wrote:
>>>>>> I've experienced occasional cases where clients get stuck in the
>>>>>> following block of code in Mux.start. Has anyone experienced this
>>>>>> problem? I have a proposed solution below. Has anyone thought about
>>> a
>>>>>> similar solution already?
>>>>>>
>>>>>> -- Current code --
>>>>>> 1 asyncSendClientConnectionHeader();
>>>>>> 2 synchronized (muxLock) {
>>>>>> 3 while (!muxDown&&  !clientConnectionReady) {
>>>>>> 4 try {
>>>>>> 5 muxLock.wait(); // REMIND: timeout?
>>>>>> 6 } catch (InterruptedException e) {
>>>>>> 7 ...
>>>>>> 8 }
>>>>>> 9 }
>>>>>> 10 if (muxDown) {
>>>>>> 11 IOException ioe = new IOException(muxDownMessage);
>>>>>> 12 ioe.initCause(muxDownCause);
>>>>>> 13 throw ioe;
>>>>>> 14 }
>>>>>> 15 }
>>>>>>
>>>>>> -- Explanation of the code --
>>>>>> This code handles the initial client-server handshake that starts a
>>>>> JERI
>>>>>> connection. In line 1, the client sends its 8-byte greeting to the
>>>>>> server. Then in the loop on lines 3-9, it waits for the server's
>>>>>> response. If the reader thread gets a satisfactory response from the
>>>>>> server, it sets clientConnectionReady=true and calls
>>>>>> muxLock.notifyAll(). In all other cases (aborted connection,
>>>>> mismatched
>>>>>> protocol version, etc) the reader invokes Mux.setDown() which sets
>>>>>> muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it
>>> throws
>>>>> if
>>>>>> the handshake was a failure.
>>>>>>
>>>>>> In my scenario (which uses simple TCP sockets, nothing fancy), the
>>>>>> invoker thread sits on line 5 indefinitely. My problem hard to
>>>>>> reproduce, so I haven't found out what the server is doing in this
>>>>> case.
>>>>>> I hope to figure that out eventually, but presently I'm interested
>>> in
>>>>>> the "REMIND: timeout?" comment.
>>>>>>
>>>>>> -- Timeout solution --
>>>>>> It seems obvious to me that there should be a timeout here. There
>>> are
>>>>>> lots of imaginable cases where the client could get stuck here:
>>>>>> server-side deadlock, abrupt server crash, logic error in client Mux
>>>>>> code. You'd expect that the server would either respond with its
>>>>> 8-byte
>>>>>> handshake very quickly or never, so a modest timeout (like 15 or 30
>>>>>> seconds) should be good. If that timeout is triggered, I would
>>> expect
>>>>>> that the code above would call Mux.setDown() and throw an
>>> IOException.
>>>>>> That exception would either cause a retry or be thrown up to the
>>>>> invoker
>>>>>> as a RemoteException.
>>>>>>
>>>>>> -- Proposed code (untested) --
>>>>>> 3 long now = System.currentTimeMillis();
>>>>>> 4 long endTime = now + timeoutMillis;
>>>>>> 5 while (!muxDown&&  !clientConnectionReady) {
>>>>>> 6 if (now>= endTime) {
>>>>>> 7 setDown("timeout waiting for server to respond
>>>>>> to handshake", null);
>>>>>> 8 } else {
>>>>>> 9 try {
>>>>>> 10 muxLock.wait(endTime - now);
>>>>>> 11 now = System.currentTimeMillis();
>>>>>> 12 } catch (InterruptedException e) {
>>>>>> 13 setDown("interrupt waiting for connection
>>>>>> header", e);
>>>>>> 14 }
>>>>>> 15 }
>>>>>> 16 }
>>>>>>
>>>>>> This code assumes a configurable timeoutMillis parameter has been
>>> set
>>>>>> earlier.
>>>>>>
>>>>>> I can't think of any alternative solutions. Putting the timeout in
>>> the
>>>>>> Reader logic seems higher risk. There's incomplete code in JERI to
>>>>>> implement a ping packet (see Mux.asyncSendPing, never used), but
>>> that
>>>>>> would only be relevant after the initial handshake and wouldn't help
>>>>>> here.
>>>>>>
>>>>>> Thanks,
>>>>>> Chris
>>>>>>
>>>>
>>>> Michael McGrady
>>>> Chief Systems Architect
>>>> Topia Technology, Inc.
>>>> Cel 1.253.720.3365
>>>> Work 1.253.572.9712 extension 2037
>>>> Skype ID: michael.mcgrady5
>>>> [email protected]
>>>>
>>>>
>>>>
>>>>
>>>
>>

Reply via email to