I think maybe the right tradeoff is to firstly figure out root
cause/fix the problem and then later, as part of a general move
towards JDK 5 re-structure to make use of "Java 5 timeout stuff".

My basic concern would be we would otherwise run the risk of
introducing complications/more bugs whilst we try and sort the
original.

On 3 May 2011 09:12, Tom Hobbs <[email protected]> wrote:
> Dan raises a good point, which is probably of higher importance than the one
> I did.
>
> My only comment about the use of a timeout was questioning whether we could
> start to use some Java 5 timeout stuff.  You guys have been looking at that
> bit of the code more than me, so if the answer is "no" then I can accept
> that.
>
> On 3 May 2011 08:29, "Dan Creswell" <[email protected]> wrote:
>> Mmm, agreed though I think to do it requires that Chris get to the
>> root cause that is producing the need for a client timeout.
>>
>> On 2 May 2011 23:17, Gregg Wonderly <[email protected]> wrote:
>>> Yes I think a timeout is what will work the best.  I just want to make
> sure we can release any associated server resources at the same time so that
> this doesn't result in a resource leak in the server.
>>>
>>> Gregg
>>>
>>> Sent from my iPhone
>>>
>>> On May 2, 2011, at 11:27 AM, "Christopher Dolan" <
> [email protected]> wrote:
>>>
>>>> I strongly agree  with Gregg's comments about timeouts in general, but I
>>>> think this might be a special case. In the Mux.start() data flow, the
>>>> client sends an 8-byte handshake and expects the server to respond with
>>>> a similar 8-byte handshake (or error) promptly. I'm seeing indefinite
>>>> stalls in real-world cases, so I need to do something and a timeout is
>>>> the only solution I can think of.
>>>>
>>>> Chris
>>>>
>>>> -----Original Message-----
>>>> From: Gregg Wonderly [mailto:[email protected]]
>>>> Sent: Saturday, April 30, 2011 10:03 AM
>>>> To: [email protected]
>>>> Cc: Mike McGrady
>>>> Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start()
>>>>
>>>> In the history of original Java threading model, and the NIO development
>>>> to use
>>>> "select/poll" from your own thread, rather than registering call back
>>>> methods
>>>> (via an interface) kept a lot of development from using a model where
>>>> threading
>>>> was managed internally by the package or by the JVM.  As a result, we
>>>> have
>>>> structures like today where notifications are less common.  In this code
>>>> though,
>>>> I think the structure is internal enough that it's not necessary to
>>>> really use
>>>> Future or some other mechanisms.
>>>>
>>>> Timeouts are always a "hard way" to manage "loss of functionality"
>>>> because you
>>>> really don't know when things are "not working", only that something is
>>>> taking
>>>> longer than your timeout accounted for.   Timeouts can make it possible
>>>> for more
>>>> pending work to pile up on the other end that might slow the results
>>>> even more.
>>>>  E.g. you wait 30seconds and retry while the actual work on the other
>>>> end is
>>>> taking 35 seconds to get to and thus the queue rate exceeds the dequeue
>>>> rate and
>>>> things start piling up.
>>>>
>>>> If you are going to use a timeout, we need to have some sort of
>>>> indication from
>>>> both ends perspective that the attempt has been aborted, as early as
>>>> possible.
>>>> For cases where I/O traffic is written/read, that usually means closing
>>>> the
>>>> socket.  I'm not sure of the ramifications of doing that, since I
>>>> haven't looked
>>>> too hard at this code.
>>>>
>>>> Gregg Wonderly
>>>>
>>>> On 4/29/2011 4:38 PM, Mike McGrady wrote:
>>>>> Throwing my two cents in here just to state my opinion.  This is an
>>>> effort that
>>>>> could pay dividends if it were done with a view toward the future -
>>>> absolutely
>>>>> no pun intended.  I do not know the details of the code but if
>>>> futures could be
>>>>> useful here, they would be welcomed by myself.
>>>>>
>>>>> MG
>>>>>
>>>>>
>>>>> On Apr 29, 2011, at 1:33 PM, Christopher Dolan wrote:
>>>>>
>>>>>> Thanks, Tom.
>>>>>>
>>>>>> I don't really understand your Future suggestion.  Are you suggesting
>>>>>> changing the async handshake to a Future? If so, that sounds like a
>>>> very
>>>>>> involved change, touching a lot of code in Mux and its subclasses.
>>>>>>
>>>>>> setDown() changes the value of the muxDown boolean to true, so it's a
>>>>>> valid way out of the loop.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Tom Hobbs [mailto:[email protected]]
>>>>>> Sent: Friday, April 29, 2011 2:27 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: client hang in
>>>> com.sun.jini.jeri.internal.mux.Mux.start()
>>>>>>
>>>>>> The proposed code looks fine to me.  Two points leap out, more
>>>>>> discussion
>>>>>> points than anything else.
>>>>>>
>>>>>> For some reason I've recently developed an aversion to writing my on
>>>>>> timeout
>>>>>> logic.  did you consider using something like a Future here or might
>>>>>> that be
>>>>>> serious overkill (it wouldn't surprise me if it was)?
>>>>>>
>>>>>> Also is Setdown intended to break out of the while loop?  Because I
>>>>>> can't
>>>>>> see a way to escape it.  (I don't have the rest of the code in front
>>>> of
>>>>>> me)
>>>>>>
>>>>>> Thanks for keep raising these issues - especially because you usually
>>>>>> supply
>>>>>> fixes for them!
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On 29 Apr 2011 19:41, "Christopher Dolan"<[email protected]>
>>>>>> wrote:
>>>>>>> I've experienced occasional cases where clients get stuck in the
>>>>>>> following block of code in Mux.start. Has anyone experienced this
>>>>>>> problem? I have a proposed solution below. Has anyone thought about
>>>> a
>>>>>>> similar solution already?
>>>>>>>
>>>>>>> -- Current code --
>>>>>>> 1 asyncSendClientConnectionHeader();
>>>>>>> 2 synchronized (muxLock) {
>>>>>>> 3 while (!muxDown&&  !clientConnectionReady) {
>>>>>>> 4 try {
>>>>>>> 5 muxLock.wait(); // REMIND: timeout?
>>>>>>> 6 } catch (InterruptedException e) {
>>>>>>> 7 ...
>>>>>>> 8 }
>>>>>>> 9 }
>>>>>>> 10 if (muxDown) {
>>>>>>> 11 IOException ioe = new IOException(muxDownMessage);
>>>>>>> 12 ioe.initCause(muxDownCause);
>>>>>>> 13 throw ioe;
>>>>>>> 14 }
>>>>>>> 15 }
>>>>>>>
>>>>>>> -- Explanation of the code --
>>>>>>> This code handles the initial client-server handshake that starts a
>>>>>> JERI
>>>>>>> connection. In line 1, the client sends its 8-byte greeting to the
>>>>>>> server. Then in the loop on lines 3-9, it waits for the server's
>>>>>>> response. If the reader thread gets a satisfactory response from the
>>>>>>> server, it sets clientConnectionReady=true and calls
>>>>>>> muxLock.notifyAll(). In all other cases (aborted connection,
>>>>>> mismatched
>>>>>>> protocol version, etc) the reader invokes Mux.setDown() which sets
>>>>>>> muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it
>>>> throws
>>>>>> if
>>>>>>> the handshake was a failure.
>>>>>>>
>>>>>>> In my scenario (which uses simple TCP sockets, nothing fancy), the
>>>>>>> invoker thread sits on line 5 indefinitely. My problem hard to
>>>>>>> reproduce, so I haven't found out what the server is doing in this
>>>>>> case.
>>>>>>> I hope to figure that out eventually, but presently I'm interested
>>>> in
>>>>>>> the "REMIND: timeout?" comment.
>>>>>>>
>>>>>>> -- Timeout solution --
>>>>>>> It seems obvious to me that there should be a timeout here. There
>>>> are
>>>>>>> lots of imaginable cases where the client could get stuck here:
>>>>>>> server-side deadlock, abrupt server crash, logic error in client Mux
>>>>>>> code. You'd expect that the server would either respond with its
>>>>>> 8-byte
>>>>>>> handshake very quickly or never, so a modest timeout (like 15 or 30
>>>>>>> seconds) should be good. If that timeout is triggered, I would
>>>> expect
>>>>>>> that the code above would call Mux.setDown() and throw an
>>>> IOException.
>>>>>>> That exception would either cause a retry or be thrown up to the
>>>>>> invoker
>>>>>>> as a RemoteException.
>>>>>>>
>>>>>>> -- Proposed code (untested) --
>>>>>>> 3 long now = System.currentTimeMillis();
>>>>>>> 4 long endTime = now + timeoutMillis;
>>>>>>> 5 while (!muxDown&&  !clientConnectionReady) {
>>>>>>> 6 if (now>= endTime) {
>>>>>>> 7 setDown("timeout waiting for server to respond
>>>>>>> to handshake", null);
>>>>>>> 8 } else {
>>>>>>> 9 try {
>>>>>>> 10 muxLock.wait(endTime - now);
>>>>>>> 11 now = System.currentTimeMillis();
>>>>>>> 12 } catch (InterruptedException e) {
>>>>>>> 13 setDown("interrupt waiting for connection
>>>>>>> header", e);
>>>>>>> 14 }
>>>>>>> 15 }
>>>>>>> 16 }
>>>>>>>
>>>>>>> This code assumes a configurable timeoutMillis parameter has been
>>>> set
>>>>>>> earlier.
>>>>>>>
>>>>>>> I can't think of any alternative solutions. Putting the timeout in
>>>> the
>>>>>>> Reader logic seems higher risk. There's incomplete code in JERI to
>>>>>>> implement a ping packet (see Mux.asyncSendPing, never used), but
>>>> that
>>>>>>> would only be relevant after the initial handshake and wouldn't help
>>>>>>> here.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Chris
>>>>>>>
>>>>>
>>>>> Michael McGrady
>>>>> Chief Systems Architect
>>>>> Topia Technology, Inc.
>>>>> Cel 1.253.720.3365
>>>>> Work 1.253.572.9712 extension 2037
>>>>> Skype ID: michael.mcgrady5
>>>>> [email protected]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Reply via email to