Yes I think a timeout is what will work the best.  I just want to make sure we 
can release any associated server resources at the same time so that this 
doesn't result in a resource leak in the server.

Gregg

Sent from my iPhone

On May 2, 2011, at 11:27 AM, "Christopher Dolan" <[email protected]> 
wrote:

> I strongly agree  with Gregg's comments about timeouts in general, but I
> think this might be a special case. In the Mux.start() data flow, the
> client sends an 8-byte handshake and expects the server to respond with
> a similar 8-byte handshake (or error) promptly. I'm seeing indefinite
> stalls in real-world cases, so I need to do something and a timeout is
> the only solution I can think of.
> 
> Chris
> 
> -----Original Message-----
> From: Gregg Wonderly [mailto:[email protected]] 
> Sent: Saturday, April 30, 2011 10:03 AM
> To: [email protected]
> Cc: Mike McGrady
> Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start()
> 
> In the history of original Java threading model, and the NIO development
> to use 
> "select/poll" from your own thread, rather than registering call back
> methods 
> (via an interface) kept a lot of development from using a model where
> threading 
> was managed internally by the package or by the JVM.  As a result, we
> have 
> structures like today where notifications are less common.  In this code
> though, 
> I think the structure is internal enough that it's not necessary to
> really use 
> Future or some other mechanisms.
> 
> Timeouts are always a "hard way" to manage "loss of functionality"
> because you 
> really don't know when things are "not working", only that something is
> taking 
> longer than your timeout accounted for.   Timeouts can make it possible
> for more 
> pending work to pile up on the other end that might slow the results
> even more. 
>  E.g. you wait 30seconds and retry while the actual work on the other
> end is 
> taking 35 seconds to get to and thus the queue rate exceeds the dequeue
> rate and 
> things start piling up.
> 
> If you are going to use a timeout, we need to have some sort of
> indication from 
> both ends perspective that the attempt has been aborted, as early as
> possible. 
> For cases where I/O traffic is written/read, that usually means closing
> the 
> socket.  I'm not sure of the ramifications of doing that, since I
> haven't looked 
> too hard at this code.
> 
> Gregg Wonderly
> 
> On 4/29/2011 4:38 PM, Mike McGrady wrote:
>> Throwing my two cents in here just to state my opinion.  This is an
> effort that
>> could pay dividends if it were done with a view toward the future -
> absolutely
>> no pun intended.  I do not know the details of the code but if
> futures could be
>> useful here, they would be welcomed by myself.
>> 
>> MG
>> 
>> 
>> On Apr 29, 2011, at 1:33 PM, Christopher Dolan wrote:
>> 
>>> Thanks, Tom.
>>> 
>>> I don't really understand your Future suggestion.  Are you suggesting
>>> changing the async handshake to a Future? If so, that sounds like a
> very
>>> involved change, touching a lot of code in Mux and its subclasses.
>>> 
>>> setDown() changes the value of the muxDown boolean to true, so it's a
>>> valid way out of the loop.
>>> 
>>> Chris
>>> 
>>> -----Original Message-----
>>> From: Tom Hobbs [mailto:[email protected]]
>>> Sent: Friday, April 29, 2011 2:27 PM
>>> To: [email protected]
>>> Subject: Re: client hang in
> com.sun.jini.jeri.internal.mux.Mux.start()
>>> 
>>> The proposed code looks fine to me.  Two points leap out, more
>>> discussion
>>> points than anything else.
>>> 
>>> For some reason I've recently developed an aversion to writing my on
>>> timeout
>>> logic.  did you consider using something like a Future here or might
>>> that be
>>> serious overkill (it wouldn't surprise me if it was)?
>>> 
>>> Also is Setdown intended to break out of the while loop?  Because I
>>> can't
>>> see a way to escape it.  (I don't have the rest of the code in front
> of
>>> me)
>>> 
>>> Thanks for keep raising these issues - especially because you usually
>>> supply
>>> fixes for them!
>>> 
>>> Tom
>>> 
>>> On 29 Apr 2011 19:41, "Christopher Dolan"<[email protected]>
>>> wrote:
>>>> I've experienced occasional cases where clients get stuck in the
>>>> following block of code in Mux.start. Has anyone experienced this
>>>> problem? I have a proposed solution below. Has anyone thought about
> a
>>>> similar solution already?
>>>> 
>>>> -- Current code --
>>>> 1 asyncSendClientConnectionHeader();
>>>> 2 synchronized (muxLock) {
>>>> 3 while (!muxDown&&  !clientConnectionReady) {
>>>> 4 try {
>>>> 5 muxLock.wait(); // REMIND: timeout?
>>>> 6 } catch (InterruptedException e) {
>>>> 7 ...
>>>> 8 }
>>>> 9 }
>>>> 10 if (muxDown) {
>>>> 11 IOException ioe = new IOException(muxDownMessage);
>>>> 12 ioe.initCause(muxDownCause);
>>>> 13 throw ioe;
>>>> 14 }
>>>> 15 }
>>>> 
>>>> -- Explanation of the code --
>>>> This code handles the initial client-server handshake that starts a
>>> JERI
>>>> connection. In line 1, the client sends its 8-byte greeting to the
>>>> server. Then in the loop on lines 3-9, it waits for the server's
>>>> response. If the reader thread gets a satisfactory response from the
>>>> server, it sets clientConnectionReady=true and calls
>>>> muxLock.notifyAll(). In all other cases (aborted connection,
>>> mismatched
>>>> protocol version, etc) the reader invokes Mux.setDown() which sets
>>>> muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it
> throws
>>> if
>>>> the handshake was a failure.
>>>> 
>>>> In my scenario (which uses simple TCP sockets, nothing fancy), the
>>>> invoker thread sits on line 5 indefinitely. My problem hard to
>>>> reproduce, so I haven't found out what the server is doing in this
>>> case.
>>>> I hope to figure that out eventually, but presently I'm interested
> in
>>>> the "REMIND: timeout?" comment.
>>>> 
>>>> -- Timeout solution --
>>>> It seems obvious to me that there should be a timeout here. There
> are
>>>> lots of imaginable cases where the client could get stuck here:
>>>> server-side deadlock, abrupt server crash, logic error in client Mux
>>>> code. You'd expect that the server would either respond with its
>>> 8-byte
>>>> handshake very quickly or never, so a modest timeout (like 15 or 30
>>>> seconds) should be good. If that timeout is triggered, I would
> expect
>>>> that the code above would call Mux.setDown() and throw an
> IOException.
>>>> That exception would either cause a retry or be thrown up to the
>>> invoker
>>>> as a RemoteException.
>>>> 
>>>> -- Proposed code (untested) --
>>>> 3 long now = System.currentTimeMillis();
>>>> 4 long endTime = now + timeoutMillis;
>>>> 5 while (!muxDown&&  !clientConnectionReady) {
>>>> 6 if (now>= endTime) {
>>>> 7 setDown("timeout waiting for server to respond
>>>> to handshake", null);
>>>> 8 } else {
>>>> 9 try {
>>>> 10 muxLock.wait(endTime - now);
>>>> 11 now = System.currentTimeMillis();
>>>> 12 } catch (InterruptedException e) {
>>>> 13 setDown("interrupt waiting for connection
>>>> header", e);
>>>> 14 }
>>>> 15 }
>>>> 16 }
>>>> 
>>>> This code assumes a configurable timeoutMillis parameter has been
> set
>>>> earlier.
>>>> 
>>>> I can't think of any alternative solutions. Putting the timeout in
> the
>>>> Reader logic seems higher risk. There's incomplete code in JERI to
>>>> implement a ping packet (see Mux.asyncSendPing, never used), but
> that
>>>> would only be relevant after the initial handshake and wouldn't help
>>>> here.
>>>> 
>>>> Thanks,
>>>> Chris
>>>> 
>> 
>> Michael McGrady
>> Chief Systems Architect
>> Topia Technology, Inc.
>> Cel 1.253.720.3365
>> Work 1.253.572.9712 extension 2037
>> Skype ID: michael.mcgrady5
>> [email protected]
>> 
>> 
>> 
>> 
> 

Reply via email to