turbine-jcs-user  

Re: Master cache machine no longer reachable causes spurious threads?

Hanson Char
Wed, 05 Jan 2005 12:57:54 -0800

Just an idea: turn each get operation into an async operation (using a
thread from a thread pool) with an optional timeout parameter (with, say,
a default of 5 secs).

If the get doesn't finish within the timeout period, just terminate
the thread and return null.

So RMI or not, it's guaranteed not to block under all circumstances. 
Probably something from Doug's concurrent (backport) library can be
taken advantage of.
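
A rough sketch of what I mean, not tested against JCS itself. It
assumes the java.util.concurrent API (the standardized successor to
Doug Lea's library; the backport exposes similar classes). The class
name TimedCacheGet and its methods are made up for illustration, and
the Callable stands in for whatever actually performs the remote get.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a (possibly blocking) cache lookup on a pool
// thread and gives up after a timeout instead of blocking the caller.
final class TimedCacheGet {
    // Default suggested in this thread: 5 seconds.
    static final long DEFAULT_TIMEOUT_MILLIS = 5000L;

    private final ExecutorService pool;

    TimedCacheGet(int threads) {
        // Daemon threads, so a lookup stuck in network I/O cannot keep the JVM alive.
        this.pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "timed-cache-get");
            t.setDaemon(true);
            return t;
        });
    }

    <T> T get(Callable<T> lookup) {
        return get(lookup, DEFAULT_TIMEOUT_MILLIS);
    }

    // Returns the lookup result, or null if it fails or does not finish in time.
    <T> T get(Callable<T> lookup, long timeoutMillis) {
        Future<T> future = pool.submit(lookup);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // requests an interrupt; does not forcibly kill the thread
            return null;
        } catch (ExecutionException e) {
            return null; // the lookup itself threw; treat it as a cache miss
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return null;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

One caveat: cancel(true) only interrupts the worker, and a thread
blocked in socket I/O may ignore the interrupt. The caller gets its
null on time, but the pool thread can stay tied up until the
underlying RMI call returns, so the pool size bounds how many stuck
lookups can pile up.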

H


On Wed, 5 Jan 2005 09:52:38 -0800, Smuts, Aaron <[EMAIL PROTECTED]> wrote:
> If you know of a solution, please send it to me.
> 
> Thanks,
> 
> Aaron 
> 
> -----Original Message-----
> From: Matthew Cooke [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 05, 2005 2:12 AM
> To: Turbine JCS Users List
> Subject: Re: Master cache machine no longer reachable causes spurious threads?
> 
> Master-remote cache Sun JDK 1.4 on Redhat linux 7.3.
> Client machines were Sun JDK 1.4 on linux(prod) and winXP(testing).
> 
> The problem was reproduced by executing a "shutdown -h now" on the 
> mastercache machine without cleanly killing the master-remote cache running 
> on it first. Client machines then hang on gets for much longer than 
> 30 seconds before throwing a NoRouteToHostException.
> 
> Currently we have no fix other than "don't kill the master cache 
> machine suddenly, and if the hardware dies, panic". Someone was investigating 
> modifying the RMI settings but without success. I know it is possible by 
> modifying the JCS/RMI code, as I see many other RMI users have had similar 
> issues (Google) and a fix is documented; I can probably dig it up if useful.
> 
> Matt.
> 
> Smuts, Aaron wrote:
> > I can't reproduce the issue.  I can get 30 second pauses if I pull the 
> > network cable out, but not 15 minute locks.  I'm running the remote server 
> > on a windows box and hitting it from a linux box.  I can disrupt things 
> > sometimes if I pull the network cable out of the windows box running the 
> > server.  If I just kill the server everything is fine.  . . .   I'm running 
> > jdk 1.4.2_04.
> >
> > What jdk and os are you using?
> >
> > Aaron
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Smuts, Aaron [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 04, 2005 1:53 PM
> > To: Turbine JCS Users List; [EMAIL PROTECTED]
> > Subject: RE: Master cache machine no longer reachable causes spurious 
> > threads?
> >
> > The various RMI properties that can be set are listed here.
> >
> > http://java.sun.com/j2se/1.4.2/docs/guide/rmi/sunrmiproperties.html
> >
> >
> >
> > -----Original Message-----
> > From: Smuts, Aaron [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 04, 2005 1:40 PM
> > To: [EMAIL PROTECTED]
> > Cc: turbine-jcs-user@jakarta.apache.org
> > Subject: RE: Master cache machine no longer reachable causes spurious 
> > threads?
> >
> > Remove and put requests to the remote RMI server are done asynchronously; 
> > however, gets are synchronous.
> >
> > If a get locks up, then it could potentially block other put and remove 
> > requests locally.  Are you seeing all requests block?
> >
> > Why is the situation different if the machine goes down, versus the rmi 
> > server not running?  I haven't dug into the sun rmi code very far.
> >
> > What do you suggest?
> >
> > You could run in put only mode with remove on put set to false, if you 
> > frequently have machines shutting down thereby killing the remote server.
> >
> > Aaron
> >
> > -----Original Message-----
> > From: Tim Cocks [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, December 07, 2004 9:53 AM
> > To: Smuts, Aaron
> > Cc: turbine-jcs-user@jakarta.apache.org
> > Subject: Re: Master cache machine no longer reachable causes spurious 
> > threads?
> >
> > Thanks for your time.  We are using the remote server.  We have found it is 
> > almost exactly 15 minutes between when the machine running the master cache 
> > shuts down and when the clients realise the remote cache is no longer 
> > accessible.  During those 15 minutes, calls to JCS block.  After the 15 
> > minutes, the calls return.
> >
> > The problem appears to be an RMI one. The fact the delay is consistently 
> > ~15 minutes seems to imply the timeout is working correctly, but is set too 
> > high.  We considered changing the RMI timeouts by overriding 
> > RMISocketFactory. Unfortunately this would require us to change the JCS 
> > source code, something we would like to avoid.
> >
> > Tim
> >
> > On Mon, 6 Dec 2004 13:45:59 -0800, Smuts, Aaron <[EMAIL PROTECTED]> wrote:
> >
> >>I'll need to look into this.
> >>
> >>You are using the remote server?  The client reconnect must not be timing 
> >>out properly.
> >>
> >>Aaron
> >>
> >>
> >>
> >>
> >>-----Original Message-----
> >>From: Tim Cocks [mailto:[EMAIL PROTECTED]
> >>Sent: Friday, December 03, 2004 2:39 AM
> >>To: turbine-jcs-user@jakarta.apache.org
> >>Subject: Master cache machine no longer reachable causes spurious threads?
> >>
> >>We use JCS outside of Turbine on about 20 machines connected to a JCS 
> >>master cache.
> >>
> >>On occasion we have had to kill the JCS master cache process and have 
> >>observed the client machines gracefully realise the master cache is no 
> >>longer available.  They continue to work indefinitely, albeit without 
> >>access to the master cache.
> >>
> >>However, when the machine running the master cache goes down completely the 
> >>clients continue attempting to connect.  In the process, they are creating 
> >>more and more blocking threads and the JVM eventually terminates.
> >>
> >>Is this a known problem?  If so, are there any solutions?
> >>
> >>Thanks in advance for any help,
> >>
> >>Tim Cocks
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail:
> >>[EMAIL PROTECTED]
> >>For additional commands, e-mail:
> >>[EMAIL PROTECTED]
> >>
> >>