Matthew Cooke
Thu, 06 Jan 2005 04:07:59 -0800
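A timeout can be set on the RMI sockets when they are created, using
something like this:
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

// timeoutMillis must be a final local variable or a field, e.g.
// final int timeoutMillis = 30000;
// setSocketFactory can only be called once per JVM (it throws IOException
// if a factory is already set), so do this at startup, before any RMI call.
RMISocketFactory.setSocketFactory(new RMISocketFactory() {
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        // Bounds each blocking read; the connect itself is not bounded here.
        socket.setSoTimeout(timeoutMillis);
        socket.setSoLinger(false, 0);
        return socket;
    }

    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(port);
    }
});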
I haven't tested this solution, but I had a peek at the code in JCS and
it didn't look too hard. Async, blocking threads with a timeout might be
a good idea, but since timing out the RMI thread at the socket level looks
simpler, it might be worth adding that in the meantime.
Matt.
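Because setSoTimeout only bounds reads on a socket that is already connected, `new Socket(host, port)` can still block for the operating system's connect timeout when the master cache machine has vanished. A minimal sketch, assuming JDK 1.4 or later (for `Socket.connect(SocketAddress, int)`), of a factory that bounds both the connect and every read; the class name `TimeoutSocketFactory` and the timeout values are hypothetical, and whether the JCS remote cache client actually routes its connections through the global factory is an assumption worth testing:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

// Hypothetical factory: bounds both the TCP connect and each blocking read.
class TimeoutSocketFactory extends RMISocketFactory {
    private final int connectTimeoutMillis;
    private final int readTimeoutMillis;

    TimeoutSocketFactory(int connectTimeoutMillis, int readTimeoutMillis) {
        this.connectTimeoutMillis = connectTimeoutMillis;
        this.readTimeoutMillis = readTimeoutMillis;
    }

    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket(); // unconnected, so the connect can be timed
        try {
            // Throws SocketTimeoutException if the connect takes too long.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            // Later blocking reads throw SocketTimeoutException after this delay.
            socket.setSoTimeout(readTimeoutMillis);
            return socket;
        } catch (IOException e) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // keep the original failure
            }
            throw e;
        }
    }

    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(port);
    }
}
```

It would be installed once at application startup with `RMISocketFactory.setSocketFactory(new TimeoutSocketFactory(5000, 30000));`. The read timeout also applies to legitimate slow calls, so it should be longer than the slowest expected remote get.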
On Wed, 2005-01-05 at 13:12 -0800, Smuts, Aaron wrote:
> I could use Doug Lea's FutureResult and call timedGet(millis).
>
> http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/FutureResult.html
>
> This would add some overhead, but it would be safer.
>
> If a get times out, I can assume that the server is down and throw an error.
> Then the client will go zombie and start balking, just as it does when we
> shut down the server but not the machine.
>
> I could start by putting it in just the RMI client, since we don't have the
> same problem elsewhere. I'll try something.
>
> I think we need a general threadpool configuration mechanism. . . .
>
> Aaron
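The timed-get-then-zombie idea above might look roughly like this. It is a sketch, not the JCS API: it uses `java.util.concurrent` (the JDK 5+ successor of Doug Lea's library), where `Future.get(timeout, unit)` plays the role of `FutureResult.timedGet(millis)`. `ZombieOnTimeoutClient` and its `RemoteGet` interface are hypothetical stand-ins for the RMI client and the remote call:

```java
import java.io.IOException;
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: after the first timed-out get, the client balks.
class ZombieOnTimeoutClient {
    // Hypothetical stand-in for the synchronous remote call.
    interface RemoteGet {
        Serializable get(String key) throws IOException;
    }

    private final RemoteGet remote;
    private final ExecutorService pool;
    private final long timeoutMillis;
    private volatile boolean zombie = false;

    ZombieOnTimeoutClient(RemoteGet remote, ExecutorService pool, long timeoutMillis) {
        this.remote = remote;
        this.pool = pool;
        this.timeoutMillis = timeoutMillis;
    }

    Serializable get(String key) throws IOException {
        if (zombie) {
            return null; // balk: assume the server is down and skip the network
        }
        Callable<Serializable> call = () -> remote.get(key);
        Future<Serializable> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; a socket read may ignore it
            zombie = true;
            throw new IOException("Remote get timed out after " + timeoutMillis + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for remote get");
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException("Remote get failed", cause);
        }
    }

    boolean isZombie() {
        return zombie;
    }
}
```

A real client would also need a way back out of zombie mode (for example a reconnect check), which this sketch leaves out.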
>
> -----Original Message-----
> From: Hanson Char [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 05, 2005 12:57 PM
> To: Turbine JCS Users List
> Subject: Re: Master cache machine no longer reachable causes spurious threads?
>
> Just an idea: turn each get operation into an async operation (using a thread
> from a thread pool) with an optional timeout parameter (with, say, a default of
> 5 secs).
>
> If the get doesn't finish within the timeout period, just terminate the
> thread and return null.
>
> So RMI or not, it's guaranteed not to block under all circumstances.
> Probably something from Doug's concurrent (backport) library can be taken
> advantage of.
>
> H
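A minimal sketch of the async-get-with-timeout idea, assuming `java.util.concurrent` (JDK 5+, or the backport on 1.4); `TimedAsyncGet` is a hypothetical name and the 5-second default comes from the suggestion above. One caveat: Java has no safe way to terminate a thread, so `cancel(true)` only interrupts it, and a worker stuck in a socket read may stay busy until the read returns. The caller never blocks past the timeout, but pool threads can still pile up:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: run a get on a pool thread and give up after a timeout.
final class TimedAsyncGet {
    static final long DEFAULT_TIMEOUT_MILLIS = 5000; // the suggested 5 s default

    private final ExecutorService pool;

    TimedAsyncGet(ExecutorService pool) {
        this.pool = pool;
    }

    <T> T get(Callable<T> remoteGet) throws InterruptedException, ExecutionException {
        return get(remoteGet, DEFAULT_TIMEOUT_MILLIS);
    }

    // Returns the result, or null if it did not arrive within timeoutMillis.
    <T> T get(Callable<T> remoteGet, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        Future<T> future = pool.submit(remoteGet);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Only an interrupt: the worker thread is not forcibly stopped.
            future.cancel(true);
            return null;
        }
    }
}
```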
>
>
> On Wed, 5 Jan 2005 09:52:38 -0800, Smuts, Aaron <[EMAIL PROTECTED]> wrote:
> > If you know of a solution, please send it to me.
> >
> > Thanks,
> >
> > Aaron
> >
> > -----Original Message-----
> > From: Matthew Cooke [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, January 05, 2005 2:12 AM
> > To: Turbine JCS Users List
> > Subject: Re: Master cache machine no longer reachable causes spurious
> > threads?
> >
> > Master remote cache: Sun JDK 1.4 on Red Hat Linux 7.3.
> > Client machines: Sun JDK 1.4 on Linux (prod) and Windows XP (testing).
> >
> > The problem was reproduced by executing "shutdown -h now" on the
> > master cache machine without cleanly killing the master remote cache running
> > on it first. Client machines then hang on gets for much longer than
> > 30 seconds before throwing a NoRouteToHostException.
> >
> > Currently we have no fix other than: don't kill the master cache
> > machine suddenly, and if the hardware dies, panic. Someone was investigating
> > modifying the RMI settings but without success. I know it is possible by
> > modifying the JCS/RMI code, as I see many other RMI users have had similar
> > issues (google) and a fix is documented; I can probably dig it up if useful.
> >
> > Matt.
> >
> > Smuts, Aaron wrote:
> > > I can't reproduce the issue. I can get 30-second pauses if I pull the
> > > network cable out, but not 15-minute locks. I'm running the remote
> > > server on a Windows box and hitting it from a Linux box. I can disrupt
> > > things sometimes if I pull the network cable out of the Windows box
> > > running the server. If I just kill the server, everything is fine.
> > > I'm running JDK 1.4.2_04.
> > >
> > > What JDK and OS are you using?
> > >
> > > Aaron
> > >
> > >
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Smuts, Aaron [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, January 04, 2005 1:53 PM
> > > To: Turbine JCS Users List; [EMAIL PROTECTED]
> > > Subject: RE: Master cache machine no longer reachable causes spurious
> > > threads?
> > >
> > > The various RMI properties that can be set are listed here.
> > >
> > > http://java.sun.com/j2se/1.4.2/docs/guide/rmi/sunrmiproperties.html
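As an illustration of the properties on that page, a sketch that sets two client-side Sun RMI timeouts from code. These are implementation-specific (Sun/Oracle JDK) properties, values are in milliseconds, and it is worth checking that the page for your JDK version lists them. They are read when the RMI transport initialises, so passing them as `-D` options on the command line is safer than setting them late. As far as I can tell, neither property bounds the initial TCP connect to a dead host; that still needs a socket factory. `RmiTimeoutProperties` is a hypothetical name:

```java
// Hypothetical helper: must run before the first RMI call in the JVM.
final class RmiTimeoutProperties {
    private RmiTimeoutProperties() {
    }

    static void apply(long responseTimeoutMillis, long handshakeTimeoutMillis) {
        // Client-side read timeout while waiting for a remote call's response.
        System.setProperty("sun.rmi.transport.tcp.responseTimeout",
                Long.toString(responseTimeoutMillis));
        // Time allowed for the JRMP handshake after the TCP connection is made.
        System.setProperty("sun.rmi.transport.tcp.handshakeTimeout",
                Long.toString(handshakeTimeoutMillis));
    }
}
```

The command-line equivalent would be `java -Dsun.rmi.transport.tcp.responseTimeout=30000 -Dsun.rmi.transport.tcp.handshakeTimeout=10000 ...`.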
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Smuts, Aaron [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, January 04, 2005 1:40 PM
> > > To: [EMAIL PROTECTED]
> > > Cc: turbine-jcs-user@jakarta.apache.org
> > > Subject: RE: Master cache machine no longer reachable causes spurious
> > > threads?
> > >
> > > Remove and put requests to the remote RMI server are done
> > > asynchronously; however, gets are synchronous.
> > >
> > > If a get locks up, then it could potentially block other put and remove
> > > requests locally. Are you seeing all requests block?
> > >
> > > Why is the situation different if the machine goes down, versus the RMI
> > > server not running? I haven't dug into the Sun RMI code very far.
> > >
> > > What do you suggest?
> > >
> > > You could run in put-only mode with remove-on-put set to false if you
> > > frequently have machines shutting down, thereby killing the remote server.
> > >
> > > Aaron
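A hypothetical model (not JCS code) of the split described above: puts and removes are handed to a background thread and return immediately, while get runs on the caller's thread. The methods share one lock purely to illustrate how a hung get could also stall local put and remove callers; `AsyncWriteSyncReadClient` is an invented name and a `Map` stands in for the remote server:

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model: async remote writes, synchronous remote reads.
class AsyncWriteSyncReadClient {
    private final Map<String, String> remote; // stand-in for the remote server
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    AsyncWriteSyncReadClient(Map<String, String> remote) {
        this.remote = remote;
    }

    // Asynchronous: enqueue and return without waiting on the network.
    // The shared lock shows the failure mode: while a get is stuck holding
    // it, callers of put/remove wait too, even though the write is queued.
    synchronized void put(String key, String value) {
        writer.execute(() -> remote.put(key, value));
    }

    synchronized void remove(String key) {
        writer.execute(() -> remote.remove(key));
    }

    // Synchronous: if the remote call hangs, the caller hangs with it.
    synchronized String get(String key) {
        return remote.get(key);
    }

    // Lets queued writes finish, then stops the background thread.
    void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```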
> > >
> > > -----Original Message-----
> > > From: Tim Cocks [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, December 07, 2004 9:53 AM
> > > To: Smuts, Aaron
> > > Cc: turbine-jcs-user@jakarta.apache.org
> > > Subject: Re: Master cache machine no longer reachable causes spurious
> > > threads?
> > >
> > > Thanks for your time. We are using the remote server. We have found it
> > > is almost exactly 15 minutes between when the machine running the master
> > > cache shuts down and when the clients realise the remote cache is no
> > > longer accessible. During those 15 minutes, calls to JCS block.
> > > After the 15 minutes, the calls return.
> > >
> > > The problem appears to be an RMI one. The fact that the delay is consistently
> > > ~15 minutes seems to imply the timeout is working correctly, but is set
> > > too high. We considered changing the RMI timeouts by overriding
> > > RMISocketFactory. Unfortunately this would require us to change the JCS
> > > source code, something we would like to avoid.
> > >
> > > Tim
> > >
> > > On Mon, 6 Dec 2004 13:45:59 -0800, Smuts, Aaron <[EMAIL PROTECTED]> wrote:
> > >
> > >>I'll need to look into this.
> > >>
> > >>You are using the remote server? The client reconnect must not be timing
> > >>out properly.
> > >>
> > >>Aaron
> > >>
> > >>
> > >>
> > >>
> > >>-----Original Message-----
> > >>From: Tim Cocks [mailto:[EMAIL PROTECTED]
> > >>Sent: Friday, December 03, 2004 2:39 AM
> > >>To: turbine-jcs-user@jakarta.apache.org
> > >>Subject: Master cache machine no longer reachable causes spurious threads?
> > >>
> > >>We use JCS outside of Turbine on about 20 machines connected to a JCS
> > >>master cache.
> > >>
> > >>On occasion we have had to kill the JCS master cache process and have
> > >>observed the client machines gracefully realise the master cache is no
> > >>longer available. They continue to work indefinitely, albeit without
> > >>access to the master cache.
> > >>
> > >>However, when the machine running the master cache goes down completely
> > >>the clients continue attempting to connect. In the process, they are
> > >>creating more and more blocking threads and the JVM eventually terminates.
> > >>
> > >>Is this a known problem? If so, are there any solutions?
> > >>
> > >>Thanks in advance for any help,
> > >>
> > >>Tim Cocks
> > >>
> > >>--------------------------------------------------------------------
> > >>-
> > >>To unsubscribe, e-mail:
> > >>[EMAIL PROTECTED]
> > >>For additional commands, e-mail:
> > >>[EMAIL PROTECTED]
> > >>
> > >>
> > >
> > >
> >
> >
>
>
--
Matthew Cooke <[EMAIL PROTECTED]>