Hi,

marc fleury wrote:
> 
> So I see 2 usages of time out...
> 
> one is a "time-out" because someone went to lunch and we don't want to tie
> up resources on the server, passivation won't do since we are in a
> transaction.

I don't like this kind of timeout. If a timeout is active until
an operator has entered some data, the duration of the timeout
could sometimes be long. This may cause other users to complain
because they are locked out from some resource, and before the
IT staff can find out what is wrong the first user presses 
enter and the transaction is finished.
It could also go the other way around: if the user is too slow,
a timeout happens, as in the lunch example above. The user
may then call IT staff and complain: "I did as I always do, but
I got this timed out message."

IMHO, this kind of long transaction voids scalability because of
the locks it holds. jBoss users should avoid it, but we
should also be tolerant and allow it.

> the other one we see under test,  is that under very heavy load (we are
> doing the tests right now), like loads that puts a good lunix machine under
> x5 CPU load, the server time-outs the transactions after awhile.

During server overload, timeouts are normal. If we had no timeouts,
or if the timeout were set to a very long time, users would call IT
staff and complain: "The server does not reply, nothing happens
when I try to use it."
But if timeouts are set to a sensible value, the user of an
overloaded server gets a "timed out" error before calling
IT staff.

> BTW sebastien found a fix for that "timeout bug" that I described before and
> it now works, the server doesn't lock at all.  it runs *very* well.

Great.

> So now under very heavy load it throws many exceptions (timeout exceptions)
> but it goes on very happily.

This is how it should be IMHO.
 
> My question is this
> 1- can't we put a time out at 1hour or so.  The reason is very simple.  I
> was always impressed by the "stability" of linux... ie. under very heavy
> load, it swaps and goes slow and what not but it carries on and finishes.
> If we put one hour, under the loads described (hey even the big SAP install
> I know of don't go up to 1000 concurent clients all on the SAME instance),
> then the server will surely take the time to answer (it will be slow) but at
> least IT WON"T THROW EXCEPTIONS on timeouts.

We could, but please consider the consequences: every bug report
that would have been a "timeout problem" will instead be a
"server hang problem", as no user is going to wait an hour
to see if a timeout happens. And this would not only affect the
user with the long transaction: other users may be locked out
by that transaction, and experience a "server hang problem" too.

> 2- Ole, we have tried to change the time out time on your timeout factories,
> but with no success whatsoever... how do you do it?????

TimeoutFactory only works with absolute time in milliseconds
since the epoch.
The timeout value is set in TxManager. With the code Oleg added
you have an MBean property transactionTimeout that you should
be able to set in the configuration files (I'm not sure how).
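
To illustrate the difference between a relative timeout value and
an absolute time, here is a minimal sketch. The class and method
names are hypothetical and are not the TimeoutFactory API. It
assumes the timeout value is given in seconds, as in JTA's
TransactionManager.setTransactionTimeout(int).

```java
// Hypothetical helper, not jBoss code: converts a relative timeout
// into an absolute deadline in milliseconds since the epoch.
final class Deadline {
    private Deadline() {}

    /** Absolute deadline for a timeout of timeoutSeconds that starts at nowMillis. */
    public static long fromRelative(long nowMillis, int timeoutSeconds) {
        // 1000L forces long arithmetic, so large timeouts do not overflow int.
        return nowMillis + timeoutSeconds * 1000L;
    }

    /** True once the absolute deadline has been reached. */
    public static boolean hasExpired(long deadlineMillis, long nowMillis) {
        return nowMillis >= deadlineMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long deadline = fromRelative(now, 300); // assumed 5 minute timeout
        System.out.println(hasExpired(deadline, now));           // false
        System.out.println(hasExpired(deadline, now + 300000L)); // true
    }
}
```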

> 3-  In case we go with 1, then the "load" should be done in a
> "MetricsInterceptor" that can provide some feedback on the time it takes to
> complete a call, the number of beans in the container, the number of threads
> that are in etc etc... we can then provide an MBean that gives all that
> information. (time in-time out etc etc)... hey the famous group 77 s'got to
> be good for something...

A MetricsInterceptor could be a good idea even if we do not
use huge timeout values as defaults.
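
A rough sketch of the bookkeeping such an interceptor might do.
This is an assumption for illustration only: CallMetrics and its
methods are made-up names, and it does not use the jBoss
interceptor interface. It only counts calls and failures and
tracks the time spent in them, which is the kind of data an MBean
could later expose.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch; this is not the jBoss Interceptor API.
final class CallMetrics {
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    /** Runs the call, recording its duration and whether it threw. */
    public <T> T invoke(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } catch (RuntimeException e) {
            failures.incrementAndGet();
            throw e; // metrics must not swallow the caller's exception
        } finally {
            calls.incrementAndGet();
            totalNanos.addAndGet(System.nanoTime() - start);
        }
    }

    public long getCalls() { return calls.get(); }

    public long getFailures() { return failures.get(); }

    /** Mean call duration in milliseconds, or 0 if nothing was recorded. */
    public double getMeanMillis() {
        long n = calls.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1_000_000.0 / n;
    }
}
```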

> open to opinions...
> 
> stability?

Stability is the most important requirement I have for jBoss.
But by this I mean real stability under normal loads. For an
overloaded server (which is a problem by itself) I only
require correct operation.

> metrics?

Metrics are important. These give real world numbers
that are very useful for tuning code and configuration.
And they can also be useful when diagnosing problems.

> exceptions?

Exceptions are IMHO not very important. We should throw the
right exceptions at the right time, but we should not program
by exception.

> my vote is clear: percieved stability is very important... and the fact is
> that the container is super-stable even under very heavy load... so why give
> a bad impression on timeout exceptions, people will see that it is slow and
> that is all the information they need, or eventually a "mail" from the new
> interceptor that says "buy some more hardware dude!" but not these nasty
> "timeout exceptions"....  I don't know, what do you think?

I think that the next j2ee RI will send a mail with a
"special offer" for a SUN server when it detects that
this old 386 can't stand the heat ;-)

Seriously, I have the impression that you would like to
set the timeout value at a length that is useful only for
a server that is experiencing heavy overload. But by doing
this we just try to cure a symptom (transaction timeouts)
of the real problem (server overload).

Rather than doing this we should try to go for the real
problem.

Curing and/or avoiding server overload is a problem
that most servers have to deal with. This is
complicated by the fact that the server has no control
over the clients and thus cannot control the rate of
requests coming from them. (The only exception is that
cooperating clients could back off when they get
timeout errors.)

But something can be done: Currently, when the server
is overloaded and we get timeouts, processor time has
already been spent servicing the requests that timed
out, and this processor time is wasted, leading to even
more overload.
A lot of server implementations get around this by
taking extra steps to ensure that the server does not
start working on more jobs than it can handle.
This is often done by some kind of incoming request
queue. When a request comes in and the server is
experiencing overload the request is enqueued. This way
(almost) no processor time is spent on the request.
When the server finds time to service the request, it
is taken out of the queue to be serviced. But if the
server cannot find time to service the request, the
request is either simply dropped, or an error is
returned to the client.
This approach is sometimes called "request policing",
as the server acts as a traffic controller for the
incoming requests.
Depending on how the input queue is implemented, this
can also help with some types of DoS attacks.
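
The incoming request queue described above can be sketched
roughly as follows. This is only an illustration under my own
assumptions, not jBoss code: RequestPolicer, Outcome, submit and
serviceNext are hypothetical names, and the queue is a plain
bounded queue from java.util.concurrent. A request is rejected at
once when the queue is full, and dropped without being run if its
deadline has passed by the time the server gets to it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of an incoming request queue; not jBoss code.
final class RequestPolicer {
    enum Outcome { EMPTY, DROPPED, SERVICED }

    /** A queued request together with its absolute deadline. */
    private static final class Entry {
        final Runnable work;
        final long deadlineMillis;

        Entry(Runnable work, long deadlineMillis) {
            this.work = work;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final BlockingQueue<Entry> queue;

    public RequestPolicer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Enqueues the request; returns false (reject) if the queue is full. */
    public boolean submit(Runnable work, long deadlineMillis) {
        return queue.offer(new Entry(work, deadlineMillis)); // never blocks
    }

    /** Takes the next request and runs it only if its deadline has not passed. */
    public Outcome serviceNext(long nowMillis) {
        Entry e = queue.poll();
        if (e == null) {
            return Outcome.EMPTY;
        }
        if (nowMillis >= e.deadlineMillis) {
            return Outcome.DROPPED; // almost no processor time spent on it
        }
        e.work.run();
        return Outcome.SERVICED;
    }
}
```

The point of the design is that the expensive work is never
started for a request that cannot be finished in time.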

Sorry for the length of this answer, but I am afraid
that setting the timeout value too high might cause
other, worse problems for us and other jBoss users
than a "timed out" exception.


Best Regards,

Ole Husgaard.
