James, Thanks for taking time to discuss this issue. Please see below:

On 8/10/06, James Strachan <[EMAIL PROTECTED]> wrote:

On 8/10/06, Komandur <[EMAIL PROTECTED]> wrote:
>
> >> 1. can we use an 'elastic prefetch' buffer based on a sliding window
> >> (like in TCP) - this reacts to client (mis)behavior
>
> >We could start with a prefetch of 1 and increase it over time for well
> >behaving clients. However it doesn't fix the problem as a mis-behaving
> >consumer could still hog at least one message - though this would
> >reduce the impact from 1000 or so to 1.
>
> Note that the prefetch window needs to follow the standard TCP stuff
> of multiplicative decrease during problem period & additive increase
> upon positive ack (IMHO, there isn't much to be gained in reinventing
> the TCP flow control wheel, which has been honed for over a decade.)
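
To make the additive increase / multiplicative decrease idea concrete, here is a minimal sketch in Python. The class name, the method names and the default values are all hypothetical; none of this is an existing ActiveMQ API, it only illustrates how such a window could react to acks.

```python
class AimdPrefetchWindow:
    """Hypothetical AIMD-controlled prefetch window (not an ActiveMQ class)."""

    def __init__(self, initial=1, minimum=1, maximum=1000,
                 increase=1, decrease_factor=0.5):
        self.minimum = minimum
        self.maximum = maximum
        self.increase = increase
        self.decrease_factor = decrease_factor
        # Clamp the starting size into [minimum, maximum].
        self.size = max(minimum, min(initial, maximum))

    def on_ack(self):
        # Additive increase: grow by a fixed step for each timely ack.
        self.size = min(self.maximum, self.size + self.increase)
        return self.size

    def on_problem(self):
        # Multiplicative decrease: cut the window when an ack is late or missing.
        self.size = max(self.minimum, int(self.size * self.decrease_factor))
        return self.size


window = AimdPrefetchWindow(initial=1, maximum=1000)
for _ in range(9):
    window.on_ack()            # nine timely acks grow the window by one each
grown = window.size
shrunk = window.on_problem()   # one problem period halves the window
```

The window never drops below the configured minimum, so even a persistently slow consumer keeps a prefetch of at least 1 in this sketch.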

The problem is - once a message has been sent to a consumer it's too
late - the consumer is now hogging it. This differs considerably from
TCP - in TCP it doesn't affect other connections if you send a little
too much data to a socket.


TCP takes an end-to-end perspective - in a way we can think of it as a
messaging layer spanning both the sender and the receiver.

We can take a similar approach: the broker and the client-side ActiveMQ
subsystem can work together to achieve our flow control goals. As long as
a message has not actually been delivered to the consumer, the ActiveMQ
subsystem on the consumer side can always reclaim it from the prefetch
buffer (when the window is shrunk). In effect, we have a 'proxy' flow
control system on the consumer side which is in tune with the broker side.
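
A rough sketch of that consumer-side 'proxy' follows, in Python. ClientPrefetchBuffer and its methods are made-up names for illustration only; the point is just that messages still sitting in the buffer can be handed back when the window shrinks, while messages already delivered to the application cannot.

```python
from collections import deque


class ClientPrefetchBuffer:
    """Hypothetical consumer-side prefetch buffer (not an ActiveMQ class)."""

    def __init__(self, window):
        self.window = window
        self.pending = deque()   # prefetched, not yet handed to the application

    def offer(self, message):
        # Accept a prefetched message only while the window has room.
        if len(self.pending) >= self.window:
            return False
        self.pending.append(message)
        return True

    def deliver(self):
        # Hand the oldest message to the application; it can no longer be reclaimed.
        return self.pending.popleft() if self.pending else None

    def shrink(self, new_window):
        # Give back the newest undelivered messages that no longer fit,
        # in their original order, so the broker can redispatch them.
        self.window = new_window
        reclaimed = []
        while len(self.pending) > self.window:
            reclaimed.append(self.pending.pop())
        reclaimed.reverse()
        return reclaimed


buffer = ClientPrefetchBuffer(window=4)
for name in ["m1", "m2", "m3", "m4"]:
    buffer.offer(name)
delivered = buffer.deliver()     # m1 goes to the application
reclaimed = buffer.shrink(1)     # window shrinks; m3 and m4 go back to the broker
```

Reclaiming from the tail keeps the oldest prefetched message with the consumer, which is one possible choice for limiting the effect on ordering.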


This helps in several ways:
>
> - Messages are dispatched as soon as possible, as a slow consumer will
> automatically have a smaller 'prefetch window'. In fact by decaying the
> 'prefetch window' (like in the latest implementations of TCP flow
> control), a new slow consumer's window automatically shrinks.

Growing and shrinking the prefetch windows based on the amount of time
it takes to get acknowledgements back is certainly possible - though
it's a different discussion and is for different reasons as it purely
tunes the prefetch sizes to their optimal level. This also assumes that
you can actually grow and shrink them accurately. e.g. the prefetch
buffer sizes may need to be large for performance reasons when some
messages take a long time to process or when networks are slow. So
adding automatically sized prefetch windows could result in windows
being too small.



James, you have a valid concern above with respect to slow response. This
is another instance where TCP flow control works effectively. It is
always striving to keep 'bandwidth * delay' worth of data outstanding, to
keep the receiver from starving due to slow response (refer to the IETF RFC
on long thin networks). Note that consumer-side proxy logic allows us to
take advantage of asymmetry (the proxy is able to track the consumer
activity, without the variance introduced by the network) to suit our needs.
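
As a worked illustration of the 'bandwidth * delay' rule applied to prefetch sizing, here is a small Python sketch. The function name and its parameters are assumptions made for this example; the idea is simply that the window should cover the messages a consumer can process during one broker round trip.

```python
def prefetch_for(messages_per_second, round_trip_ms, minimum=1):
    """Hypothetical helper: window = consumer rate * round trip time.

    Uses integer arithmetic and rounds up, so the consumer is never
    left idle waiting for the next message to arrive.
    """
    in_flight = (messages_per_second * round_trip_ms + 999) // 1000
    return max(minimum, in_flight)


fast_lan = prefetch_for(200, 50)    # 200 msg/s over a 50 ms round trip
slow_wan = prefetch_for(3, 400)     # 3 msg/s over a 400 ms round trip
trickle = prefetch_for(1, 100)      # never below the minimum of 1
```

A slow consumer on a slow link still gets a small window here, because the consumer's own processing rate, as seen by the proxy, is one of the two factors.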


However AMQ-850 is about a completely different problem to sizing the
prefetch buffer - it's what to do about a badly behaving consumer.


> - I am not sure I understand the  'one message hog' case.

Start with a prefetch of 1. Give a consumer a message, then if the
consumer doesn't do anything with it - or locks up while processing
it - then that message is now 'hogged' - no other consumer can get the
message until the consumer is closed or the client killed.


> Most of the consumers are idempotent (there are too many failure cases
> to count on 'once and only once' delivery). So there is no harm in
> redelivering this one message for which no ack has been received yet.

That 1 message will not be delivered to anyone else - which is a real
problem. There's the added effect on ordering too.


> >> 2. When the broker detects a misbehaving client, reclaim the unAcked
> >> messages for other active consumers (and make the window size 0 or 1
> >> in step 1 above)
>
> >If a client/connection misbehaves (e.g. becomes inactive) then the
> >connection is closed and all consumers are closed too causing all
> >their unacked messages to be redelivered.
>
> This sounds good. However, please note that misbehavior is not
> necessarily a binary state.
> Sometimes an ACK could be delayed for many reasons (either transient
> consumer (mis)behavior or other network related issues). It is in the
> gray areas that the TCP flow control works really well.

Agreed - which is why AMQ-850 is introduced to allow people to set an
inactivity timer on specific consumers. It could just be 1 thread
which is blocked on some lock - while the other threads and the rest
of the connection are working fine.
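
A per-consumer inactivity timer of this kind could look roughly like the Python sketch below. This is not the AMQ-850 implementation; the class, its methods and the explicit 'now' arguments (passed in so the example is deterministic) are all assumptions. It only shows that one stuck consumer can have its unacked messages reclaimed while others on the same connection carry on.

```python
class ConsumerInactivityMonitor:
    """Hypothetical per-consumer inactivity timer (not the AMQ-850 code)."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.unacked = {}         # consumer id -> list of unacked messages
        self.last_activity = {}   # consumer id -> time the idle clock last restarted

    def dispatched(self, consumer, message, now):
        pending = self.unacked.setdefault(consumer, [])
        if not pending:
            # The clock starts when a consumer first holds an unacked message.
            self.last_activity[consumer] = now
        pending.append(message)

    def acked(self, consumer, message, now):
        # Any ack counts as activity and restarts the consumer's clock.
        self.unacked[consumer].remove(message)
        self.last_activity[consumer] = now

    def expire(self, now):
        # Collect unacked messages from consumers that have been silent too long.
        reclaimed = []
        for consumer, pending in self.unacked.items():
            if pending and now - self.last_activity[consumer] >= self.timeout:
                reclaimed.extend(pending)
                pending.clear()
        return reclaimed


monitor = ConsumerInactivityMonitor(timeout_seconds=30)
monitor.dispatched("fast", "m1", now=0)
monitor.dispatched("stuck", "m2", now=0)
monitor.acked("fast", "m1", now=5)
monitor.dispatched("fast", "m3", now=20)
reclaimed = monitor.expire(now=31)   # only the stuck consumer's message comes back
```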

--

James
-------
http://radio.weblogs.com/0112098/
