James,

Thanks for taking the time to discuss this issue. Please see below:
On 8/10/06, James Strachan <[EMAIL PROTECTED]> wrote:
On 8/10/06, Komandur <[EMAIL PROTECTED]> wrote:
>
> >> 1. can we use an 'elastic prefetch' buffer based on a sliding window (like
> >> in TCP) - this reacts to client (mis)behavior
>
> >We could start with a prefetch of 1 and increase it over time for well
> >behaving clients. However it doesn't fix the problem as a mis-behaving
> >consumer could still hog at least one message - though this would
> >reduce the impact from 1000 or so to 1.
>
> Note that the prefetch window needs to follow the standard tcp stuff
> of multiplicative decrease during problem period & additive increase upon
> positive ack (IMHO, there isn't much to be gained in reinventing the TCP
> flow control wheel, which has been honed for over a decade.)

The problem is - once a message has been sent to a consumer it's too late - the consumer is now hogging it. This differs considerably from TCP - in TCP it doesn't affect other connections if you send a little too much data to a socket.
TCP takes an end-to-end perspective - in a way we can think of it as a messaging layer spanning both the sender and the receiver. We can take a similar approach: the broker and the client-side ActiveMQ subsystem can work together to achieve our flow control goals. As long as a message has not actually been delivered to the application, the ActiveMQ subsystem on the consumer side can always reclaim it from the prefetch buffer (when the window is shrunk). In effect, we have a 'proxy' flow control system on the consumer side which is in tune with the broker side.
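To make the idea concrete, here is a minimal sketch in Python (chosen only for brevity). It is an illustration of the proposal, not ActiveMQ code: the class `ElasticPrefetch` and all of its method names are hypothetical, and it assumes the window limits only messages that are buffered but not yet handed to the application. A timely ack grows the window by one, a slow ack halves it, and anything that no longer fits is returned so it could be handed back to the broker.

```python
from collections import deque


class ElasticPrefetch:
    """Hypothetical consumer-side prefetch buffer with an AIMD-sized window."""

    def __init__(self, min_window=1, max_window=1000):
        self.min_window = min_window
        self.max_window = max_window
        self.window = min_window   # start small and grow for well-behaved consumers
        self.buffer = deque()      # prefetched, not yet delivered to the application

    def offer(self, message):
        """Accept a dispatched message, or refuse it if the window is full."""
        if len(self.buffer) >= self.window:
            return False
        self.buffer.append(message)
        return True

    def deliver(self):
        """Hand the oldest prefetched message to the application (None if empty)."""
        return self.buffer.popleft() if self.buffer else None

    def on_ack(self):
        """Additive increase: a timely ack grows the window by one."""
        self.window = min(self.max_window, self.window + 1)

    def on_slow_ack(self):
        """Multiplicative decrease: halve the window and reclaim the overflow.

        Returns the undelivered messages that no longer fit, in their
        original order, so they can be given back to the broker.
        """
        self.window = max(self.min_window, self.window // 2)
        reclaimed = []
        while len(self.buffer) > self.window:
            reclaimed.append(self.buffer.pop())  # take the newest first
        reclaimed.reverse()
        return reclaimed


prefetch = ElasticPrefetch()
for _ in range(7):
    prefetch.on_ack()              # window grows from 1 to 8
for i in range(8):
    prefetch.offer(i)              # buffer now holds messages 0..7
first = prefetch.deliver()         # message 0 goes to the application
returned = prefetch.on_slow_ack()  # window halves to 4; the overflow is reclaimed
```

Only messages still sitting in the buffer can be reclaimed here. A message that has already been handed to the application (`first` above) stays with the consumer, which is exactly the 'one message hog' case raised earlier in the thread.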
> This helps in several ways:
>
> - Messages are dispatched as soon as possible, as a slow consumer will
> automatically have a smaller 'prefetch window'. In fact by decaying the
> 'prefetch window' (like in the latest implementations
> of TCP flow control), a new slow consumer's window automatically shrinks.

Growing and shrinking the prefetch windows based on the amount of time it takes to get acknowledgements back is certainly possible - though it's a different discussion and is for different reasons, as it purely tunes the prefetch sizes to their optimal level. This also assumes that you can actually grow and shrink them accurately. E.g. the prefetch buffer sizes may need to be large for performance reasons when some messages take a long time to process or when networks are slow. So adding automatically sized prefetch windows could result in windows being too small.
James, you have a valid concern above with respect to slow response. This is another of the instances where TCP flow control works effectively. It always strives to keep 'bandwidth * delay' worth of data outstanding, to keep the receiver from starving due to slow response (refer to the IETF RFC on long thin networks). Note that consumer-side proxy logic allows us to take advantage of asymmetry (the proxy is able to track the consumer activity without the variance introduced by the network) to suit our needs.

However AMQ-850 is about a completely different problem to sizing the prefetch buffer - it's what to do about a badly behaving consumer.

> - I am not sure I understand the 'one message hog' case.

Start with a prefetch of 1. Give a consumer a message; if the consumer doesn't do anything with it - or locks up while processing it - then that message is now 'hogged' - no other consumer can get the message until the consumer is closed or the client killed.

> Most of the
> consumers are idempotent (there are too many failure cases to count on 'once and
> only once' delivery). So there is no harm in redelivering this one message
> for which no ack has been received yet.

That 1 message will not be delivered to anyone else - which is a real problem. There's the added effect on ordering too.

> >> 2. When the broker detects a misbehaving client, reclaim the unAcked
> >> messages for other active consumers (and make the window size 0 or 1 in
> >> step
> >> 1 above)
>
> >If a client/connection misbehaves (e.g. becomes inactive) then the
> >connection is closed and all consumers are closed too causing all
> >their unacked messages to be redelivered.
>
> This sounds good. However, please note that misbehavior is not necessarily a
> binary state.
> Sometimes an ACK could be delayed for many reasons (either transient
> consumer (mis)behavior or other network related issues). It is in the gray
> areas that the tcp flow control works really well.

Agreed - which is why AMQ-850 is introduced to allow people to set an inactivity timer on specific consumers. It could just be 1 thread which is blocked on some lock - while the other threads and the rest of the connection are working fine.

--
James
-------
http://radio.weblogs.com/0112098/