Hi Everyone,

As a follow-on to my e-mail last week introducing the core broker prototype that Hiram and I have been working on, I wanted to spin up a thread on the flow control model that we're using.
I'd be interested to hear your thoughts on current shortcomings associated with flow control / memory management in 5.3 so we can make sure that the use cases are covered. Beyond that, any additional input on the design or implementation would be great ... are we on the right track?

Cheers,
Colin

The text below is taken straight from the webgen in the project, my apologies if it's a little verbose! As a reminder the bits can be found at:
https://svn.apache.org/repos/asf/activemq/sandbox/activemq-flow

The activemq-flow package is meant to be a standalone module that deals generically with Resources and the Flows of elements that pass through and between them. The current implementation is designed with the following goals in mind:

* SIMPLE: We want a fairly simple and consistent model for controlling the flow of messages and other data in the system, in order to control memory and disk space. The module must be able to handle fan-in/fan-out as well as simpler 1-to-1 cases.
* PERFORMANT: The flow control mechanism must be performant and should not introduce much overhead in cases where downstream resources are able to keep up.
* MODULARIZED: The module should be independent, generic and reusable.
* FAIRNESS: We should be able to provide better fairness. If I've got several producers putting messages on a queue, the flow controller should not prefer one source over the other (unless configured to do so).
* VISIBILITY: With a unified model in place we can instrument it to provide visibility in the product (e.g. a visual graph of flows in the system). When a customer says that they are not using PERSISTENT messages yet we see 1000 msgs/sec flowing through the recovery log ...
* ADMINISTRATION: We can explore the possibility of administratively limiting message flows (e.g. I've done my production stress testing and can successfully handle my anticipated load of 4000 msgs/sec on topic1 ... I'd prefer to avoid the case where publishers go berserk and overload my backend with messages).
* POLICIES: We should be able to better instrument general flow control policies. E.g. I want to tune for latency or throughput. If a subscriber gets behind, I'd like the policy for messages on topic1 to be that I drop the oldest messages instead of initiating flow control.

The Basics:
Each resource creates a FlowController for each of its Flows, and each FlowController is assigned a corresponding FlowLimiter. As elements (e.g. messages) pass from one resource to another they are passed through the downstream resource's FlowController, which updates its limiter. If propagation of an element from one resource to another causes the downstream limiter to become throttled, the associated FlowController will block the source of the element.

The flow module is used heavily by the rest of the core for memory and disk management.

* Memory Management: Memory is managed based on the resources in play: the usage is computed by summing the space allocated to each of the resources' limiters. This strategy intentionally avoids a centralized memory limit, which would lead to complicated logic to track when the centralized limiter needs to be decremented. It also avoids contention between multiple resources/threads accessing the limiter and reduces the potential for limiter-related deadlocks. However, it should be noted that this approach doesn't preclude implementing centralized limiters in the future.
* Flow Control: As messages propagate from one resource (A) to another (B), if A overflows B's limit then B will block A, and A can't release its limiter space until B unblocks it. This allowance for overflow into downstream resources is a key concept in flow control performance and ease of use. Provided that the upstream resource has already accounted for the message's memory, it can freely overflow any downstream limiter, as long as it keeps its own space reserved for the elements that caused the overflow.
* Threading Model: Note that as a message propagates from A to B, the general contract is that A won't release its memory if B blocks it during the course of dispatch. This means that it is not safe to perform a thread handoff during dispatch between two resources, since the thread dispatching from A relies on the message making it to B (so that B can block A) prior to A completing dispatch.
* Management/Visibility: Another intended use of the activemq-flow module is to assist in visibility, e.g. to provide an underlying map of resources that can be exposed via tooling to see the relationships between sources and sinks of messages and to find bottlenecks. This aspect has been downplayed for now as we have been focusing more on the queueing/memory management model in the prototype, but eventually the flow package itself will provide a handy way of providing visibility into the system, particularly in terms of finding performance bottlenecks.

FlowResource (FlowSink and FlowSource): A container for FlowControllers providing some lifecycle-related logic. The base resource class handles interaction/registration with the FlowManager (below).

FlowManager: Registry for Flows and FlowResources. The manager will provide some hooks into system visibility. As mentioned above, this aspect has been downplayed somewhat for the present time.

FlowController: Wraps a FlowLimiter and implements the common basic block/resume logic between FlowControllers.

FlowLimiter: Defines the limits enforced by a FlowController. Currently the package has size-based limiter implementations, but eventually it should also support other common limiter types such as rate-based limiters. The limiters are also extended at other points in the broker (for example, implementing a protocol-based WindowLimiter). It is also likely that we would want to introduce CompositeLimiters to combine various limiter types.

Flow: The concept of a flow is not used very heavily right now.
But a Flow defines the stream of elements that can be blocked. In general the prototype creates a single flow per resource, but in the future a source may break its elements down into more granular flows on which downstream sinks may block it. One case where this is anticipated to be useful is in networks of brokers, where it may be desirable to partition messages into more granular flows (e.g. based on producer or destination) to avoid blocking the broker-to-broker connection unnecessarily.
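
To make the block/resume contract described under "The Basics" and "Flow Control" a bit more concrete, here is a minimal single-threaded sketch. It is not the code in the sandbox: the class and method names (SizeLimiter, Controller, Source, Upstream, dispatch, elementDispatched) are hypothetical and chosen only for illustration, and the "throttled once size reaches capacity" rule is an assumption. It shows a downstream controller accepting an element that overflows its limit, blocking the source, and the source holding on to its own limiter space until it is resumed.

```java
import java.util.ArrayList;
import java.util.List;

final class FlowSketch {

    /** Hypothetical size-based limiter: throttled once the accounted size reaches capacity. */
    static final class SizeLimiter {
        private final long capacity;
        private long size;

        SizeLimiter(long capacity) { this.capacity = capacity; }

        /** Accounts for an element and reports whether the limiter is now throttled. */
        boolean add(long elementSize) { size += elementSize; return isThrottled(); }

        void remove(long elementSize) { size -= elementSize; }

        boolean isThrottled() { return size >= capacity; }

        long size() { return size; }
    }

    /** Hypothetical callbacks a controller uses to block and resume an upstream source. */
    interface Source {
        void onBlocked();
        void onResumed();
    }

    /** Wraps a limiter and implements the block/resume logic. */
    static final class Controller {
        private final SizeLimiter limiter;
        private final List<Source> blockedSources = new ArrayList<>();

        Controller(SizeLimiter limiter) { this.limiter = limiter; }

        /** Always accepts the element (overflow is allowed); blocks the source if throttled. */
        void add(long elementSize, Source source) {
            if (limiter.add(elementSize) && !blockedSources.contains(source)) {
                blockedSources.add(source);
                source.onBlocked();
            }
        }

        /** Called once an element has left this resource; resumes sources when unthrottled. */
        void elementDispatched(long elementSize) {
            limiter.remove(elementSize);
            if (!limiter.isThrottled() && !blockedSources.isEmpty()) {
                List<Source> toResume = new ArrayList<>(blockedSources);
                blockedSources.clear();
                for (Source s : toResume) s.onResumed();
            }
        }

        boolean isThrottled() { return limiter.isThrottled(); }
    }

    /** Upstream resource A: keeps its own limiter space reserved while blocked by B. */
    static final class Upstream implements Source {
        final SizeLimiter limiter;
        private long heldWhileBlocked;
        private boolean blocked;

        Upstream(long capacity) { limiter = new SizeLimiter(capacity); }

        /** Dispatches on the caller's thread, so B can block A before dispatch completes. */
        void dispatch(long elementSize, Controller downstream) {
            limiter.add(elementSize);            // A accounts for the element first
            downstream.add(elementSize, this);   // may call onBlocked() synchronously
            if (blocked) {
                heldWhileBlocked += elementSize; // B blocked A: keep A's space reserved
            } else {
                limiter.remove(elementSize);     // B kept up: release A's space
            }
        }

        @Override public void onBlocked() { blocked = true; }

        @Override public void onResumed() {
            blocked = false;
            limiter.remove(heldWhileBlocked);    // release everything held while blocked
            heldWhileBlocked = 0;
        }

        boolean isBlocked() { return blocked; }
    }

    public static void main(String[] args) {
        Controller b = new Controller(new SizeLimiter(10));
        Upstream a = new Upstream(100);
        a.dispatch(6, b);
        a.dispatch(6, b); // overflows B's limit of 10
        System.out.println("A blocked: " + a.isBlocked() + ", A holds: " + a.limiter.size());
        b.elementDispatched(6);
        System.out.println("A blocked: " + a.isBlocked() + ", A holds: " + a.limiter.size());
    }
}
```

Because dispatch() calls the downstream controller on the same thread, A knows whether it was blocked before it decides to release its space. This is the reason a thread handoff between the two resources would break the contract in this sketch.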