On 11 Jun 2009, at 17:06, Colin MacNaughton wrote:

Hi Rob,

In terms of configurable maximums with respect to destinations:
Maximum memory allocation per PTP queue is supported, but for topics the limits are actually tied to the Subscriptions receiving the message. This is because these are the objects that actually map to the underlying cursored queues that hold the messages and do the paging/limiting. Does that make
sense?
Makes a lot of sense from an implementation point of view - but doesn't translate too well for users to understand.


In terms of overall disk/memory limits. The approach we're taking would not explicitly define a single overall limit (at least not initially) -- rather the total maximum is based on the resources you create. E.g. as a user you
need to know how many queues and subscriptions you create and plan for
memory/disk accordingly. This doesn't preclude trying to enforce global limits later, but in my opinion doing so complicates the implementation a fair amount, in terms of trying to intelligently balancing the available
space across the queues and also leads to additional contention on the
shared limiter -- and worse can lead to resource related deadlocks if we get
it wrong. We can still do things like limit the maximum number/size of
subscriptions/queues/connections etc.
Again - from an implementation point of view makes a lot of sense - but makes it difficult for users - who will have to pre-dimension the broker.


Colin

-----Original Message-----
From: Rob Davies [mailto:rajdav...@gmail.com]
Sent: Thursday, June 11, 2009 1:12 AM
To: dev@activemq.apache.org
Subject: Re: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory
Management

Hi Colin,

In 5.x flow control behaves as if its binary - off or on. When its off
- messages can be offlined (for non-persistent messages this means
being dumped to temporary storage) - but when its on - the producers
slow and stop.
Also - there can be cases when you get a temporary slow consumer (the
consuming app may be doing a big gc) - which means with flow control
off - messages get dumped to disk - and then the producers may never
slow down enough again for the consumer to catch up. Flow control is
difficult to implement for all cases - but we should allow for
configuration of the following:

* maximum overall broker memory
* maximum memory allocation per destination
* maximum storage allocation
* maximum storage allocation per destination
* maximum temporary storage allocation
* maximum temporary storage allocation per destination

when we start to hit a resource limit - we should aggressively gc
messages that have expired, then either offline (an flow control when
that limit is hit) or flow control.
It would be great to have a combined policy where we can block a
producer for a short time (seconds) then offline
For non-persistent messages - we still need a policy where we can
remove messages based on a selector (which would be in addition to
expiring messages).

cheers,

Rob

On 10 Jun 2009, at 17:29, Colin MacNaughton wrote:

Hi Everyone,

As a follow on to my e-mail last week introducing the core broker
prototype that Hiram and I have been working on, I wanted to spin up a
thread on the flow control model that we're using.

I'd be interested to hear in your thoughts on current shortcomings
associated with flow control / memory management in 5.3 so we can make
sure that the use cases are covered. Beyond that any additional
input on
the design or implementation would be great ... are we on the right
track?

Cheers,
Colin


The text below is taken straight from the webgen in the project, my
apologies if it's a little verbose!

As a reminder the bits can be found at:
https://svn.apache.org/repos/asf/activemq/sandbox/activemq-flow

The activemq-flow package is meant to be a standalone module that
deals
generically with Resources and Flow's of elements that flow through
and
between them. The current implementation is designed with the
following
goals in mind:

  * SIMPLE: Want a fairly simple and consistent model for controlling
flow of messages and other data in the system to control memory and
disk
space. The module must be able to handle fan-in/fan-out as well as
simpler 1 to 1 cases.
  * PERFORMANT: The flow control mechanism must be performant and
should not introduce much overhead in cases where downstream resources
are able to keep up.
  * MODULARIZED: The module should be independent generic and
reusable.
  * FAIRNESS: We should be able to provide better fairness. If I've
got several producers putting messages on a queue, the flow controller
should not prefer one source over the other (unless configured to do
so)
  * VISIBILITY: With a unified model in place we can instrument it to
provide visibility in the product (e.g. a visual graph of flows in the
system). When a customer says that they are not using PERSISTENT
messages yet we see 1000msgs/sec flowing through the recovery log....
  * ADMINISTRATION: We can explore the possibility of
administratively
limiting message flows. E.g. I've done my production stress testing
and
can successfully handle my anticipated load of 4000 msgs/sec on topic1
... I'd prefer to avoid the case where publishers go berserk and
overload my backend with messages).
  * POLICIES: We should be able to better instrument general flow
control policies. E.g. I want to tune for latency or throughput. If a
subscriber gets behind, I'd like the policy for messages on topic1
to be
that I drop the oldest messages instead of initiating flow control.

The Basics:

Each resource creates a FlowController for each of it's Flows which is assigned a corresponding FlowLimiter. As elements (e.g. messages) pass
from one resource to another they are passed through the downstream
resource's FlowController which updates its Limiter. If propagation of
an element from one resource to another causes the downstream
limiter to
become throttled the associated FlowController will block the source
of
the element. The flow module is used heavily by the rest of the core
for
memory and disk management.

  * Memory Management: Memory is managed based on the resources in
play -- the usage is computed by summing of the space allocated to
each
of the resources' limiters. This strategy intentionally avoids a
centralized memory limit which leads to complicated logic to track
when
a centralized limiter needs to be decremented and avoids contention
between multiple resources/threads accessing the limiter and also
reduces the potential for memory limiter related deadlocks. However,
it
should be noted that this approach doesn't preclude implementing
centralized limiters in the future.
  * Flow Control: As messages propagate from one resource A to
another
B, then if A overflows B's limit, B will block A and A can't release
it's limiter space until B unblocks it. This allowance for overflow
into
downstream resources is a key concept in flow control performance and
ease of use. Provided that the upstream resource has already accounted for the message's memory it can freely overflow any downstream limiter
providing it reserves space from elements that caused overflow.
  * Threading Model: Note that as a message propagates from A to B,
that the general contract is that A won't release it's memory if B
blocks it during the course of dispatch. This means that it is not
safe
to perform a thread handoff during dispatch between two resources
since
the thread dispatching A relies on the message making it to B (so
that B
can block it) prior to A completing dispatch.
  * Management/Visibility: Another intended use of the activemq-flow
module is to assist in visibility e.g. provide an underlying map of
resources that can be exposed via tooling to see the relationships
between sources and sinks of messages and to find bottlenecks ... this
aspect has been downplayed for now as we have been focusing more on
the
queueing/memory management model in the prototype, but eventually the
flow package itself will provide a handy way of providing visibility
in
the system particularly in terms of finding performance bottlenecks.

FlowResource (FlowSink and FlowSource): A container for
FlowControllers
providing some lifecycle related logic. The base resource class
handles
interaction/registration with the FlowManager (below).

FlowManager: Registry for Flow's and FlowResources. The manager will
provide some hooks into system visibility. As mentioned above this
aspect has been downplayed somewhat for the present time.

FlowController: Wraps a FlowLimiter and actually implements common
basic
block/resume logic between FlowControllers.

FlowLimiter: Defines the limits enforced by a FlowController.
Currently
the package has size based limiter implementations, but eventually
should also support other common limiter types such as rate based
limiters. The limiter's are also extended at other points in the
broker
(for example implementing a protocol based WindowLimiter). It is also
likely that we would want to introduce CompositeLimiters to combine
various limiter types.

Flow: The concept of a flow is not used very heavily right now. But a
Flow defines the stream of elements that can be blocked. In general
the
prototype creates a single flow per resource, but in the future a
source
may break it's elements down into more granular flows on which
downstream sinks may block it. One case where this is anticipated as
being useful is in networks of brokers where-in it may be desirable to
partition messages into more granular flows (e.g based on producer or
destination) to avoid blocking the broker-broker connection
uncessarily).





Reply via email to