Tim Bain created AMQ-5370:
-----------------------------
Summary: Allow brokers to discard messages within N milliseconds
of their expiration time
Key: AMQ-5370
URL: https://issues.apache.org/jira/browse/AMQ-5370
Project: ActiveMQ
Issue Type: New Feature
Components: Broker, Connector
Reporter: Tim Bain
I want the ability to specify a threshold on a network connection where a
message will not be delivered if it is within N milliseconds of its expiration
time. This will allow me to tell my ActiveMQ brokers that I expect latency to
exist on a given network connection, so the broker can avoid using bandwidth to
deliver messages that will be dead on arrival and instead use that bandwidth to
deliver messages that will be alive when they are received.
The setting should be on the networkConnector, to allow different values for
different broker-to-broker links, and should govern the deliver-or-discard
decision in both directions. Since it's on the networkConnector, that setting
can only be applied to broker-to-broker connections.
It might also be worth having a setting on the transportConnector that would
apply to all connections to that connector, to simplify configuration if all
networkConnectors into a given broker will have the same latency; that would
also allow the setting to be applied to the delivery of messages to non-broker
consumers (though there should probably be a flag for whether to apply it to
non-broker consumers as well). The setting on a networkConnector should
override the value on a transportConnector, since it is specific to a single
connection. But having a setting on the transportConnector is a lower priority
than having a setting on the networkConnector.
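The override rule above could be sketched roughly as follows. This is only an
illustration: ThresholdResolver and effectiveThresholdMs are hypothetical
names, not existing ActiveMQ classes or settings, and an empty OptionalLong
stands for "not configured on that connector".

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of ActiveMQ.
final class ThresholdResolver {
    /**
     * Picks the threshold to use for one connection. A value on the
     * networkConnector wins; otherwise the transportConnector value is used;
     * if neither is configured, the proposed default of 0 applies.
     */
    static long effectiveThresholdMs(OptionalLong networkConnectorMs,
                                     OptionalLong transportConnectorMs) {
        if (networkConnectorMs.isPresent()) {
            return networkConnectorMs.getAsLong();
        }
        return transportConnectorMs.orElse(0L);
    }
}
```

For example, effectiveThresholdMs(OptionalLong.of(100), OptionalLong.of(250))
would give 100, since the more specific networkConnector value takes
precedence.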
The default should be 0 (so all messages that haven't actually expired would be
forwarded, just as they are now), but if I know that my network path has a
certain latency, I should be able to configure the broker to not even try
delivering messages that I know aren't likely to make it to an end consumer, so
that messages that will make it can be sent instead.
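The check I have in mind is roughly the one below. It is a sketch only:
ExpiryThresholdCheck and shouldForward are made-up names, and I'm assuming the
usual JMS convention that an expiration of 0 means the message never expires.
With a threshold of 0 it reduces to "forward anything that hasn't expired yet".

```java
// Hypothetical helper, not part of ActiveMQ.
final class ExpiryThresholdCheck {
    /**
     * @param expirationMs absolute expiration timestamp in ms; 0 = never expires
     * @param nowMs        current time in ms
     * @param thresholdMs  configured threshold (expected link latency) in ms
     * @return true if the message should still be alive after thresholdMs more
     *         milliseconds, i.e. it is worth spending bandwidth on
     */
    static boolean shouldForward(long expirationMs, long nowMs, long thresholdMs) {
        if (expirationMs == 0) {
            return true; // no TTL set, so the threshold never applies
        }
        return nowMs + thresholdMs <= expirationMs;
    }
}
```

So a message expiring at t=1000 would be forwarded at t=900 with a 100ms
threshold, but skipped at t=950, because it would arrive at about t=1050.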
It would be great to eventually determine this adaptively to allow the brokers
to react to changing network conditions and to make configuration simpler, but
for a first implementation, manual static configuration would be fine. That
longer-term implementation would probably need to account for the full
end-to-end RTT for messages from producers to consumers (because looking at
only the next network link wouldn't guarantee that the message wouldn't get
discarded at a second slow link later in the path), so I don't expect it to
happen anytime soon, maybe ever.
*** PROBLEM DESCRIPTION ***
When my producer on one side of a high-latency WAN sends messages faster than
our meager allocation of the WAN's bandwidth can carry them, I quickly see all
messages fail to be delivered to the end consumer.
These are the three critical elements of the problem, which all have to be
present for it to happen:
1. Messages have a TTL set (the same for all messages), so they'll eventually
expire.
2. Producers are sending messages faster (in aggregate) than our bandwidth
allocation on the WAN can carry them. This means we're guaranteed to not
deliver some of the messages to the end consumer.
3. There is a non-trivial amount of latency across the WAN.
As messages are sent, they begin queuing on the producer-side broker. As time
goes on, the messages that are still in the producer-side broker's message
store get closer and closer to expiring, until eventually the message at the
head of the message store is within the WAN's latency value (e.g. 100ms) of the
message's expiration time. The amount of time it takes for this to happen
depends on how long it takes messages to time out and on the difference between
the producer's send rate and the WAN's bandwidth, but it will eventually
happen. This message will be sent by the producer-side broker (because
although it's really close to expiring, it hasn't expired yet), but when it's
received by the consumer-side broker, an amount of time equal to the WAN
latency has passed, so it's expired and gets discarded by the consumer-side
broker instead of getting delivered to the consumer.
From this point onwards, no messages will get delivered to the consumer. As
the messages in the producer-side broker's message store get closer to and
eventually reach their expiration times, each message at the head of the
message store will either be within the WAN latency of its timeout or after
its timeout. If the former, it will get sent across the WAN but discarded by
the consumer-side broker; if the latter, it will get discarded by the
producer-side broker and that broker will find the next message in the message
store that isn't yet expired (but will be by the time it arrives) and send it
instead. As a result, all messages from that point onward either expire on
the producer-side broker or the consumer-side broker. Even though there are
lots of messages in the producer-side broker's message store that could be
delivered successfully, ActiveMQ instead sends the first message in the
message store even though an outside observer knows it will just get thrown
away.
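To illustrate the behavior described above, here is a toy simulation. It is
not ActiveMQ code and all the numbers are made up: one message is produced per
millisecond with a 1000ms TTL, the link can carry one message every 2ms, and
the WAN latency is 100ms. The message store is modeled as a simple FIFO of
expiration times, and ExpiryStarvationSim is a hypothetical name.

```java
import java.util.ArrayDeque;

// Toy model of the producer-side broker's message store and one WAN link.
final class ExpiryStarvationSim {
    /** Returns how many messages arrive at the far side before they expire. */
    static int simulate(long thresholdMs) {
        final long ttlMs = 1000, latencyMs = 100, sendIntervalMs = 2, durationMs = 10_000;
        ArrayDeque<Long> store = new ArrayDeque<>(); // expiration times, FIFO
        int delivered = 0;
        for (long now = 0; now < durationMs; now++) {
            store.addLast(now + ttlMs);               // producer: one message per ms
            if (now % sendIntervalMs != 0) {
                continue;                             // link: one send slot every 2 ms
            }
            // Discard head messages that are expired (threshold 0) or that
            // would expire within thresholdMs of being sent.
            while (!store.isEmpty() && now + thresholdMs > store.peekFirst()) {
                store.removeFirst();
            }
            if (store.isEmpty()) {
                continue;
            }
            long expiration = store.removeFirst();    // send the head message
            if (now + latencyMs <= expiration) {
                delivered++;                          // still alive on arrival
            }
        }
        return delivered;
    }
}
```

In this model, simulate(0) delivers only the first 901 messages and then
nothing for the rest of the run, because every later message sent is already
within 100ms of expiring. simulate(100) delivers a message in every one of the
5000 send slots, since messages that would be dead on arrival are skipped in
favor of ones that will still be alive.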
There should be a way to have ActiveMQ prioritize messages that are expected to
reach an end consumer over ones that are expected to time out before they get
there, to minimize wasteful use of scarce resources such as network links.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)