METRON-227 “Add Time-Based Flushing to Writer Bolt”, and
METRON-322 “Global Batching & flushing”
have been dormant since July, but contain some very valuable ideas. The basic
idea is that Metron’s Writer queues generally flush on queue size, but not on
time. As a result, messages on low-traffic or bursty channels can languish
unprocessed, and therefore un-ack’ed. Storm then automatically replays them
once they exceed the message timeout (topology.message.timeout.secs), and if
too many un-ack’ed messages accumulate (topology.max.spout.pending), the spout
stops emitting until they are ack’ed or time out. This results in duplicate
messages and wasted computations, as well as unpredictable latency.
Storm now has a very nice, low-complexity solution for time-based flushing,
using Tick Tuples.
I propose to use Tick Tuples to implement time-based flushing for all Writer
queues that currently flush only on queue size.
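
As a rough illustration of the idea (a sketch only, not Metron's actual Writer
code), the queue below flushes either when it reaches its size limit or when a
tick arrives. The class name BatchingWriter, its methods, and the boolean
isTick flag are hypothetical; the flag stands in for Storm's tick-tuple check,
and the "flush" just records the batch where a real Writer would write and ack.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Writer bolt's queue; not Metron's actual class.
final class BatchingWriter {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    BatchingWriter(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    // Called once per incoming tuple. isTick is an assumed flag that stands in
    // for a tick-tuple check; message is ignored when isTick is true.
    void execute(boolean isTick, String message) {
        if (isTick) {
            flush();              // time-based flush of whatever is queued
            return;
        }
        pending.add(message);
        if (pending.size() >= batchSize) {
            flush();              // size-based flush, as the Writers do today
        }
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;               // nothing queued, so the tick is a no-op
        }
        // A real Writer would write the batch and ack its tuples here.
        flushed.add(new ArrayList<>(pending));
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    List<List<String>> flushedBatches() {
        return flushed;
    }
}
```

With a batch size of 3, two messages followed by a tick leave nothing pending,
so a quiet channel no longer waits for a third message before being ack'ed.
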
I will do this work in the context of METRON-322, subsuming METRON-227 into it.
Per the recommendation of some members of the Storm implementation team, I will
default the queue flush timeout (topology.tick.tuple.freq.secs) in each Writer
to half the value of topology.message.timeout.secs, minus a small delta. The
default value of topology.message.timeout.secs is 30 seconds, so in many cases
the queue flush timeout will be 14 seconds (30/2 minus a delta of 1); but this
will be configurable.
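
The default calculation could look roughly like the following. This is a sketch
under assumptions: the class and method names are hypothetical, the delta is
passed in rather than fixed, and the floor of 1 second is my own addition to
keep very small message timeouts from yielding a zero or negative interval.

```java
// Hypothetical helper; not part of Storm or Metron.
final class FlushTimeouts {
    private FlushTimeouts() {}

    // Half the message timeout (integer division), minus a delta.
    // The floor of 1 second is an assumption, not part of the proposal.
    static int defaultFlushSecs(int messageTimeoutSecs, int deltaSecs) {
        int candidate = messageTimeoutSecs / 2 - deltaSecs;
        return Math.max(1, candidate);
    }
}
```

For the default message timeout of 30 seconds and a delta of 1,
FlushTimeouts.defaultFlushSecs(30, 1) gives 14.
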
The reporter of METRON-322 was also concerned about “global” behavior of a
topology, for instance the Enhancer topology with multiple telemetry-specific
bolts in parallel. If each individual bolt accumulates a number of un-ack’ed
messages, the total across the whole topology can become large, and if
topology.max.spout.pending is set, the spout may reach that limit and stop
emitting new messages. However, the probability of
this drops greatly if we implement a reasonable default for queue flush
timeouts, and any remaining issue can be addressed by setting the bolt queue
size limits, and the value of topology.max.spout.pending itself, appropriately.
Therefore, I will not at this time worry much about this “global” behavior,
other than making sure that all Writers in the topology have queue flush
timeouts.
Your thoughts, suggestions, and concerns are invited.