[GitHub] metron issue #481: METRON-322 Global Batching and Flushing

mattf-horton Wed, 12 Jul 2017 14:20:03 -0700

GitHub user mattf-horton commented on the issue:

    https://github.com/apache/metron/pull/481
  
    Comment on testing:  There are so many permutations that it only seemed
reasonable to automate them in unit tests, and so I did.  As part of code
review, please give your opinion on whether the provided unit tests are
adequate, or what additional test cases should be added.
    
    Manual end-to-end testing, if you are so moved, consists of six scenarios 
for a given sensor queue:
    1. Under **heavy continuous load** the batchSize still controls flushing 
behavior, because the queue size always exceeds batchSize before queue age 
exceeds batchTimeout.
    2. Under **light continuous load**, where each queue keeps receiving at
least one message per second and batchSize is large enough that it is never
exceeded, the batchTimeout for each queue should control flushing behavior
within +/- 1 sec, because each new message triggers a check of the queue age
and a potential timeout flush.
      - NOTE: If the configured batchTimeout is larger than
`1/2 topology.message.timeout.secs - 1` (which equals **14 sec** by default),
it will be replaced by an effective value equal to
`1/2 topology.message.timeout.secs - 1`.  Flushing will then occur within
+/- 1 sec of each _effective_ batchTimeout interval, rather than the
_configured_ batchTimeout interval.
    3. Under **light intermittent load**, where fewer than batchSize messages
queue up and gaps between messages may exceed the timeout interval, age checks
and potential flush events may be triggered by _either_ incoming messages or
TickTuple events, depending on the phase relationship between the intermittent
bursts of messages and the TickTuple system tick.  The TickTuple interval is
guaranteed to be < `1/2 topology.message.timeout.secs`, so the default
TickTuple interval is `1/2 topology.message.timeout.secs - 1` (**14 sec** by
default).  But if the smallest batchTimeout configured for any sensor queue on
the Bolt is < the default TickTuple interval, then that smallest value becomes
the actual TickTuple interval.  This produces three sub-cases, all of which
guarantee a flush event before any message gets recycled due to aging past
`topology.message.timeout.secs`:
      - If the queue's configured batchTimeout is the smallest (or only) such 
on this Bolt, and that number is smaller than the default TickTuple interval, 
then it _becomes_ the actual TickTuple interval.  The queue is guaranteed to 
flush between 1x and 2x this interval.
      - If the queue's configured batchTimeout is not the smallest such, but 
still is < the default TickTuple interval, then the queue is guaranteed to 
flush between its own `configured batchTimeout` and its `configured 
batchTimeout + actual TickTuple interval` (which is less than 2x its own 
`configured batchTimeout`).
      - If the queue's configured batchTimeout is > the default TickTuple 
interval (14 sec default), then its effective batchTimeout is set to the 
default TickTuple interval.  The queue is guaranteed to flush between this 
`effective batchTimeout` and its `effective batchTimeout + actual TickTuple 
interval`.
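    Reviewers may want to predict the expected flush windows before running these scenarios. The sub-case arithmetic can be sketched in a few lines of Python. This is an illustrative sketch, not the Metron implementation. The following are all assumptions: `plan_flushing`, `default_batch_timeout`, the hypothetical queue names, whole-second integer arithmetic, and Storm's stock 30 s default for `topology.message.timeout.secs`.

```python
from typing import Dict, Tuple

# Assumed: Storm's stock default for topology.message.timeout.secs.
DEFAULT_MESSAGE_TIMEOUT_SECS = 30


def default_batch_timeout(message_timeout_secs: int = DEFAULT_MESSAGE_TIMEOUT_SECS) -> int:
    """Cap on any batchTimeout, which is also the default TickTuple interval:
    1/2 topology.message.timeout.secs - 1, in whole seconds."""
    return message_timeout_secs // 2 - 1


def plan_flushing(configured: Dict[str, int],
                  message_timeout_secs: int = DEFAULT_MESSAGE_TIMEOUT_SECS
                  ) -> Tuple[int, Dict[str, Tuple[int, int]]]:
    """Return (actual TickTuple interval, {queue: (earliest, latest) flush age})
    for one Bolt, given each queue's configured batchTimeout in seconds."""
    cap = default_batch_timeout(message_timeout_secs)
    # A configured batchTimeout above the cap is replaced by the cap.
    effective = {queue: min(timeout, cap) for queue, timeout in configured.items()}
    # The smallest effective batchTimeout on the Bolt becomes the tick
    # interval; with no queues configured the default interval applies.
    tick = min(effective.values(), default=cap)
    # Under intermittent load a queue flushes at the first age check (message
    # or tick) at or after its effective batchTimeout, i.e. at most one tick
    # interval later.
    windows = {queue: (t, t + tick) for queue, t in effective.items()}
    return tick, windows


# Hypothetical queues: one short, one medium, one above the 14 s cap.
tick, windows = plan_flushing({"fast": 5, "medium": 10, "slow": 60})
print(tick)     # 5
print(windows)  # {'fast': (5, 10), 'medium': (10, 15), 'slow': (14, 19)}
# Every latest flush age stays below topology.message.timeout.secs (30 s).
print(all(latest < DEFAULT_MESSAGE_TIMEOUT_SECS for _, latest in windows.values()))  # True
```

    The cap and the default tick interval are the same number. Because of this, even the slowest queue's latest flush (cap + tick <= 2 x cap) lands before `topology.message.timeout.secs` expires, which is the guarantee stated above.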
    
    The upshot is that:
    * "Configured batchTimeout" should be thought of as "minimum age before
you'll allow a time-based flush" (capped at the default TickTuple interval,
aka `1/2 topology.message.timeout.secs - 1`).
    * "Actual TickTuple interval" is the "maximum time between age checks".  It
will be <= all the configured batchTimeouts for the various sensors on the Bolt.
    * A flush may actually happen as late as a queue age of "effective
batchTimeout" + "actual TickTuple interval", depending on exactly when
intermittent message events and periodic Tick events happen.
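    The upshot can also be checked with a toy discrete-event model of a single sensor queue. This Python sketch assumes the bolt's behavior rather than reproducing Metron's code. The hypothetical `simulate_flushes` runs an age check on every incoming message and every tick, and it enqueues an arriving message before checking. It flushes when either batchSize is reached or the oldest queued message is at least batchTimeout old.

```python
from typing import List


def simulate_flushes(message_times: List[float], batch_timeout: float,
                     tick_interval: float, tick_phase: float = 0.0,
                     batch_size: int = 1_000_000,
                     horizon: float = 60.0) -> List[float]:
    """Return the times at which one (hypothetical) sensor queue flushes.

    An age check runs on every incoming message and on every tick.  The
    queue flushes when it holds batch_size messages, or when its oldest
    message is at least batch_timeout old.
    """
    if tick_interval <= 0:
        raise ValueError("tick_interval must be positive")
    # Periodic ticks at tick_phase, tick_phase + tick_interval, ... up to horizon.
    ticks = []
    k = 0
    while tick_phase + k * tick_interval <= horizon:
        ticks.append(tick_phase + k * tick_interval)
        k += 1
    # Merge both event streams in time order; on a tie the message (kind 0)
    # is processed before the tick (kind 1).
    events = sorted([(t, 0) for t in message_times] + [(t, 1) for t in ticks])
    queue: List[float] = []
    flushes: List[float] = []
    for now, kind in events:
        if kind == 0:
            queue.append(now)  # assumed: enqueue first, then check
        if queue and (len(queue) >= batch_size or now - queue[0] >= batch_timeout):
            flushes.append(now)
            queue.clear()
    return flushes


# Light intermittent load: one message at t=1, ticks every 5 s offset by 0.5 s.
print(simulate_flushes([1.0], batch_timeout=10, tick_interval=5, tick_phase=0.5))
# [15.5]

# Light continuous load: one message per second from t=0 to t=30.
print(simulate_flushes(list(range(31)), batch_timeout=10, tick_interval=5, tick_phase=0.5))
# [10, 21, 35.5]

# Load heavy relative to batchSize: size, not age, triggers the flush.
print(simulate_flushes([0, 1, 2, 3], batch_timeout=10, tick_interval=5, batch_size=2))
# [1, 3]
```

    In the intermittent case the lone message is flushed by a tick at queue age 14.5 s. That lies between batchTimeout (10 s) and batchTimeout + tick interval (15 s). In the continuous case the message-triggered flushes fire as soon as the oldest message reaches 10 s. The last flush, after the load stops, is tick-driven.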



