[ https://issues.apache.org/jira/browse/METRON-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15934713#comment-15934713 ]

ASF GitHub Bot commented on METRON-322:
---------------------------------------

Github user mattf-horton commented on the issue:

    https://github.com/apache/incubator-metron/pull/481
  
    @cestella , thanks for looking at this. The primary motivation for adding 
batchTimeout was to prevent tuple recycling due to 
"topology.message.timeout.secs".  Thus, correctly configuring the timeout 
period requires interacting with the Bolt.  You're correct that this means each 
Bolt that uses `BulkWriterComponent` must be modified to include the tick tuple 
processing, as noted in my opening comments:
    ```
    After this patch is reviewed and accepted, similar work needs to be done 
for the ParserWriter, and possibly other sub-components. That will be in a 
separate PR.
    ```
    I implemented the changes in `BulkWriterComponent` such that it would 
default to conservative behavior if the containing Bolt didn't configure it.
    
    I considered using a timer thread instead of tick tuples, but:
    1. This is precisely one of the use cases contemplated by the Storm team 
when they created Tick Tuples, as discussed in the [article 
here](https://hortonworks.com/blog/apache-storm-design-pattern-micro-batching/) 
cited in the jira for METRON-322.
    2. It isn't sufficient to just create a timer thread.  One must also 
monitor that thread, restart it if it dies, ensure it does nothing 
non-thread-safe, and so on.  That adds significant complexity to the code, and 
uncertainty around thread-safety, since any pattern we create here will surely 
be imitated by later development, and Bolt code is not typically thread-safe.
    3. Built-in Tick Tuples, on the other hand, avoid both the complexity, 
since Storm handles the reliability issues internally, and the uncertainty, 
since the Tick Tuple is processed in the single flow of control of normal Bolt 
processing.
    
    So I think it's cleaner to use the feature provided by the Storm 
environment.  I'm open to arguments to the contrary.
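
As a rough illustration of the single-flow-of-control point above, here is a 
minimal sketch (hypothetical names, not Metron's actual code) of how a bolt 
can distinguish tick tuples from normal messages inside the same execute() 
path.  Storm delivers tick tuples from its "__system" component on the 
"__tick" stream; those ids are inlined as string literals here so the sketch 
stays self-contained, where real code would use Storm's Constants class.

```java
// Sketch of tick-tuple dispatch in a bolt's single execute() flow.
// Hypothetical class and method names; not Metron's actual implementation.
public class TickTupleSketch {

    // True when the tuple is a Storm tick tuple rather than a data message.
    // (Storm emits ticks from the "__system" component on the "__tick" stream.)
    static boolean isTickTuple(String sourceComponent, String sourceStream) {
        return "__system".equals(sourceComponent) && "__tick".equals(sourceStream);
    }

    // Dispatch as a bolt's execute() would: flush all queues on a tick,
    // otherwise enqueue the message (flushing only when batchSize is reached).
    static String execute(String sourceComponent, String sourceStream) {
        return isTickTuple(sourceComponent, sourceStream) ? "flush" : "batch";
    }

    public static void main(String[] args) {
        System.out.println(execute("__system", "__tick"));         // flush
        System.out.println(execute("someSpout", "default"));       // batch
    }
}
```

Because both the tick and the data tuples arrive through the same execute() 
call, the flush logic needs no separate thread and no extra synchronization.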


> Global Batching and Flushing
> ----------------------------
>
>                 Key: METRON-322
>                 URL: https://issues.apache.org/jira/browse/METRON-322
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Ajay Yadav
>            Assignee: Matt Foley
>
> All Writers and other bolts that maintain an internal "batch" queue need to 
> have a timeout flush, to prevent messages from low-volume telemetries from 
> sitting in their queues indefinitely.  Storm has a timeout value 
> (topology.message.timeout.secs) that bounds how long a tuple may remain 
> unacked.  If the Writer does not process the queue before the timeout, then 
> Storm recycles the tuples through the topology. This has multiple undesirable 
> consequences, including data duplication and waste of compute resources. We 
> would like to be able to specify an interval after which the queues will 
> flush, even if the batch size has not been reached.
> We will utilize the Storm Tick Tuple to trigger timeout flushing, following 
> the recommendations of the article at 
> http://hortonworks.com/blog/apache-storm-design-pattern-micro-batching/#CONCLUSION
> Since every Writer processes its queue somewhat differently, every bolt that 
> has a "batchSize" parameter will be given a "batchTimeout" parameter too.  It 
> will default to 1/2 the value of "topology.message.timeout.secs", as 
> recommended, and will ignore settings larger than the default, which could 
> cause failure to flush in time.  In the Enrichment topology, where two 
> Writers may be placed one after the other (enrichment and threat intel), the 
> default timeout interval will be 1/4 the value of 
> "topology.message.timeout.secs".  The default value of 
> "topology.message.timeout.secs" in Storm is 30 seconds.
> In addition, Storm provides a limit on the number of pending messages that 
> have not been acked.  If more than "topology.max.spout.pending" messages are 
> in flight, Storm stops the spout from emitting further tuples until some are 
> acked.  However, the default value of "topology.max.spout.pending" is null 
> (unlimited), and if it is set to a non-null value, the user can manage the 
> consequences by setting batchSize limits appropriately.  Having the timeout 
> flush will also ameliorate this issue, so we do not need to address 
> "topology.max.spout.pending" directly in this task.
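
The default-and-clamp rule described above (default to 1/2 of 
"topology.message.timeout.secs", or 1/4 when two Writers run in series, and 
ignore configured values larger than that default) is plain arithmetic, 
sketched below with hypothetical method and parameter names; Metron's actual 
configuration code may differ.

```java
// Sketch of the batchTimeout default-and-clamp rule; names are hypothetical.
public class BatchTimeoutDefaults {

    // configuredTimeoutSecs <= 0 means "not set".  writersInSeries is 1 for
    // most topologies, 2 for Enrichment (enrichment + threat intel writers).
    static int effectiveBatchTimeout(int configuredTimeoutSecs,
                                     int messageTimeoutSecs,
                                     int writersInSeries) {
        int divisor = 2 * writersInSeries;           // 2 one writer, 4 for two
        int defaultTimeout = messageTimeoutSecs / divisor;
        if (configuredTimeoutSecs <= 0 || configuredTimeoutSecs > defaultTimeout) {
            return defaultTimeout;   // unset or too large: fall back to default
        }
        return configuredTimeoutSecs;
    }

    public static void main(String[] args) {
        // With Storm's default topology.message.timeout.secs of 30:
        System.out.println(effectiveBatchTimeout(0, 30, 1));   // 15: unset, one writer
        System.out.println(effectiveBatchTimeout(20, 30, 2));  // 7: clamped to 30/4
        System.out.println(effectiveBatchTimeout(5, 30, 1));   // 5: explicit, valid
    }
}
```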



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
