[ 
https://issues.apache.org/jira/browse/METRON-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated METRON-322:
------------------------------
    Description: 
All Writers and other bolts that maintain an internal "batch" queue, need to 
have a timeout flush, to prevent messages from low-volume telemetries from 
sitting in their queues indefinitely.  Storm has a timeout value 
(topology.message.timeout.secs) that prevents it from waiting for too long. If 
the Writer does not process the queue before the timeout, then Storm recycles 
the tuples through the topology. This has multiple undesirable consequences, 
including data duplication and waste of compute resources. We would like to be 
able to specify an interval after which the queues would flush, even if the 
batch size is not met.

We will utilize the Storm Tick Tuple to trigger timeout flushing, following the 
recommendations of the article at 
http://hortonworks.com/blog/apache-storm-design-pattern-micro-batching/#CONCLUSION
Since every Writer processes its queue somewhat differently, every bolt that 
has a "batchSize" parameter will be given a "batchTimeout" parameter too.  It 
will default to 1/2 the value of "topology.message.timeout.secs", as 
recommended, and will ignore settings larger than the default, which could 
cause failure to flush in time.  In the Enrichment topology, where two Writers 
may be placed one after the other (enrichment and threat intel), the default 
timeout interval will be 1/4 the value of "topology.message.timeout.secs".  The 
default value of "topology.message.timeout.secs" in Storm is 30 seconds.

In addition, Storm provides a limit on the number of pending messages that have 
not been acked. If more than "topology.max.spout.pending" messages are waiting 
in a topology, then Storm will recycle them through the topology. However, the 
default value of "topology.max.spout.pending" is null, and if set to non-null 
value, the user can manage the consequences by setting batchSize limits 
appropriately.  Having the timeout flush will also ameliorate this issue.  So 
we do not need to address "topology.max.spout.pending" directly in this task.

  was:Flushing individual telemetries with disparate traffic are not only 
difficult to tune in single topology but also creates lot of failed message 
overhead as topology level configurations like “timeout, max.spout.pending” etc 
can’t be changed for every telemetry. Instead of batching individual 
telemetries in enrichment we should batch & flush them together.


> Global Batching & flushing
> --------------------------
>
>                 Key: METRON-322
>                 URL: https://issues.apache.org/jira/browse/METRON-322
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Ajay Yadav
>            Assignee: Matt Foley
>
> All Writers and other bolts that maintain an internal "batch" queue, need to 
> have a timeout flush, to prevent messages from low-volume telemetries from 
> sitting in their queues indefinitely.  Storm has a timeout value 
> (topology.message.timeout.secs) that prevents it from waiting for too long. 
> If the Writer does not process the queue before the timeout, then Storm 
> recycles the tuples through the topology. This has multiple undesirable 
> consequences, including data duplication and waste of compute resources. We 
> would like to be able to specify an interval after which the queues would 
> flush, even if the batch size is not met.
> We will utilize the Storm Tick Tuple to trigger timeout flushing, following 
> the recommendations of the article at 
> http://hortonworks.com/blog/apache-storm-design-pattern-micro-batching/#CONCLUSION
> Since every Writer processes its queue somewhat differently, every bolt that 
> has a "batchSize" parameter will be given a "batchTimeout" parameter too.  It 
> will default to 1/2 the value of "topology.message.timeout.secs", as 
> recommended, and will ignore settings larger than the default, which could 
> cause failure to flush in time.  In the Enrichment topology, where two 
> Writers may be placed one after the other (enrichment and threat intel), the 
> default timeout interval will be 1/4 the value of 
> "topology.message.timeout.secs".  The default value of 
> "topology.message.timeout.secs" in Storm is 30 seconds.
> In addition, Storm provides a limit on the number of pending messages that 
> have not been acked. If more than "topology.max.spout.pending" messages are 
> waiting in a topology, then Storm will recycle them through the topology. 
> However, the default value of "topology.max.spout.pending" is null, and if 
> set to non-null value, the user can manage the consequences by setting 
> batchSize limits appropriately.  Having the timeout flush will also 
> ameliorate this issue.  So we do not need to address 
> "topology.max.spout.pending" directly in this task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to