[ https://issues.apache.org/jira/browse/METRON-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375713#comment-15375713 ]
ASF GitHub Bot commented on METRON-227:
---------------------------------------
Github user cestella commented on a diff in the pull request:
https://github.com/apache/incubator-metron/pull/188#discussion_r70704522
--- Diff:
metron-platform/metron-common/src/main/java/org/apache/metron/common/writer/BulkWriterComponent.java
---
@@ -103,10 +115,21 @@ public void write( String sensorType
}
messageList.add(message);
- if (tupleList.size() < batchSize) {
- sensorTupleMap.put(sensorType, tupleList);
- sensorMessageMap.put(sensorType, messageList);
- } else {
+
+ if(configurations.getGlobalConfig()!=null&&configurations.getGlobalConfig().get(Constants.TIME_FLUSH_FLAG)!=null)
--- End diff --
I'd recommend saving off the global config in a member variable and reusing
that reference. The configuration in ZooKeeper could be updated between the two
calls to `configurations.getGlobalConfig()`, in which case you'd be reading two
separate configs.
> Add Time-Based Flushing to Writer Bolt
> --------------------------------------
>
> Key: METRON-227
> URL: https://issues.apache.org/jira/browse/METRON-227
> Project: Metron
> Issue Type: Bug
> Reporter: Domenic Puzio
> Assignee: Ajay Yadav
> Labels: 0.2.1BETA
> Fix For: 0.2.1BETA
>
>
> We need to change the BulkMessageWriterBolt and BulkWriterComponent to use
> time-based flushing when writing data to Elasticsearch or Solr.
> Currently, we set a batch size, and the Writer waits for that many tuples to
> accumulate; however, Storm has a timeout that limits how long a tuple can
> wait. If the Writer does not receive a full batch before the timeout, Storm
> replays those tuples through the topology. In addition, Storm caps the number
> of pending (un-acked) messages: if too many messages are waiting on the bulk
> Writer, they are also replayed through the topology. This behavior is
> undesirable and directly degrades the Writer's performance. We would like to
> be able to specify a time interval after which the topology flushes, writing
> whatever data it currently holds to Elasticsearch or Solr even if the batch
> size has not been reached.
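One way to combine size-based and time-based flushing, sketched in Java under stated assumptions: `TimeOrSizeBatcher`, `add`, and `onTick` are hypothetical names, not Metron's API; the clock is injected so the timing is testable; and `onTick()` is assumed to be called periodically, for example from a Storm tick tuple configured via `Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical sketch, not Metron's BulkWriterComponent: buffers messages and
// flushes when either the batch is full or the oldest buffered message has
// been held for at least flushIntervalMs.
public final class TimeOrSizeBatcher<T> {
    private final int batchSize;
    private final long flushIntervalMs;
    private final LongSupplier clock;        // current time in ms; injectable for tests
    private final Consumer<List<T>> writer;  // e.g. a bulk write to Elasticsearch or Solr
    private final List<T> buffer = new ArrayList<>();
    private long firstMessageAt = -1;        // time the current batch started

    public TimeOrSizeBatcher(int batchSize, long flushIntervalMs,
                             LongSupplier clock, Consumer<List<T>> writer) {
        this.batchSize = batchSize;
        this.flushIntervalMs = flushIntervalMs;
        this.clock = clock;
        this.writer = writer;
    }

    // Buffers a message and flushes immediately if the batch is full.
    public void add(T message) {
        if (buffer.isEmpty()) {
            firstMessageAt = clock.getAsLong();
        }
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Called periodically; flushes a partial batch once it is old enough.
    public void onTick() {
        if (!buffer.isEmpty() && clock.getAsLong() - firstMessageAt >= flushIntervalMs) {
            flush();
        }
    }

    // Hands a copy of the buffered batch to the writer and starts a new batch.
    private void flush() {
        writer.accept(new ArrayList<>(buffer));
        buffer.clear();
        firstMessageAt = -1;
    }
}
```

Choosing a flush interval well below Storm's message timeout lets a partial batch be written (and its tuples acked) before Storm gives up on them and replays them.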
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)