ASF GitHub Bot commented on METRON-227:

Github user mattf-horton commented on the issue:

    -1.  Upon closer review, I find that there is no timer thread or similar 
mechanism implemented in this patch.  Thus, the patch only works if there is a 
steady trickle of new tuples to trigger the time comparison.  But if a burst of 
tuples is enqueued, and then no more tuples arrive for a long time, the queue 
will languish without flushing until another tuple triggers the time 
comparison, potentially much later than the desired timeout.
    @mmiklavc 's suggestion to use Tick Tuples instead is precisely the correct 
answer, as it avoids the considerable complexity of implementing a (robust) 
timer thread.
    Since this PR has been dormant since July, and the contributor has not 
responded to queries on the Jira, I will take over the ticket, unless a more 
senior member of the community wishes to do so.
    I recommend that this PR be closed without committing, and I will do the 
related work in the context of METRON-322, which I am also assigning to myself.
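    To illustrate the failure mode described above, here is a minimal sketch 
in plain Java.  It is not the patch and not Metron's BulkWriterComponent; the 
class TimedBatchQueue and its methods are hypothetical, and time is passed in 
explicitly so the behavior can be followed by hand.  onTuple() stands for the 
arrival of a data tuple and onTick() for a periodic tick.  In Storm, I would 
expect the tick to come from setting Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS for 
the bolt, but that wiring is assumed here, not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the writer's batching logic (illustration only).
class TimedBatchQueue {
    private final int batchSize;
    private final long timeoutMs;
    private final List<String> pending = new ArrayList<>();
    private long oldestEnqueuedMs;
    private int flushCount = 0;

    TimedBatchQueue(int batchSize, long timeoutMs) {
        this.batchSize = batchSize;
        this.timeoutMs = timeoutMs;
    }

    // A data tuple arrives. The age check here only runs when traffic arrives,
    // which is the weakness of a patch that has no timer or tick.
    void onTuple(String message, long nowMs) {
        if (pending.isEmpty()) {
            oldestEnqueuedMs = nowMs;
        }
        pending.add(message);
        if (pending.size() >= batchSize) {
            flush();
        } else {
            flushIfExpired(nowMs);
        }
    }

    // A periodic tick arrives, independent of data traffic.
    void onTick(long nowMs) {
        flushIfExpired(nowMs);
    }

    private void flushIfExpired(long nowMs) {
        if (!pending.isEmpty() && nowMs - oldestEnqueuedMs >= timeoutMs) {
            flush();
        }
    }

    // A real writer would send the batch to Elasticsearch or Solr and ack the tuples.
    private void flush() {
        pending.clear();
        flushCount++;
    }

    int pendingCount() {
        return pending.size();
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        TimedBatchQueue q = new TimedBatchQueue(100, 1000L);
        // A burst of three tuples, then silence.
        q.onTuple("a", 0L);
        q.onTuple("b", 5L);
        q.onTuple("c", 10L);
        System.out.println("pending after burst: " + q.pendingCount());
        // Without this call, nothing would flush until the next data tuple.
        q.onTick(1000L);
        System.out.println("pending after tick: " + q.pendingCount());
    }
}
```

    Without the onTick() call, the three tuples from the burst stay queued 
until the next data tuple arrives, however late that is.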

> Add Time-Based Flushing to Writer Bolt
> --------------------------------------
>                 Key: METRON-227
>                 URL: https://issues.apache.org/jira/browse/METRON-227
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Domenic Puzio
>            Assignee: Matt Foley
> We need to change the BulkMessageWriterBolt and BulkWriterComponent to use 
> time-based flushing when writing data to Elasticsearch or Solr.
> Currently, we set a batch size, and the Writer waits for that number of 
> tuples to build up; however, Storm has a timeout value that prevents it from 
> waiting for too long. If the Writer does not get the batch size before the 
> timeout, then it recycles the tuples through the topology. In addition, Storm 
> only allows so many pending messages that have not been acked - if too many 
> messages are waiting for the bulk Writer, then it will recycle them through 
> the topology. This is not desired behavior and directly impacts the 
> performance of this Writer. We would like to be able to specify a time 
> interval after which the topology would flush, writing the data it's 
> currently holding to Elasticsearch or Solr even if the batch size is not met.

This message was sent by Atlassian JIRA
