[ https://issues.apache.org/jira/browse/STORM-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649374#comment-14649374 ]
Aaron Dossett commented on STORM-960:
-------------------------------------

Thank you! One other observation: this change makes something like STORM-938 very important. Because tuples are no longer ack'd until they are flushed, they will time out unless the batch regularly fills up within the timeout window. Adding a periodic flush with an interval shorter than the timeout setting will really improve performance. It was implementing this change internally that led us to find a solution for STORM-938. I've made our fix available as a PR to STORM-938.

> Hive-Bolt can lose tuples when flushing data
> --------------------------------------------
>
>                 Key: STORM-960
>                 URL: https://issues.apache.org/jira/browse/STORM-960
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: external
>            Reporter: Aaron Dossett
>            Assignee: Aaron Dossett
>            Priority: Minor
>
> In HiveBolt's execute method tuples are ack'd as they are received. When a
> batchsize of tuples has been received, the writers are flushed. However, if
> the flush fails only the most recent tuple will be marked as failed. All
> prior tuples will already have been ack'd. This creates a window for data
> loss.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
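
For illustration only, here is a minimal sketch of the behaviour discussed above: tuples are held until a flush succeeds, the whole batch is ack'd or failed together, and a periodic tick forces a flush of a partly filled batch. This is not the HiveBolt code or the STORM-938 patch. `Acker`, `BatchWriter`, `BatchingAckSketch`, `onTuple` and `onTick` are hypothetical names invented for this example, and it assumes the caller arranges for `onTick` to fire at an interval shorter than the tuple timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the collector's ack/fail calls (not a Storm API).
interface Acker<T> {
    void ack(T tuple);
    void fail(T tuple);
}

// Hypothetical stand-in for the Hive writers' flush (not a Hive API).
interface BatchWriter {
    void flush() throws Exception;
}

class BatchingAckSketch<T> {
    private final int batchSize;
    private final BatchWriter writer;
    private final Acker<T> acker;
    private final List<T> pending = new ArrayList<>();

    BatchingAckSketch(int batchSize, BatchWriter writer, Acker<T> acker) {
        this.batchSize = batchSize;
        this.writer = writer;
        this.acker = acker;
    }

    // Buffer the tuple without acking it; flush once the batch is full.
    void onTuple(T tuple) {
        pending.add(tuple);
        if (pending.size() >= batchSize) {
            flushAndAck();
        }
    }

    // Periodic trigger: flush a partial batch so its tuples do not time out.
    void onTick() {
        if (!pending.isEmpty()) {
            flushAndAck();
        }
    }

    int pendingCount() {
        return pending.size();
    }

    // Ack every buffered tuple if the flush succeeds, fail every one if it throws.
    private void flushAndAck() {
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        try {
            writer.flush();
            for (T t : batch) {
                acker.ack(t);
            }
        } catch (Exception e) {
            for (T t : batch) {
                acker.fail(t);
            }
        }
    }

    public static void main(String[] args) {
        List<String> acked = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        Acker<String> acker = new Acker<String>() {
            public void ack(String t) { acked.add(t); }
            public void fail(String t) { failed.add(t); }
        };
        // Batch size 3, but only two tuples arrive before the tick.
        BatchingAckSketch<String> sketch = new BatchingAckSketch<>(3, () -> {}, acker);
        sketch.onTuple("a");
        sketch.onTuple("b");
        sketch.onTick();
        System.out.println("acked=" + acked + " failed=" + failed);
    }
}
```

In the demo the batch never fills, so without the tick both tuples would stay un-ack'd until they timed out. Failing the whole batch on a flush error is what closes the data-loss window described in the issue, at the cost of replaying tuples that may already have been written.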