[ https://issues.apache.org/jira/browse/STORM-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649374#comment-14649374 ]
Aaron Dossett commented on STORM-960:
-------------------------------------
Thank you! One other observation: this change makes something like STORM-938
very important. Since tuples aren't ack'd until they are flushed, they will
time out unless the batch fills up regularly within the timeout window.
Adding a periodic flush with an interval shorter than the timeout setting will
really improve performance.
It was implementing this change internally that led us to find a solution
for STORM-938. I've made our fix available as a PR for STORM-938.
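A minimal sketch of why the periodic flush matters, written in Python for brevity. It models a batching bolt that only acks tuples when it flushes, assuming tuples arrive too slowly for a batch ever to fill. `BatchingBolt`, `tick` and `simulate` are hypothetical names, not HiveBolt's or Storm's API. The `tick` call stands in for a periodic trigger such as a Storm tick tuple (`topology.tick.tuple.freq.secs`), and its interval is assumed to be shorter than the topology message timeout (`topology.message.timeout.secs`).

```python
class BatchingBolt:
    """Hypothetical model of a bolt that acks tuples only when it flushes."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []  # arrival times of tuples not yet flushed (un-acked)
        self.waits = []    # seconds each acked tuple waited before its ack

    def execute(self, now):
        self.pending.append(now)
        if len(self.pending) >= self.batch_size:
            self.flush(now)

    def tick(self, now):
        # Periodic trigger: flush a partially filled batch so it gets acked.
        if self.pending:
            self.flush(now)

    def flush(self, now):
        # Writing the batch is elided; every buffered tuple is acked here.
        self.waits.extend(now - arrived for arrived in self.pending)
        self.pending.clear()


def simulate(arrival_times, tick_interval, horizon, batch_size=100):
    """Replay arrivals and (optional) ticks in time order up to `horizon`."""
    bolt = BatchingBolt(batch_size)
    ticks = []
    if tick_interval is not None:
        ticks = [tick_interval * k for k in range(1, int(horizon // tick_interval) + 1)]
    # kind 0 = tuple arrival, kind 1 = tick; arrivals sort before ticks at the same time
    events = sorted([(t, 0) for t in arrival_times] + [(t, 1) for t in ticks])
    for now, kind in events:
        if kind == 0:
            bolt.execute(now)
        else:
            bolt.tick(now)
    return bolt


if __name__ == "__main__":
    TIMEOUT = 30  # assumed message timeout in seconds
    with_ticks = simulate(range(10), tick_interval=5, horizon=60)
    no_ticks = simulate(range(10), tick_interval=None, horizon=60)
    print(max(with_ticks.waits) < TIMEOUT, len(no_ticks.pending))  # True 10
```

With a tick every 5 seconds, all ten slowly arriving tuples are acked within 5 seconds of arrival. With no ticks, all ten are still pending after 60 seconds, so in a real topology they would time out and be replayed even though nothing went wrong.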
> Hive-Bolt can lose tuples when flushing data
> --------------------------------------------
>
> Key: STORM-960
> URL: https://issues.apache.org/jira/browse/STORM-960
> Project: Apache Storm
> Issue Type: Improvement
> Components: external
> Reporter: Aaron Dossett
> Assignee: Aaron Dossett
> Priority: Minor
>
> In HiveBolt's execute method, tuples are ack'd as they are received. When a
> full batch of tuples has been received, the writers are flushed. However, if
> the flush fails, only the most recent tuple is marked as failed; all prior
> tuples have already been ack'd. This creates a window for data loss.
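A minimal sketch of the fix direction implied above: buffer tuples without acking them, then ack the whole batch if the flush succeeds, or fail the whole batch if it raises, so every tuple in a failed batch can be replayed. It is written in Python for brevity. `AckAfterFlushBolt`, `flush_fn` and `FlushError` are hypothetical stand-ins, and the `acked`/`failed` lists take the place of the `ack`/`fail` calls a real Storm bolt would make on its `OutputCollector`.

```python
class FlushError(Exception):
    """Hypothetical error raised when writing a batch fails."""


class AckAfterFlushBolt:
    """Hypothetical model: hold tuples un-acked until their batch is flushed."""

    def __init__(self, batch_size, flush_fn):
        self.batch_size = batch_size
        self.flush_fn = flush_fn  # writes a batch; raises FlushError on failure
        self.buffer = []          # received but not yet acked or failed
        self.acked = []           # stands in for collector.ack(tup)
        self.failed = []          # stands in for collector.fail(tup)

    def execute(self, tup):
        self.buffer.append(tup)   # do NOT ack on receipt
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        batch, self.buffer = self.buffer, []
        try:
            self.flush_fn(batch)
        except FlushError:
            self.failed.extend(batch)  # fail the whole batch so all of it is replayed
        else:
            self.acked.extend(batch)   # ack only after the write succeeded


if __name__ == "__main__":
    calls = []

    def flaky_flush(batch):
        calls.append(batch)
        if len(calls) == 1:
            raise FlushError("simulated writer failure")

    bolt = AckAfterFlushBolt(3, flaky_flush)
    for tup in ["a", "b", "c", "d", "e", "f"]:
        bolt.execute(tup)
    print(bolt.failed, bolt.acked)  # ['a', 'b', 'c'] ['d', 'e', 'f']
```

When the first flush fails, tuples `a`, `b` and `c` all end up failed rather than only `c`, which closes the data-loss window described in the issue. The trade-off is the one raised in the comment above: tuples now stay un-acked until their batch is flushed, so a partially filled batch needs some periodic flush to avoid timeouts.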
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)