[
https://issues.apache.org/jira/browse/FLUME-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170500#comment-13170500
]
[email protected] commented on FLUME-883:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3214/
-----------------------------------------------------------
Review request for Eric Sammer.
Summary
-------
The E2E collector sink saves the batch tags as the batches are passed to the
downstream sinks. The ACKs are flushed when the roller closes the file.
Currently, for the HDFS sink, close is the only operation that guarantees
that data is safely stored, so the ACKs are sent on close. If the writes fail
for some reason, we don't send the ACKs, on the assumption that the data is
lost. The E2E mechanism then resends the data.
The problem is that if the close fails, we don't clear the accumulated ACKs
for the current rolltag. It's therefore possible that the next successful roll
sends those ACKs, in which case the batch will not be resent.
The fix is to clear the unsent ACKs when close throws an IOException. This
also adds a config property to disable that behavior for sinks where different
close semantics apply.
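As a rough illustration of the behavior described above, here is a minimal
sketch. It is not the code from the patch: the names RollAckBuffer, Closer,
and clearOnCloseFailure are hypothetical, and the real CollectorSink is
structured differently. It only shows pending batch tags being dropped when
close fails, so that a later successful roll cannot ACK them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the roller's file-close step. */
interface Closer {
    void close() throws IOException;
}

/** Illustrative ACK buffer for one collector; not the actual CollectorSink. */
class RollAckBuffer {
    // Batch tags received since the last roll, not yet acknowledged.
    private final List<String> pendingTags = new ArrayList<>();
    // Tags that have been acknowledged after a successful close.
    private final List<String> sentAcks = new ArrayList<>();
    // Hypothetical switch modeled on the config property mentioned above.
    private final boolean clearOnCloseFailure;

    RollAckBuffer(boolean clearOnCloseFailure) {
        this.clearOnCloseFailure = clearOnCloseFailure;
    }

    /** Record the tag of a batch that was passed to the downstream sink. */
    void append(String batchTag) {
        pendingTags.add(batchTag);
    }

    /** Close the current file; send ACKs only if the close succeeded. */
    void roll(Closer closer) throws IOException {
        try {
            closer.close();
        } catch (IOException e) {
            // The data may be lost, so these tags must never be ACKed.
            if (clearOnCloseFailure) {
                pendingTags.clear();
            }
            throw e;
        }
        sentAcks.addAll(pendingTags);
        pendingTags.clear();
    }

    /** Snapshot of the tags acknowledged so far. */
    List<String> sentAcks() {
        return new ArrayList<>(sentAcks);
    }
}
```

With clearOnCloseFailure set to false, a tag appended before a failed roll
stays pending and is ACKed by the next successful roll, which is the bug this
review addresses.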
This addresses bug FLUME-883.
https://issues.apache.org/jira/browse/FLUME-883
Diffs
-----
flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java
20f60c6
flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java
aeceb15
flume-core/src/test/java/com/cloudera/flume/collector/TestCollectorSink.java
e735f38
Diff: https://reviews.apache.org/r/3214/diff
Testing
-------
Added new test case.
Ran the CollectorSink tests; will run the rest of the regression tests.
Thanks,
Prasad
> Flume E2E sink could send incorrect ACKs if there are HDFS file close errors
> -----------------------------------------------------------------------------
>
> Key: FLUME-883
> URL: https://issues.apache.org/jira/browse/FLUME-883
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v0.9.4
> Reporter: Prasad Mujumdar
> Assignee: Prasad Mujumdar
> Fix For: v0.9.5
>
>
> The E2E collector sink saves the batch tags as the batches are passed to the
> downstream sinks. The ACKs are flushed when the roller closes the file.
> Currently, for the HDFS sink, close is the only operation that guarantees
> that data is safely stored, so the ACKs are sent on close. If the writes
> fail for some reason, we don't send the ACKs, on the assumption that the
> data is lost. The E2E mechanism then resends the data.
> The problem is that if the close fails, we don't clear the accumulated
> ACKs for the current rolltag. It's therefore possible that the next
> successful roll sends those ACKs, in which case the batch will not be resent.