[ https://issues.apache.org/jira/browse/FLUME-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488532#comment-13488532 ]
Mike Percy commented on FLUME-1665:
-----------------------------------
Yes, Flume may create duplicates, but the goal is to create none under
normal conditions. Fewer duplicates are definitely better, but correctness and
reliability are more important.
Examples of slowness: you might have a 50-megabyte data transfer transaction
over a slow network link, or be operating a file channel on an overwhelmed
disk with a large batch of large events, or hit a Hadoop GC pause when writing
to HDFS. In such cases, a multi-second delay is not difficult to reach.
> Data from FileChannel will be duplicated when restarting configuration
> ----------------------------------------------------------------------
>
> Key: FLUME-1665
> URL: https://issues.apache.org/jira/browse/FLUME-1665
> Project: Flume
> Issue Type: Bug
> Components: Channel
> Affects Versions: v1.2.0, v1.3.0
> Reporter: Denny Ye
> Labels: FileChannel
>
> While the Flume process was running, I changed a configuration property and
> Flume reloaded its components without restarting the process. Events that had
> already been consumed before all components stopped were delivered again in
> the next run.
> I found the root cause. When FileChannel stops, it should save
> 'inflightPuts' and 'inflightTakes' to disk so they can be restored in the
> next run.
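
The fix proposed in the description (persist the in-flight sets when the
channel stops, restore them when it starts) could look roughly like the
sketch below. This is an illustration only, not Flume's FileChannel code:
the class InflightTracker, its methods, and the one-file-per-set layout are
all hypothetical, and event IDs are assumed to be plain longs.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Flume): tracks the IDs of events that
// belong to transactions which have been started but not yet committed.
final class InflightTracker {
    private final Set<Long> inflight = new TreeSet<>();

    // Record an event when a put or take transaction touches it.
    synchronized void begin(long eventId) { inflight.add(eventId); }

    // Forget the event once its transaction commits or rolls back.
    synchronized void complete(long eventId) { inflight.remove(eventId); }

    // Defensive copy of the current in-flight IDs.
    synchronized Set<Long> snapshot() { return new TreeSet<>(inflight); }

    // On channel stop: write the count, then each in-flight ID.
    synchronized void saveTo(Path file) throws IOException {
        try (DataOutputStream out =
                 new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(inflight.size());
            for (long id : inflight) {
                out.writeLong(id);
            }
        }
    }

    // On channel start: restore whatever was in flight at stop time.
    // A missing file is treated as "nothing was in flight".
    static InflightTracker loadFrom(Path file) throws IOException {
        InflightTracker tracker = new InflightTracker();
        if (!Files.exists(file)) {
            return tracker;
        }
        try (DataInputStream in =
                 new DataInputStream(Files.newInputStream(file))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                tracker.inflight.add(in.readLong());
            }
        }
        return tracker;
    }
}
```

With one tracker for puts and one for takes, a channel that calls saveTo()
in stop() and loadFrom() in start() would know after a reconfiguration which
events were already handed out, instead of treating them as never consumed.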