[ https://issues.apache.org/jira/browse/FLUME-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430601#comment-13430601 ]
Brock Noland commented on FLUME-1432:
-------------------------------------
That is a difficult question. The answer is no: there are no format changes,
only changes to how the "timestamp" in the format is assigned and how we replay
the file.
In Flume 1.2 there are log files/checkpoints we cannot replay. This is what the
patch addresses. This change does not allow us to replay those invalid
files/checkpoints. I have looked at the problem and I don't see how we can
address that; they are simply invalid. The invalid files/checkpoints occur when
the channel is full.
I have merged this JIRA with FLUME-1431 because that JIRA added a regression
test framework for the file channel format. I also added a test that
specifically covers log files/checkpoints that were created with flume-1.2.
Review Board has been updated.
Note: this patch includes binary content so once we get a +1 I will have to
commit it.
> FileChannel should replay logs in the order they were written
> -------------------------------------------------------------
>
> Key: FLUME-1432
> URL: https://issues.apache.org/jira/browse/FLUME-1432
> Project: Flume
> Issue Type: Bug
> Components: Channel
> Affects Versions: v1.2.0
> Reporter: Brock Noland
> Assignee: Brock Noland
>
> Currently we replay the logs one at a time, causing us to build a large queue
> of pending takes. Additionally, there may be scenarios where this simply will
> not work. Take a queue which is full (via checkpoint) and two files:
> 1:
> put
> commit
> put
> commit
> 2:
> take
> commit
> take
> commit
> take
> commit
> Replaying these logs in the current form will not work because we will try
> to replay the puts first and exceed our queue size. For these reasons, we
> should replay them in the order they were written.
> However, at present there is no way to do this. Currently we have two
> identifiers in each record we write, a transaction id and a timestamp. Neither
> can be used to replay logs in order. The transaction id is created when we
> create the transaction, not when we write to the log: someone could create a
> transaction, sleep, and then do work. The timestamp is not granular enough,
> as we could have duplicates.
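
The scenario in the description can be sketched in a few lines of Java. This is
only an illustration under stated assumptions, not the FileChannel code: the
names ReplayOrderSketch, LogRecord, and writeOrderId are hypothetical, commits
are omitted, and the ids assume the takes and puts were interleaved when
written (take, put, take, put, take). The id is assumed to be unique and
assigned at the moment a record is written to the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReplayOrderSketch {
    enum Op { PUT, TAKE }

    /** Hypothetical log record; writeOrderId is assumed to be set at write time. */
    static final class LogRecord {
        final long writeOrderId;
        final Op op;

        LogRecord(long writeOrderId, Op op) {
            this.writeOrderId = writeOrderId;
            this.op = op;
        }
    }

    /** Applies records to a bounded queue; false if it would overflow or underflow. */
    static boolean replay(List<LogRecord> records, int capacity, int initialSize) {
        int size = initialSize;
        for (LogRecord r : records) {
            size += (r.op == Op.PUT) ? 1 : -1;
            if (size < 0 || size > capacity) {
                return false;
            }
        }
        return true;
    }

    /** One file after another: every record of file 1, then every record of file 2. */
    static List<LogRecord> fileByFile(List<List<LogRecord>> files) {
        List<LogRecord> all = new ArrayList<>();
        for (List<LogRecord> file : files) {
            all.addAll(file);
        }
        return all;
    }

    /** All records from all files, sorted by the order in which they were written. */
    static List<LogRecord> inWriteOrder(List<List<LogRecord>> files) {
        List<LogRecord> all = fileByFile(files);
        all.sort(Comparator.comparingLong((LogRecord r) -> r.writeOrderId));
        return all;
    }

    public static void main(String[] args) {
        // File 1 holds the puts, file 2 holds the takes (commits omitted).
        List<LogRecord> file1 = Arrays.asList(
                new LogRecord(2, Op.PUT), new LogRecord(4, Op.PUT));
        List<LogRecord> file2 = Arrays.asList(
                new LogRecord(1, Op.TAKE), new LogRecord(3, Op.TAKE),
                new LogRecord(5, Op.TAKE));
        List<List<LogRecord>> files = Arrays.asList(file1, file2);

        int capacity = 3; // the checkpoint says the queue is already full
        System.out.println("file by file: " + replay(fileByFile(files), capacity, capacity));
        System.out.println("write order:  " + replay(inWriteOrder(files), capacity, capacity));
    }
}
```

Replaying file by file fails on the first put because the queue is already at
capacity, while the write-order replay never exceeds it. The sketch sorts
everything in memory for brevity; a real replay would more likely merge the
files as streams.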