[ https://issues.apache.org/jira/browse/FLUME-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628643#comment-13628643 ]
Hari Shreedharan commented on FLUME-1968:
-----------------------------------------
Brock,
I have been thinking about this for some time. If we move the seek offset
information out into the checkpoint metadata, we don't have to update the log
file metadata on checkpoint and don't need to keep track of two offsets,
because the checkpoint metadata has the relevant info.
I'd like to keep the offset info (or the offset/id of the last sync marker) so
we don't have to read the entire log file (even if we don't push data into the
event queue) when we recover from a backup checkpoint. I have seen situations
where there were hundreds of log files due to downtime on HDFS, etc.
If we write a sync marker every time we checkpoint, we could recover from
the last sync marker (when starting from the last checkpoint) or from the sync
marker just before it (when starting from the backup checkpoint). Even that
solution seems fine to me.
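
To make the last paragraph concrete, here is a minimal sketch of how a replay
start offset could be picked, assuming one sync marker is written per
checkpoint and the marker offsets for a log file are known in ascending order.
The class SyncMarkerReplay, the enum CheckpointSource and the method
replayStart are hypothetical names for illustration only; none of them exist
in the Flume code base.

```java
import java.util.List;

public final class SyncMarkerReplay {
    /** Hypothetical: which checkpoint the channel is starting up from. */
    public enum CheckpointSource { LATEST, BACKUP }

    /**
     * Picks the log-file offset at which replay could begin.
     * Assumes syncMarkerOffsets is ascending, one marker per checkpoint.
     */
    public static long replayStart(List<Long> syncMarkerOffsets,
                                   CheckpointSource source) {
        int n = syncMarkerOffsets.size();
        if (n == 0) {
            return 0L; // no markers at all: fall back to a full replay
        }
        if (source == CheckpointSource.LATEST) {
            return syncMarkerOffsets.get(n - 1); // last sync marker
        }
        // The backup checkpoint is one checkpoint older, so use the marker
        // just before the last one, or replay fully if there is none.
        return n >= 2 ? syncMarkerOffsets.get(n - 2) : 0L;
    }

    public static void main(String[] args) {
        List<Long> markers = List.of(128L, 4096L, 9000L);
        System.out.println(replayStart(markers, CheckpointSource.LATEST));
        System.out.println(replayStart(markers, CheckpointSource.BACKUP));
    }
}
```

In both cases only the tail of the log file after the chosen marker would need
to be read, rather than the entire file.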
> FileChannel new format while being backwards compatible
> -------------------------------------------------------
>
> Key: FLUME-1968
> URL: https://issues.apache.org/jira/browse/FLUME-1968
> Project: Flume
> Issue Type: Bug
> Components: Channel, File Channel
> Reporter: Brock Noland
>
> There are a couple of issues with the current format:
> 1) We have to track the offset at checkpoint time and write the offset to a
> special location so we can seek to that offset during replay. In FLUME-1516
> we are tracking two offsets.
> 2) We have no way to detect partial writes (FLUME-1967).
> 3) We can only checksum the body of the event, not the entire record
> (FLUME-1485), and therefore cannot detect corruption outside an event body.
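
As an illustration of points 2 and 3 in the description above, here is a rough
sketch of a record frame that carries its own length and a CRC32 over the
whole record, so that both truncated writes and corruption anywhere in the
record can be detected on read. The layout and the class FramedRecord are
assumptions made for this example; this is not the format proposed in
FLUME-1968 or used by the File Channel.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public final class FramedRecord {
    private static final int OVERHEAD = 4 + 8; // length prefix + stored CRC

    /** Frames a record as [length:int][record bytes][crc32:long]. */
    public static byte[] encode(byte[] record) {
        ByteBuffer buf = ByteBuffer.allocate(record.length + OVERHEAD);
        buf.putInt(record.length).put(record);
        CRC32 crc = new CRC32();
        // The checksum covers the length prefix and the entire record,
        // not just an event body inside it.
        crc.update(buf.array(), 0, 4 + record.length);
        buf.putLong(crc.getValue());
        return buf.array();
    }

    /** Returns the record bytes, or null if the frame is partial or corrupt. */
    public static byte[] decode(byte[] frame) {
        if (frame.length < OVERHEAD) {
            return null; // too short to hold even an empty record
        }
        ByteBuffer buf = ByteBuffer.wrap(frame);
        int len = buf.getInt();
        if (len < 0 || len > frame.length - OVERHEAD) {
            return null; // partial write or damaged length field
        }
        byte[] record = new byte[len];
        buf.get(record);
        long stored = buf.getLong();
        CRC32 crc = new CRC32();
        crc.update(frame, 0, 4 + len);
        return crc.getValue() == stored ? record : null;
    }
}
```

A reader that gets null back from decode would treat the frame as the end of
valid data (partial write) or as a corrupt record, depending on where in the
file it occurs.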