[ 
https://issues.apache.org/jira/browse/FLUME-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628643#comment-13628643
 ] 

Hari Shreedharan commented on FLUME-1968:
-----------------------------------------

Brock,

This is something I have been thinking about for some time. If we move the 
seek offset information out into the checkpoint metadata, we don't have to 
update the log file metadata on checkpoint and don't need to keep track of 2 
offsets, because the checkpoint metadata already has the relevant info.

I'd like to keep the offset info (or the offset/id of the last sync marker) so 
we don't have to read the entire log file (even if we don't push data into the 
event queue) when we recover from a backup checkpoint. I have seen situations 
where there were hundreds of log files due to downtime on HDFS etc.

If we write a sync marker every time we checkpoint, we could recover from 
the last sync marker (when starting from the last checkpoint) or from the sync 
marker just before it (if we start up from the backup checkpoint). Even that 
solution seems fine to me.
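
To make the sync marker idea concrete, here is a minimal sketch in Java. It is 
not Flume's FileChannel code: the class SyncMarkerReplay, the Record type and 
both methods are hypothetical names made up for illustration, and the log is 
modeled as an in-memory list rather than a file. In a real log file the marker 
would still have to be located somehow, for example through the offset/id kept 
in the checkpoint metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names exist in Flume.
final class SyncMarkerReplay {

    // A log record is either an event payload or a sync marker
    // (assumed to be written once per checkpoint).
    static final class Record {
        final boolean syncMarker;
        final String payload;

        Record(boolean syncMarker, String payload) {
            this.syncMarker = syncMarker;
            this.payload = payload;
        }
    }

    // Returns the index of the n-th sync marker counted from the end of the
    // log (1 = last marker), or -1 if the log has fewer than n markers.
    static int findSyncMarkerFromEnd(List<Record> log, int n) {
        int seen = 0;
        for (int i = log.size() - 1; i >= 0; i--) {
            if (log.get(i).syncMarker && ++seen == n) {
                return i;
            }
        }
        return -1;
    }

    // Replays the events that follow the chosen sync marker:
    // the last marker when starting from the last checkpoint, the marker
    // before it when starting from the backup checkpoint. If no such marker
    // exists, the whole log is replayed.
    static List<String> replay(List<Record> log, boolean fromBackupCheckpoint) {
        int start = findSyncMarkerFromEnd(log, fromBackupCheckpoint ? 2 : 1);
        List<String> events = new ArrayList<>();
        for (int i = start + 1; i < log.size(); i++) {
            Record r = log.get(i);
            if (!r.syncMarker) {
                events.add(r.payload);
            }
        }
        return events;
    }
}
```

With a log of e1, marker, e2, marker, e3, replaying from the last checkpoint 
reads only e3, while replaying from the backup checkpoint reads e2 and e3. 
Either way the records before the chosen marker are never touched.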
                
> FileChannel new format while being backwards compatible
> -------------------------------------------------------
>
>                 Key: FLUME-1968
>                 URL: https://issues.apache.org/jira/browse/FLUME-1968
>             Project: Flume
>          Issue Type: Bug
>          Components: Channel, File Channel
>            Reporter: Brock Noland
>
> There are a couple issues with the current format:
> 1) We have to track the offset at checkpoint time and write the offset to a 
> special location so we can seek to that offset during replay. In FLUME-1516 
> we are tracking two offsets.
> 2) We have no way to detect partial writes FLUME-1967
> 3) We can only checksum the body of the event, not the entire record 
> FLUME-1485 and therefore cannot detect corruption outside an event body.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira