[ 
https://issues.apache.org/jira/browse/FLUME-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628173#comment-13628173
 ] 

Mike Percy commented on FLUME-1968:
-----------------------------------

Hi Brock, Avro actually supports a sync marker, though I don't know how 
efficient a binary search using that mechanism would be. It would still take 
some form of O(log n) jumps.

With the Avro API you seek to a point in the file, and then there is an API to 
find the next record (or compressed block) after a sync marker. I suppose you 
could read metadata about that record and, if its ID is less or greater than 
the ID you are looking for, do another seek and then seek forward again. The 
only issue is that you may have to read a bunch of records at each jump point 
to ensure the record you are looking for is not in the same block.
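
A rough sketch of the seek-and-sync search described above, written as a 
self-contained Python simulation rather than against the real Avro API. The 
names Block, sync and find_record are hypothetical, the file is modeled as a 
list of blocks with known start offsets, and record IDs are assumed to increase 
monotonically through the file. sync() stands in for the Avro call that moves 
to the next sync point after a position (DataFileReader.sync(long) in the Java 
API). To limit reads per jump, this version compares only the first record ID 
at each jump and defers the in-block scan to the end.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    start: int       # byte offset where the block begins (hypothetical model)
    ids: List[int]   # record IDs in the block, ascending


def sync(blocks: List[Block], position: int) -> Optional[int]:
    """Index of the first block starting at or after position, or None.

    Simplified stand-in for seeking and then finding the next sync marker.
    """
    for i, block in enumerate(blocks):
        if block.start >= position:
            return i
    return None


def find_record(blocks: List[Block], file_size: int,
                target: int) -> Optional[Tuple[int, int]]:
    """Binary search over byte offsets for the block that may hold target.

    Returns (block index, position within block), or None if absent.
    """
    lo, hi = 0, file_size   # candidate block starts lie in [lo, hi)
    best = None             # last block seen whose first ID <= target
    while lo < hi:
        mid = (lo + hi) // 2
        idx = sync(blocks, mid)
        if (idx is None or blocks[idx].start >= hi
                or blocks[idx].ids[0] > target):
            hi = mid                      # nothing useful at or after mid
        else:
            best = idx
            lo = blocks[idx].start + 1    # look for a later matching block
    if best is None or target not in blocks[best].ids:
        return None
    # One scan of a single block, done once after the jumps are finished.
    return best, blocks[best].ids.index(target)


if __name__ == "__main__":
    demo = [Block(0, [1, 2, 3]), Block(100, [4, 7]),
            Block(250, [9, 10, 11]), Block(400, [15])]
    print(find_record(demo, 500, 7))
```

If IDs are not strictly ordered across blocks, the first-record comparison is 
not enough and each jump would need to scan the whole block, which is the cost 
mentioned above.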

                
> FileChannel new format while being backwards compatible
> -------------------------------------------------------
>
>                 Key: FLUME-1968
>                 URL: https://issues.apache.org/jira/browse/FLUME-1968
>             Project: Flume
>          Issue Type: Bug
>          Components: Channel, File Channel
>            Reporter: Brock Noland
>
> There are a couple issues with the current format:
> 1) We have to track the offset at checkpoint time and write the offset to a 
> special location so we can seek to that offset during replay. In FLUME-1516 
> we are tracking two offsets.
> 2) We have no way to detect partial writes FLUME-1967
> 3) We can only checksum the body of the event, not the entire record 
> FLUME-1485 and therefore cannot detect corruption outside an event body.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
