[ https://issues.apache.org/jira/browse/FLUME-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628173#comment-13628173 ]
Mike Percy commented on FLUME-1968:
-----------------------------------
Hi Brock, Avro actually supports a sync marker. I don't know how efficient
it is to do a binary search using that mechanism, but it would still take
on the order of O(log n) jumps.
With the Avro API you seek to a point in the file, and then there is an API to
find the next record (or compressed block) after a sync marker. I suppose you
could read metadata about that record, and if its ID is less than or greater
than the ID you are looking for, you do another seek and then seek forward
again. The only issue is that you may have to read a bunch of records at each
jump point to ensure the record you are looking for is not in the same block.
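To make the idea concrete, here is a minimal sketch of that search in Python. It is a toy model, not the Avro API: `Block`, `SyncedFile`, `sync()` and `find_record()` are hypothetical names made up for illustration, and the sketch assumes record IDs increase monotonically through the file and that no block is empty. `sync(position)` plays the role of "seek, then find the next block after a sync marker".

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Block:
    start: int         # byte offset where the block begins (hypothetical layout)
    record_ids: list   # IDs stored in this block, ascending, never empty


class SyncedFile:
    """Toy stand-in for a container file whose blocks are separated by sync markers."""

    def __init__(self, blocks, length):
        self.blocks = blocks
        self.length = length
        self._starts = [b.start for b in blocks]

    def sync(self, position):
        """Return the first block that starts at or after `position`, or None."""
        i = bisect_left(self._starts, position)
        return self.blocks[i] if i < len(self.blocks) else None


def find_record(f, target_id):
    """Binary search over byte offsets; returns (block start or None, seeks used)."""
    lo, hi = 0, f.length
    candidate = None
    seeks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        block = f.sync(mid)            # one jump: seek, then sync forward
        seeks += 1
        if block is None or block.record_ids[0] > target_id:
            hi = mid                   # target, if present, lies before `mid`
        else:
            candidate = block          # target is in this block or a later one
            lo = block.start + 1
    if candidate is None:
        return None, seeks
    # The extra cost mentioned above: the candidate block still has to be
    # read record by record to confirm the ID is really there.
    found = target_id in candidate.record_ids
    return (candidate.start if found else None), seeks


# Example: 1000 blocks of 10 records each, 100 bytes per block, 16-byte header.
demo = SyncedFile(
    [Block(16 + 100 * i, list(range(10 * i, 10 * i + 10))) for i in range(1000)],
    length=16 + 100 * 1000,
)
offset, seeks = find_record(demo, 4237)
print(offset, seeks)
```

Each iteration at least halves the remaining byte range, so the number of jumps grows with the logarithm of the file length; the linear scan inside the final block is in addition to that.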
> FileChannel new format while being backwards compatible
> -------------------------------------------------------
>
> Key: FLUME-1968
> URL: https://issues.apache.org/jira/browse/FLUME-1968
> Project: Flume
> Issue Type: Bug
> Components: Channel, File Channel
> Reporter: Brock Noland
>
> There are a couple of issues with the current format:
> 1) We have to track the offset at checkpoint time and write the offset to a
> special location so we can seek to that offset during replay. In FLUME-1516
> we are tracking two offsets.
> 2) We have no way to detect partial writes (FLUME-1967).
> 3) We can only checksum the body of the event, not the entire record
> (FLUME-1485), and therefore cannot detect corruption outside an event body.