Can this behavior be simulated to get a feel for expected improvements? On Jun 29, 2012 10:45 PM, "Keith Turner (JIRA)" <[email protected]> wrote:
> > [ > https://issues.apache.org/jira/browse/ACCUMULO-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404374#comment-13404374] > > Keith Turner commented on ACCUMULO-473: > --------------------------------------- > > One other thing I like about building a transient per block index is that > it cleanly seperates indexing from the relative key encoding. We can > improve the relative key encoding independant of indexing as long as the > index can still jump to an arbitrary point. > > > Support binary search within RFile blocks > > ----------------------------------------- > > > > Key: ACCUMULO-473 > > URL: https://issues.apache.org/jira/browse/ACCUMULO-473 > > Project: Accumulo > > Issue Type: Improvement > > Components: tserver > > Reporter: Keith Turner > > Assignee: Keith Turner > > Fix For: 1.5.0 > > > > > > RFiles store blocks of key values using relative encoding. By default > these blocks are small (100k). To find a key in an RFile the index is used > to find the block, then the block is scanned for the key. It would be nice > for iterators that do alot of seeking if a binary search were done instead > of a scan. > > Relative encoding is a form of compression that serializes a key > relative to the last key. For example if the row in a key is the same as > the previous key, then the row is not stored again. This works well with > sorted data. The current on disk format does not support random access > within a block, because to read entry N all previous entries must be read. > One option would be to deserialize the block into a form that supports > random access and then do binary search. However this will consume memory > and CPU. If the relative encoding format could be modified to support > random access, then binary search could be supported in a memory and CPU > efficient way. > > -- > This message is automatically generated by JIRA. > If you think it was sent incorrectly, please contact your JIRA > administrators: > https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa > For more information on JIRA, see: http://www.atlassian.com/software/jira > > >
