[
https://issues.apache.org/jira/browse/ACCUMULO-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757724#comment-13757724
]
William Slacum commented on ACCUMULO-1682:
------------------------------------------
The real issue is that you can't do a sorted merge join on unsorted data. Since
the document IDs are the last part of the index entry in the key structure,
they will be unsorted when you read across multiple terms. I believe Adam
attempted to resolve this by using an isolated scanner, consuming the entire
range into a sorted map, and using that map as a leaf node/term source.
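To make the ordering problem concrete, here is a minimal Python sketch (not
Accumulo code). The in-memory index, the term strings, and the helper names
scan, sorted_source and intersect are all made up for illustration. It shows
that a scan across a range of terms yields document IDs out of order, and
that buffering the range into a sorted structure first makes a plain merge
join possible.

```python
# Hypothetical index entries for one partition, kept in key order:
# term first, document ID last (mirroring the layout described above).
index = sorted([
    ("AGE:30", "doc7"), ("AGE:31", "doc2"), ("AGE:45", "doc5"),
    ("AGE:45", "doc9"), ("NAME:Joe", "doc2"), ("NAME:Joe", "doc5"),
    ("NAME:Joe", "doc8"),
])

def scan(lo, hi):
    """Yield doc IDs for every term in [lo, hi), in key order."""
    for term, doc_id in index:
        if lo <= term < hi:
            yield doc_id

def sorted_source(lo, hi):
    """Consume the whole range and return its doc IDs sorted and de-duplicated."""
    return sorted(set(scan(lo, hi)))

def intersect(a, b):
    """Classic merge join over two sorted lists of doc IDs."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Reading across several AGE terms interleaves the doc IDs.
unsorted_ids = list(scan("AGE:30", "AGE:60"))

# Buffer the range into sorted order, then join against a single term.
ages = sorted_source("AGE:30", "AGE:60")
joes = sorted_source("NAME:Joe", "NAME:Joe\x00")
matches = intersect(ages, joes)
```

The cost of this approach is that the whole range has to fit in memory before
the first result can be returned.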
There are a couple of strategies you can try for this. One is to have a
temporary store, such as an HDFS-backed sorted set or a local KV store like
LevelDB (I was partial to HawtDB), that you can write the range data out to
and then read from. This is similar to the map idea but not constrained by
memory.
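A rough sketch of the temporary-store idea, again in Python and under loud
assumptions: local temporary files stand in for the HDFS-backed set or the
LevelDB/HawtDB store, and spill_sorted_runs and read_sorted are hypothetical
helpers, not part of any of those libraries. The range is written out as
small sorted runs and then read back as one sorted stream, so only one run
has to be held in memory at a time.

```python
import heapq
import tempfile

def spill_sorted_runs(doc_ids, run_size):
    """Write doc IDs to temp files as sorted runs of at most run_size items."""
    runs = []
    buf = []

    def flush():
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(d + "\n" for d in sorted(buf))
        f.seek(0)  # rewind so the run can be read back later
        runs.append(f)
        buf.clear()

    for d in doc_ids:
        buf.append(d)
        if len(buf) >= run_size:
            flush()
    if buf:
        flush()
    return runs

def _lines(f):
    """Yield the lines of one run with the trailing newline removed."""
    for line in f:
        yield line.rstrip("\n")

def read_sorted(runs):
    """Lazily merge the runs into one sorted, de-duplicated stream."""
    prev = None
    for d in heapq.merge(*(_lines(f) for f in runs)):
        if d != prev:
            yield d
            prev = d

# Doc IDs as they might arrive from a scan across several terms.
runs = spill_sorted_runs(["doc7", "doc2", "doc5", "doc9", "doc2"], run_size=2)
term_source = list(read_sorted(runs))
```

The merged stream could then serve as the sorted term source for the
intersection, in place of the in-memory map.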
You could also do a composite index using a space-filling curve and use a
predicate or multiple ranges to cull extraneous data.
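One possible reading of the composite-index suggestion, sketched in Python.
A Z-order (Morton) curve is assumed here as the space-filling curve; the
comment above does not name one. The two dimensions (an age and a second,
hypothetical numeric attribute) and the helper names are illustrative only.
A box query becomes a single range over the curve values, and a predicate
culls the entries that fall inside that range but outside the box.

```python
from bisect import bisect_left, bisect_right

def interleave(x, y, bits=8):
    """Z-order code: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def deinterleave(z, bits=8):
    """Inverse of interleave: recover (x, y) from a Z-order code."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y

def box_query(entries, x_lo, x_hi, y_lo, y_hi):
    """entries: list of (z, doc_id) sorted by z. Bounds are inclusive.

    Every point in the box has a code between the codes of the box's
    low and high corners, so one range scan covers it; the predicate
    then removes the extra points the curve drags into that range.
    """
    zs = [z for z, _ in entries]
    start = bisect_left(zs, interleave(x_lo, y_lo))
    end = bisect_right(zs, interleave(x_hi, y_hi))
    hits = []
    for z, doc_id in entries[start:end]:
        x, y = deinterleave(z)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            hits.append(doc_id)
    return hits

# Hypothetical documents: (age, second attribute, doc ID).
points = [(30, 1, "a"), (45, 2, "b"), (70, 1, "c"), (40, 9, "d")]
entries = sorted((interleave(x, y), doc) for x, y, doc in points)
hits = box_query(entries, 30, 59, 0, 3)
```

Splitting the box into several smaller curve ranges would reduce how much
extraneous data the predicate has to discard, at the cost of more seeks.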
> Iterator and example to support intersection of document-partitioned index
> terms by ranges with lower and upper bounds.
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: ACCUMULO-1682
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1682
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Corey J. Nolet
> Priority: Minor
> Labels: proposal
>
> The current IntersectingIterator seeks to discrete terms that are encoded
> into the column families to find all column qualifiers that share all of the
> discrete column families of interest (with the additional ability to negate
> some of the column families). Looking at the current IntersectingIterator
> code, it should be possible to return all column qualifiers with a column
> family within a given range.
> An example of this is finding all terms where NAME=Joe and (AGE>=30 &&
> AGE<60) and STATE!=MD. If an example is provided, numerical types like the
> age could easily be encoded using the new Lexicoders.