[
https://issues.apache.org/jira/browse/ACCUMULO-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669732#comment-13669732
]
Keith Turner commented on ACCUMULO-1471:
----------------------------------------
There is actually a reason you would not want to filter in SortedMapIterator.
If you want to read from multiple SortedMapIterators, you provide them as
inputs to a MultiIterator. Column family filtering should be done after the
MultiIterator, as in the following:
ColumnFamilySkippingIterator(MultiIterator(SortedMapIterator(map1),SortedMapIterator(map2)))
If every data source below the MultiIterator does column family filtering, then
it's possible that multiple data sources could unnecessarily read and filter
a lot of data for each seek. They could do this even though another data source
has a visible key that sorts before the data they are filtering. This could lead
to O(N^2) seek performance.
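To illustrate the cost difference, here is a minimal sketch in plain Java. It
does not use the Accumulo classes; the class SeekCostSketch, its methods, and
the "wanted"/"other" family names are all hypothetical stand-ins. Each TreeMap
maps a row to a column family, and the sketch only counts how many entries are
read before the first visible key can be returned.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not Accumulo code: each TreeMap is one sorted data
// source mapping row -> column family.
public class SeekCostSketch {

    // Filtering inside every source: each source must skip ahead to its own
    // first matching entry before the merge can compare the tops.
    public static int readsFilteringPerSource(List<TreeMap<String, String>> sources, String family) {
        int reads = 0;
        for (TreeMap<String, String> source : sources) {
            for (Map.Entry<String, String> e : source.entrySet()) {
                reads++;
                if (e.getValue().equals(family)) {
                    break; // this source now has a visible top key
                }
            }
        }
        return reads;
    }

    // Filtering after the merge: each source exposes its first entry, and only
    // the source holding the smallest key is advanced when that key is filtered.
    public static int readsFilteringAfterMerge(List<TreeMap<String, String>> sources, String family) {
        List<Iterator<Map.Entry<String, String>>> iters = new ArrayList<>();
        List<Map.Entry<String, String>> tops = new ArrayList<>();
        int reads = 0;
        for (TreeMap<String, String> source : sources) {
            Iterator<Map.Entry<String, String>> it = source.entrySet().iterator();
            iters.add(it);
            if (it.hasNext()) {
                tops.add(it.next());
                reads++;
            } else {
                tops.add(null);
            }
        }
        while (true) {
            int min = -1;
            for (int i = 0; i < tops.size(); i++) {
                if (tops.get(i) != null
                        && (min < 0 || tops.get(i).getKey().compareTo(tops.get(min).getKey()) < 0)) {
                    min = i;
                }
            }
            if (min < 0 || tops.get(min).getValue().equals(family)) {
                return reads; // exhausted, or the smallest key is visible
            }
            if (iters.get(min).hasNext()) {
                tops.set(min, iters.get(min).next());
                reads++;
            } else {
                tops.set(min, null);
            }
        }
    }

    // Source 1 holds one early visible key; source 2 holds 1000 filtered
    // entries followed by one visible key.
    public static List<TreeMap<String, String>> demoSources() {
        TreeMap<String, String> map1 = new TreeMap<>();
        map1.put("row0000", "wanted");
        TreeMap<String, String> map2 = new TreeMap<>();
        for (int i = 1; i <= 1000; i++) {
            map2.put(String.format("row%04d", i), "other");
        }
        map2.put("row1001", "wanted");
        List<TreeMap<String, String>> sources = new ArrayList<>();
        sources.add(map1);
        sources.add(map2);
        return sources;
    }

    public static void main(String[] args) {
        List<TreeMap<String, String>> sources = demoSources();
        System.out.println(readsFilteringPerSource(sources, "wanted"));  // prints 1002
        System.out.println(readsFilteringAfterMerge(sources, "wanted")); // prints 2
    }
}
```

In this toy setup, per-source filtering makes the second source read 1001
entries just to position itself, even though the first source already holds
the key that will be returned. Repeating that work on each of N seeks is the
kind of pattern that could produce the quadratic behavior described above.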
The reason ColumnFamilySkippingIterator passes the column families through is
so that lower-level data sources such as rfile can optimize which locality
groups are read.
So one possible fix for this is to document the behavior in the javadocs.
> SortedMapIterator.seek() doesn't respect columnFamilies
> -------------------------------------------------------
>
> Key: ACCUMULO-1471
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1471
> Project: Accumulo
> Issue Type: Bug
> Components: client
> Affects Versions: 1.4.3, 1.5.0
> Reporter: Michael Berman
> Assignee: Michael Berman
> Priority: Minor
> Fix For: 1.5.1, 1.6.0
>
>
> If you specify columnFamilies in a seek() on a SortedMapIterator, it will
> happily return results from other column families. The arguments are never
> even read.