[
https://issues.apache.org/jira/browse/ACCUMULO-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065362#comment-16065362
]
Ivan Bella commented on ACCUMULO-4667:
--------------------------------------
[~kturner] You are correct. I believe that is what the count is used for in
the map passed into the seek call. I will use that to pre-filter the locality
groups as is currently being done in the seek.
> LocalityGroupIterator very inefficient with large locality groups
> -----------------------------------------------------------------
>
> Key: ACCUMULO-4667
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4667
> Project: Accumulo
> Issue Type: Improvement
> Components: tserver
> Affects Versions: 1.6.6, 1.7.3, 1.8.1, 2.0.0
> Reporter: Ivan Bella
> Assignee: Ivan Bella
> Fix For: 1.8.2, 2.0.0
>
>
> On one of our systems we tracked some scans that were taking an extremely
> long time to complete (many hours). As it turns out the scan was relatively
> simple in that it was scanning a tablet for all keys that had a specific
> column family. Note that there was very little data that actually matched
> this column family. Upon tracing the code we found that it was spending a
> large amount of time in the LocalityGroupIterator. Stack traces continually
> found the code to be at line 128 or 129 of the LocalityGroupIterator. Those
> line numbers are consistent from the 1.6 series all the way to 2.0.0
> (master). In this case the column family being searched for was included in
> one of a dozen or so locality groups on that table, and the locality group
> itself had 40 or so column families. We see several things that can be done
> here:
> 1) The code that checks the group column families against those being
> searched for can quickly exit once it finds a match
> 2) The code that checks the group column families against those being
> searched for can look at the relative size of those two groups and invert the
> logic appropriately for a more efficient loop.
> 3) We could create a cached map of column families to locality groups
> allowing us to avoid examining each locality group every time we seek.
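
The three suggestions above can be illustrated with a minimal sketch. This is
not the LocalityGroupIterator code or the eventual patch: the class and method
names are hypothetical, column families are modeled as plain Strings rather
than Accumulo's own types, and only the inclusive-seek case is shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; names and types are assumptions.
final class LocalityGroupFilterSketch {

    // Suggestions 1 and 2: loop over the smaller set, probe the larger one,
    // and return as soon as a common column family is found.
    static boolean intersects(Set<String> groupFamilies, Set<String> seekFamilies) {
        boolean groupIsSmaller = groupFamilies.size() <= seekFamilies.size();
        Set<String> smaller = groupIsSmaller ? groupFamilies : seekFamilies;
        Set<String> larger = groupIsSmaller ? seekFamilies : groupFamilies;
        for (String cf : smaller) {
            if (larger.contains(cf)) {
                return true; // early exit on first match
            }
        }
        return false;
    }

    // Suggestion 3: build once, per set of locality groups, a map from
    // column family to the indexes of the groups that contain it.
    static Map<String, List<Integer>> buildIndex(List<Set<String>> groups) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < groups.size(); i++) {
            for (String cf : groups.get(i)) {
                index.computeIfAbsent(cf, k -> new ArrayList<>()).add(i);
            }
        }
        return index;
    }

    // On each seek, look up only the requested families instead of
    // examining every locality group. Returns sorted group indexes.
    static Set<Integer> groupsFor(Map<String, List<Integer>> index,
                                  Set<String> seekFamilies) {
        Set<Integer> result = new TreeSet<>();
        for (String cf : seekFamilies) {
            List<Integer> hits = index.get(cf);
            if (hits != null) {
                result.addAll(hits);
            }
        }
        return result;
    }
}
```

With the cached index, the per-seek cost depends on the number of column
families requested rather than on the number and size of the locality groups,
which is the case described in this issue (one requested family, a dozen or so
groups, about 40 families in the matching group).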
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)