[ 
https://issues.apache.org/jira/browse/ACCUMULO-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065362#comment-16065362
 ] 

Ivan Bella commented on ACCUMULO-4667:
--------------------------------------

[~kturner] You are correct.  I believe that is what the count is used for in 
the map passed into the seek call.  I will use that to pre-filter the locality 
groups as is currently being done in the seek.

> LocalityGroupIterator very inefficient with large locality groups
> -----------------------------------------------------------------
>
>                 Key: ACCUMULO-4667
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4667
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>    Affects Versions: 1.6.6, 1.7.3, 1.8.1, 2.0.0
>            Reporter: Ivan Bella
>            Assignee: Ivan Bella
>             Fix For: 1.8.2, 2.0.0
>
>
> On one of our systems we tracked some scans that were taking an extremely 
> long time to complete (many hours).  As it turns out the scan was relatively 
> simple in that it was scanning a tablet for all keys that had a specific 
> column family.  Note that there was very little data that actually matched 
> this column family.  Upon tracing the code we found that it was spending a 
> large amount of time in the LocalityGroupIterator.  Stack traces continually 
> found the code to be at line 128 or 129 of the LocalityGroupIterator.  Those 
> line numbers are consistent from the 1.6 series all the way to 2.0.0 
> (master).  In this case the column family being searched for was included in 
> one of a dozen or so locality groups on that table, and the locality group 
> itself had 40 or so column families.  We see several things that can be done 
> here:
> 1) The code that checks the group column families against those being 
> searched for can exit as soon as it finds a match.
> 2) The code that checks the group column families against those being 
> searched for can compare the relative sizes of the two sets and invert the 
> logic so that it loops over the smaller one.
> 3) We could create a cached map of column families to locality groups 
> allowing us to avoid examining each locality group every time we seek.
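
The three ideas listed above can be sketched roughly as follows.  This is a
simplified illustration under stated assumptions, not the actual
LocalityGroupIterator code: the class and method names are hypothetical,
column families are plain Strings rather than ByteSequence, and the default
locality group and the non-inclusive seek case are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from Accumulo.
final class LocalityGroupFilterSketch {

  // Ideas 1 and 2: loop over the smaller set, probe the larger one,
  // and return as soon as a single common column family is found.
  static boolean overlaps(Set<String> groupFamilies, Set<String> seekFamilies) {
    boolean groupIsSmaller = groupFamilies.size() <= seekFamilies.size();
    Set<String> smaller = groupIsSmaller ? groupFamilies : seekFamilies;
    Set<String> larger = groupIsSmaller ? seekFamilies : groupFamilies;
    for (String cf : smaller) {
      if (larger.contains(cf)) {
        return true; // early exit on first match
      }
    }
    return false;
  }

  // Idea 3: build a column family -> group index map once, so that a seek
  // does not have to examine every locality group.
  static Map<String, Integer> buildFamilyIndex(List<Set<String>> groups) {
    Map<String, Integer> index = new HashMap<>();
    for (int i = 0; i < groups.size(); i++) {
      for (String cf : groups.get(i)) {
        index.put(cf, i);
      }
    }
    return index;
  }

  // Uses the cached index to pick the groups a seek needs to touch.
  // Cost is proportional to the number of families being sought.
  static Set<Integer> groupsToSeek(Map<String, Integer> index, Set<String> seekFamilies) {
    Set<Integer> result = new TreeSet<>();
    for (String cf : seekFamilies) {
      Integer group = index.get(cf);
      if (group != null) {
        result.add(group);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Set<String>> groups = Arrays.asList(
        new HashSet<>(Arrays.asList("cf1", "cf2", "cf3")),
        new HashSet<>(Arrays.asList("cf4", "cf5")));
    Set<String> seekFamilies = new HashSet<>(Arrays.asList("cf5"));

    Map<String, Integer> index = buildFamilyIndex(groups);
    System.out.println("group 0 overlaps: " + overlaps(groups.get(0), seekFamilies));
    System.out.println("groups to seek: " + groupsToSeek(index, seekFamilies));
  }
}
```

With a dozen groups of roughly 40 families each and a single family being
sought, the cached map reduces the per-seek work from examining every group
to one hash lookup per sought family.  The index would need to be rebuilt
whenever the set of locality groups changes.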



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)