Ivan Bella created ACCUMULO-4667:
------------------------------------
Summary: LocalityGroupIterator very inefficient with large
locality groups
Key: ACCUMULO-4667
URL: https://issues.apache.org/jira/browse/ACCUMULO-4667
Project: Accumulo
Issue Type: Improvement
Components: tserver
Affects Versions: 1.8.1, 1.7.3, 1.6.6, 2.0.0
Reporter: Ivan Bella
Assignee: Ivan Bella
Fix For: 1.8.2, 2.0.0
On one of our systems we tracked some scans that were taking an extremely long
time to complete (many hours). As it turns out the scan was relatively simple
in that it was scanning a tablet for all keys that had a specific column
family. Note that there was very little data that actually matched this column
family. Upon tracing the code we found that it was spending a large amount of
time in the LocalityGroupIterator. Stack traces repeatedly showed the code to
be at line 128 or 129 of the LocalityGroupIterator. Those line numbers are
consistent from the 1.6 series all the way to 2.0.0 (master). In this case the
column family being searched for was included in one of a dozen or so locality
groups on that table, and the locality group itself had 40 or so column
families. We see several things that can be done here:
1) The code that checks the group column families against those being searched
for can exit as soon as it finds a match.
2) The code that checks the group column families against those being searched
for can compare the relative sizes of the two sets and invert the logic
accordingly (iterating over the smaller set) for a more efficient loop.
3) We could create a cached map of column families to locality groups allowing
us to avoid examining each locality group every time we seek.
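A minimal Java sketch of the three proposals, under simplifying assumptions: column families are plain Strings (Accumulo itself works with ByteSequence), each family belongs to at most one named locality group, only an inclusive column-family seek is handled, and the default locality group is ignored. The class and method names (LocalityGroupSelectionSketch, intersects, groupsFor) are hypothetical and do not reflect the actual LocalityGroupIterator code or any eventual fix.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the real LocalityGroupIterator.
public class LocalityGroupSelectionSketch {

    // Ideas 1 and 2: iterate over the smaller set, probe the larger one,
    // and return as soon as a single shared column family is found.
    public static boolean intersects(Set<String> groupFamilies, Set<String> searchFamilies) {
        Set<String> smaller = groupFamilies.size() <= searchFamilies.size() ? groupFamilies : searchFamilies;
        Set<String> larger = (smaller == groupFamilies) ? searchFamilies : groupFamilies;
        for (String cf : smaller) {
            if (larger.contains(cf)) {
                return true; // early exit: one match is enough to select the group
            }
        }
        return false;
    }

    // Idea 3: a map from column family to the index of the group holding it,
    // built once (assumes groups are disjoint, so each family maps to one group).
    private final Map<String, Integer> familyToGroup = new HashMap<>();

    public LocalityGroupSelectionSketch(List<Set<String>> groups) {
        for (int i = 0; i < groups.size(); i++) {
            for (String cf : groups.get(i)) {
                familyToGroup.put(cf, i);
            }
        }
    }

    // Returns the sorted indexes of groups containing at least one searched family,
    // doing one map lookup per searched family instead of scanning every group.
    public List<Integer> groupsFor(Set<String> searchFamilies) {
        Set<Integer> hits = new HashSet<>();
        for (String cf : searchFamilies) {
            Integer group = familyToGroup.get(cf);
            if (group != null) {
                hits.add(group);
            }
        }
        List<Integer> result = new ArrayList<>(hits);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // One large group of 40 families, roughly like the group described above.
        Set<String> big = new HashSet<>();
        for (int i = 0; i < 40; i++) {
            big.add("cf" + i);
        }
        List<Set<String>> groups = List.of(big, Set.of("meta"), Set.of("index", "text"));
        LocalityGroupSelectionSketch sketch = new LocalityGroupSelectionSketch(groups);

        System.out.println(intersects(big, Set.of("cf7")));                     // true
        System.out.println(sketch.groupsFor(Set.of("cf7")));                    // [0]
        System.out.println(sketch.groupsFor(Set.of("text", "meta", "missing"))); // [1, 2]
    }
}
```

In this sketch, building the map costs one pass over all group families when the object is constructed; each seek then costs one hash lookup per searched family rather than a comparison against every family of every locality group.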
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)