[ https://issues.apache.org/jira/browse/HBASE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071196#comment-13071196 ]
nkeywal commented on HBASE-1938:
--------------------------------
I have an improvement that could make a real difference.
In HBase, there is an iterator called MapEntryIterator that in practice acts as
a ValueIterator: it iterates over map entries but returns only each entry's
value.
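{noformat}static class MapEntryIterator implements Iterator<KeyValue> {
  // Wraps an entry iterator, although only the values are ever used.
  private final Iterator<Map.Entry<KeyValue, KeyValue>> iterator;

  public KeyValue next() {
    // The Map.Entry returned here is discarded right after getValue().
    return this.iterator.next().getValue();
  }
}
{noformat}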
However, with the current implementation of the JDK, there is an important
difference between an iterator over values and an iterator over entries. In
java.util.concurrent we can see that the ValueIterator is straightforward:
{noformat}final class ValueIterator extends Iter<V> {
  public V next() {
    V v = nextValue;
    advance();
    return v;
  }
}{noformat}
The EntryIterator, by contrast, does some defensive programming: each call to
next() creates a new immutable entry object.
{noformat}final class EntryIterator extends Iter<Map.Entry<K,V>> {
  public Map.Entry<K,V> next() {
    Node<K,V> n = next;
    V v = nextValue;
    advance();
    return new AbstractMap.SimpleImmutableEntry<K,V>(n.key, v);
  }
}{noformat}
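To make the difference visible from the outside, here is a minimal sketch. It
assumes the excerpts above come from ConcurrentSkipListMap and that the JDK in
use behaves the same way. The class name EntryAllocationDemo, its helper
methods, and the String keys are hypothetical stand-ins, not HBase code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo class; String stands in for KeyValue.
class EntryAllocationDemo {

  // Builds a small sorted map in which key and value are the same object,
  // similar to a set that is backed by a map.
  static ConcurrentSkipListMap<String, String> sampleMap() {
    ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
    for (String s : new String[] {"row1", "row2", "row3"}) {
      map.put(s, s);
    }
    return map;
  }

  // True if two entry iterators hand out distinct wrapper objects
  // that nevertheless describe the same mapping.
  static boolean entriesAreFreshObjects(ConcurrentSkipListMap<String, String> map) {
    Map.Entry<String, String> a = map.entrySet().iterator().next();
    Map.Entry<String, String> b = map.entrySet().iterator().next();
    return a != b && a.equals(b);
  }

  // True if the value iterator returns the stored object itself,
  // without any wrapper.
  static boolean valuesAreStoredObjects(ConcurrentSkipListMap<String, String> map) {
    String a = map.values().iterator().next();
    String b = map.values().iterator().next();
    return a == b;
  }
}
```

If both methods return true, every next() on the entry iterator produced a new
wrapper object, while the value iterator handed back what was already stored.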
As a consequence, at least one object is created for every entry the HBase
scanner iterates over. This allocation is useless, as we throw the object away
immediately, and it causes several GCs during the test. I modified the
MapEntryIterator implementation to iterate over the values instead.
{noformat}static class MapEntryIterator implements Iterator<KeyValue> {
  // Iterates over the values directly, so no Map.Entry is created per call.
  private final Iterator<KeyValue> iterator;

  public KeyValue next() {
    return this.iterator.next();
  }
}{noformat}
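For readers who want something compilable, the same idea can be sketched as a
complete iterator. This is not the actual patch: the names
ValueIterationSketch, ValueOnlyIterator and scan are hypothetical, a generic
type V stands in for KeyValue, and the backing map is assumed to be a
ConcurrentSkipListMap.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch; V stands in for KeyValue.
class ValueIterationSketch {

  // Delegates to the map's value iterator instead of its entry iterator,
  // so no Map.Entry wrapper is created for each element.
  static final class ValueOnlyIterator<V> implements Iterator<V> {
    private final Iterator<V> iterator;

    ValueOnlyIterator(ConcurrentSkipListMap<?, V> map) {
      this.iterator = map.values().iterator();
    }

    @Override
    public boolean hasNext() {
      return this.iterator.hasNext();
    }

    @Override
    public V next() {
      return this.iterator.next();
    }

    @Override
    public void remove() {
      this.iterator.remove();
    }
  }

  // Collects all values in key order using the value-only iterator.
  static List<String> scan(ConcurrentSkipListMap<String, String> map) {
    List<String> out = new ArrayList<>();
    Iterator<String> it = new ValueOnlyIterator<>(map);
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }
}
```

The values come back in the same key order as with the entry-based version,
so callers should not see a behavioural difference.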
On the test, this change divides the scan time by 3. The exact ratio is
somewhat arbitrary, since it depends on when the GC runs, but the change should
be valuable in production as well.
I am currently running the unit tests; I will attach the patch if they pass.
> Make in-memory table scanning faster
> ------------------------------------
>
> Key: HBASE-1938
> URL: https://issues.apache.org/jira/browse/HBASE-1938
> Project: HBase
> Issue Type: Improvement
> Components: performance
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Attachments: MemStoreScanPerformance.java,
> MemStoreScanPerformance.java, caching-keylength-in-kv.patch, test.patch
>
>
> This issue is about profiling hbase to see if I can make hbase scans run
> faster when all is up in memory. Talking to some users, they are seeing
> about 1/4 million rows a second. It should be able to go faster than this
> (Scanning an array of objects, they can do about 4-5x this).