[jira] [Commented] (HBASE-1938) Make in-memory table scanning faster

nkeywal (JIRA) Tue, 26 Jul 2011 09:21:36 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071196#comment-13071196
 ]


nkeywal commented on HBASE-1938:
--------------------------------

I have an improvement that could make a real difference.

In HBase, there is an iterator called MapEntryIterator that in reality acts as 
a value iterator:
{noformat}static class MapEntryIterator implements Iterator<KeyValue> {
    private final Iterator<Map.Entry<KeyValue, KeyValue>> iterator;

    public KeyValue next() {
      // Only the value is used; the Map.Entry itself is discarded.
      return this.iterator.next().getValue();
    }
}
{noformat} 

However, with the current implementation of the JDK, there is an important 
difference between an iterator on values and an iterator on entries. From 
java.util.concurrent we can see:


The ValueIterator is straightforward:
    {noformat}final class ValueIterator extends Iter<V> {
        public V next() {
            V v = nextValue;
            advance();
            return v;
        }
    }{noformat}

The EntryIterator, on the other hand, does some defensive programming and 
creates an immutable object on every call: 
    {noformat}final class EntryIterator extends Iter<Map.Entry<K,V>> {
        public Map.Entry<K,V> next() {
            Node<K,V> n = next;
            V v = nextValue;
            advance();
            return new AbstractMap.SimpleImmutableEntry<K,V>(n.key, v);
        }
    }{noformat} 
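To make the allocation visible, here is a minimal sketch, assuming the backing map is a java.util.concurrent.ConcurrentSkipListMap (the quoted Iter/Node code matches that class). EntryAllocationDemo and its helper methods are hypothetical names, and String stands in for KeyValue. It checks that two passes over entrySet() return different Entry objects for the same element, while values() returns the stored reference itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class EntryAllocationDemo {

    // Small map where each key maps to itself; String is a stand-in for KeyValue.
    static ConcurrentSkipListMap<String, String> sampleMap() {
        ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
        for (String s : new String[] {"row1", "row2", "row3"}) {
            map.put(s, s);
        }
        return map;
    }

    // True if two independent entry iterators return distinct (but equal)
    // Entry objects for the first element, i.e. an Entry is created per next().
    static boolean entriesAreFreshObjects(ConcurrentSkipListMap<String, String> map) {
        Map.Entry<String, String> first = map.entrySet().iterator().next();
        Map.Entry<String, String> again = map.entrySet().iterator().next();
        return first != again && first.equals(again);
    }

    // True if two independent value iterators return the very same reference,
    // i.e. no wrapper object is created along the way.
    static boolean valuesAreStoredReferences(ConcurrentSkipListMap<String, String> map) {
        String first = map.values().iterator().next();
        String again = map.values().iterator().next();
        return first == again;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> map = sampleMap();
        System.out.println("entries are fresh objects: " + entriesAreFreshObjects(map));
        System.out.println("values are stored references: " + valuesAreStoredReferences(map));
    }
}
```

This only shows object identity, not cost. How much the extra allocation matters depends on the heap and GC settings.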

As a consequence, there is at least one object creation for every line in the 
HBase scanner. This creation is actually useless, as we throw the object away 
immediately. As a result, several GCs occur during the test. I modified the 
MapEntryIterator implementation to iterate on the values.

{noformat}static class MapEntryIterator implements Iterator<KeyValue> {
    private final Iterator<KeyValue> iterator;

    public KeyValue next() {
      return this.iterator.next();
    }
}{noformat}
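As a sanity check on the change, the sketch below compares the two iteration styles on a sorted concurrent map and shows that they yield the same values in the same order. ValueIterationSketch, viaEntries and viaValues are hypothetical names used only for illustration, and String again stands in for KeyValue; this is not the HBase code itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class ValueIterationSketch {

    // Original style: iterate on entries and keep only the value.
    // Each next() goes through a temporary Map.Entry.
    static List<String> viaEntries(Map<String, String> map) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            out.add(it.next().getValue());
        }
        return out;
    }

    // Modified style: iterate on the values directly.
    static List<String> viaValues(Map<String, String> map) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = map.values().iterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Keys inserted out of order; the skip list map keeps them sorted.
    static Map<String, String> sampleMap() {
        Map<String, String> map = new ConcurrentSkipListMap<>();
        map.put("c", "c");
        map.put("a", "a");
        map.put("b", "b");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sampleMap();
        System.out.println(viaEntries(map).equals(viaValues(map)));
    }
}
```

Both methods walk the map in key order, so swapping one for the other should not change what a caller of next() sees.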

The scan time is divided by 3 on the test. The exact ratio is somewhat 
arbitrary, as it is driven by how often the GC runs during the test, but the 
change should be valuable in production as well.

I am currently running the unit tests. I will attach the patch if they pass.



> Make in-memory table scanning faster
> ------------------------------------
>
>                 Key: HBASE-1938
>                 URL: https://issues.apache.org/jira/browse/HBASE-1938
>             Project: HBase
>          Issue Type: Improvement
>          Components: performance
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>         Attachments: MemStoreScanPerformance.java, 
> MemStoreScanPerformance.java, caching-keylength-in-kv.patch, test.patch
>
>
> This issue is about profiling hbase to see if I can make hbase scans run 
> faster when all is up in memory.  Talking to some users, they are seeing 
> about 1/4 million rows a second.  It should be able to go faster than this 
> (Scanning an array of objects, they can do about 4-5x this).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
