[ 
https://issues.apache.org/jira/browse/HBASE-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136872#comment-15136872
 ] 

ramkrishna.s.vasudevan commented on HBASE-15205:
------------------------------------------------

Let me explain with the code here. Replication is enabled by default, so for 
every append we try to scope the WALEdits with the replication scope. Even if 
there is only one CF and no replication is enabled at all, we still iterate 
and look up the scope associated with that CF on every append. 
{code}
      } else {
        family = CellUtil.cloneFamily(cell);
        // Unexpected, has a tendency to happen in unit tests
        assert htd.getFamily(family) != null;

        if (!scopes.containsKey(family)) {
          int scope = htd.getFamily(family).getScope();
          if (scope != REPLICATION_SCOPE_LOCAL) {
            scopes.put(family, scope);
          }
        }
{code}
The call 'int scope = htd.getFamily(family).getScope();' generates a lot of 
garbage because it does a new String() operation. 
In the multi-CF case, this same piece of code defines a local map 
{code}
NavigableMap<byte[], Integer> scopes =
    new TreeMap<byte[], Integer>(Bytes.BYTES_COMPARATOR);
{code}
into which we copy every CF that has a non-default scope, along with that 
scope. So for every append we iterate through all the cells and find the scope 
of each cell's CF (if it is not already in the 'scopes' map). This map is then 
serialized into the 'pb'.
The above logic makes sense for the multi-CF case, because if only one of the 
CFs has GLOBAL scope, then only that information is added to the WALKey.
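
To make the per-append cost concrete, here is a minimal, self-contained sketch 
of the logic described above. It is not the HBase code: PerAppendScopes, 
scopesFor and the plain Map standing in for the table descriptor are 
hypothetical names, and the two scope constants and the comparator are local 
stand-ins for the HBase ones.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of the per-append scope scan; not the HBase classes.
final class PerAppendScopes {
  // Local stand-ins for the HBase replication scope constants.
  static final int REPLICATION_SCOPE_LOCAL = 0;
  static final int REPLICATION_SCOPE_GLOBAL = 1;

  // Unsigned lexicographic byte[] ordering, standing in for Bytes.BYTES_COMPARATOR.
  static final Comparator<byte[]> BYTES_COMPARATOR = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  };

  // Called once per append: a new map is allocated and every cell's family is
  // looked up. familyScopes must itself be ordered by BYTES_COMPARATOR so that
  // byte[] keys are matched by content.
  static NavigableMap<byte[], Integer> scopesFor(
      List<byte[]> cellFamilies, Map<byte[], Integer> familyScopes) {
    NavigableMap<byte[], Integer> scopes = new TreeMap<>(BYTES_COMPARATOR);
    for (byte[] family : cellFamilies) {
      if (!scopes.containsKey(family)) {
        Integer scope = familyScopes.get(family);
        // Only non-default (non-LOCAL) scopes are recorded for the WALKey.
        if (scope != null && scope != REPLICATION_SCOPE_LOCAL) {
          scopes.put(family, scope);
        }
      }
    }
    return scopes;
  }
}
```

In this sketch the map allocation and the per-cell lookups are repeated on 
every call, which is the work the rest of this comment tries to avoid.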
So the first thing we can do is reduce the garbage created by this new 
String() by getting the scope once in HRegion and using it in append(). This 
avoids all the garbage created by new String() and the UTF8 encoder on every 
append. 
The other thing is that if we simply add all the non-default CFs and serialize 
them for every WAL key, we can also avoid creating the local map and the 
checks that we perform on it, but at the cost of serializing more information 
per WAL key. 
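
A rough sketch of that idea, assuming the family scopes are known when the 
region is opened: the non-default scopes are collected once and the same 
read-only map is handed to every append. RegionScopes and scopesForAppend are 
hypothetical names, not the attached patch, and the constant and comparator 
are again local stand-ins for the HBase ones.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical holder created once per region; not the HBase classes.
final class RegionScopes {
  // Local stand-in for the HBase LOCAL (default) replication scope.
  static final int REPLICATION_SCOPE_LOCAL = 0;

  // Unsigned lexicographic byte[] ordering, standing in for Bytes.BYTES_COMPARATOR.
  static final Comparator<byte[]> BYTES_COMPARATOR = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  };

  private final NavigableMap<byte[], Integer> nonLocalScopes;

  // Walk the families once and keep only the non-default scopes.
  // Assumes every family has a non-null scope value.
  RegionScopes(Map<byte[], Integer> familyScopes) {
    NavigableMap<byte[], Integer> m = new TreeMap<>(BYTES_COMPARATOR);
    for (Map.Entry<byte[], Integer> e : familyScopes.entrySet()) {
      if (e.getValue() != REPLICATION_SCOPE_LOCAL) {
        m.put(e.getKey(), e.getValue());
      }
    }
    this.nonLocalScopes = Collections.unmodifiableNavigableMap(m);
  }

  // Every append reuses the same map: no per-cell lookup, no new map.
  NavigableMap<byte[], Integer> scopesForAppend() {
    return nonLocalScopes;
  }
}
```

The trade-off is the one noted above: an append that touches only one CF still 
carries the scopes of every non-default CF in the table.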


> Do not find the replication scope for every WAL#append()
> --------------------------------------------------------
>
>                 Key: HBASE-15205
>                 URL: https://issues.apache.org/jira/browse/HBASE-15205
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15205.patch, ScopeWALEdits.jpg, 
> ScopeWALEdits_afterpatch.jpg
>
>
> After the byte[] and char[], the other top contributor to GC (though it is 
> only 2.86%) is the UTF_8.newDecoder.
> This happens because for every WAL append we try to calculate the replication 
> scope associated with the families in the TableDescriptor. I think doing this 
> per WAL append is very costly and creates a lot of garbage. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
