[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810543#comment-13810543 ]

churro morales commented on HBASE-9865:
---------------------------------------

Hi Lars.

I believe the batching logic for shipping replication logs is incorrect when we 
delete a lot of KVs from an HLogEntry and the number of KeyValues per entry is 
sufficiently large.  In that case, the logic that determines batchSize 
underestimates the actual heap size required.  We keep everything on the heap 
until we ship the edits; only after that can the array slots be overwritten by 
new HLogEntries and garbage collection reclaim the old values.

I think the problem is that when we remove quite a few KVs from the HLogEntry 
(ReplicationSource.removeNonReplicableEdits()), we have to account for the 
memory footprint of the KV ArrayList itself, whose backing array does not 
shrink when entries are removed.
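
To make the underestimate concrete, here is a minimal, self-contained sketch 
(plain Java, not HBase classes), assuming a 32-bit JVM and the ~90k capacity 
from the scenario described in the issue below:

{code}
import java.util.ArrayList;

public class CapacityLeakDemo {
  // Assumes a 32-bit JVM, where each object reference costs 4 bytes.
  static final int REFERENCE_SIZE = 4;

  public static void main(String[] args) {
    // A WALEdit whose KV list grew to ~90k entries.
    ArrayList<byte[]> kvs = new ArrayList<byte[]>(90000);
    for (int i = 0; i < 90000; i++) {
      kvs.add(new byte[0]);
    }
    // removeNonReplicableEdits() drops all but a handful of entries...
    kvs.subList(6, kvs.size()).clear();
    // ...but the backing array keeps its ~90k slots: size() is now 6, yet
    // roughly (90000 - 6) * 4 bytes ~= 350KB of heap stays invisible to any
    // heapSize() computation that only walks the live KVs.
    long hiddenBytes = (90000L - kvs.size()) * REFERENCE_SIZE;
    System.out.println("Heap not accounted for: " + hiddenBytes + " bytes");
  }
}
{code}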



> WALEdit.heapSize() is incorrect in certain replication scenarios which may 
> cause RegionServers to go OOM
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9865
>                 URL: https://issues.apache.org/jira/browse/HBASE-9865
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.5, 0.95.0
>            Reporter: churro morales
>         Attachments: 9865-sample.txt
>
>
> WALEdit.heapSize() is incorrect in certain replication scenarios which may 
> cause RegionServers to go OOM.
> A little background on this issue.  We noticed that our source replication 
> RegionServers would get into GC storms and sometimes even OOM. 
> In one case there were around 25k WALEdits to replicate, each one with an 
> ArrayList of KeyValues.  The ArrayList had a capacity of around 90k (using 
> 350KB of heap memory) but only around 6 non-null entries.
> When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a 
> WALEdit, it removes all KVs whose scope is other than local.  
> But in doing so we don't account for the capacity of the ArrayList when 
> determining the heapSize of the WALEdit.  A batch is shipped once either a 
> total-size threshold or an entry-count threshold is hit.  
> Therefore, if we have a WALEdit with 25k entries and suppose all are removed: 
> the size of the ArrayList is 0 (we don't even count the collection's own heap 
> size currently) and the capacity is ignored.
> This yields a heapSize() of 0 bytes, while in the best case it should be at 
> least 100,000 bytes: the backing array still holds at least 25k reference 
> slots, and 25,000 references * 4 bytes each = 100,000 bytes (provided you 
> passed initialCapacity and you have a 32-bit JVM).
> I have some ideas on how to address this problem and want to know everyone's 
> thoughts:
> 1. We use a probabilistic counter such as HyperLogLog and create something 
> like:
>       * class CapacityEstimateArrayList extends ArrayList
>               ** this class overrides all additive methods to update the 
> probabilistic counts
>               ** it includes one additional method called estimateCapacity 
> (we would take estimateCapacity - size() and charge a reference size for 
> each of those empty slots)
>       * Then we can do something like this in WALEdit.heapSize:
>       
> {code}
>   public long heapSize() {
>     long ret = ClassSize.ARRAYLIST;
>     for (KeyValue kv : kvs) {
>       ret += kv.heapSize();
>     }
>     long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
>     ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
>     if (scopes != null) {
>       ret += ClassSize.TREEMAP;
>       ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
>       // TODO this isn't quite right, need help here
>     }
>     return ret;
>   }   
> {code}
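> A rough, hypothetical sketch of what CapacityEstimateArrayList could look 
> like (a plain high-water-mark counter stands in here for the HyperLogLog 
> counter; the class name and getCapacityEstimate() are the proposal above, 
> not existing HBase code):
> {code}
> import java.util.ArrayList;
> import java.util.Collection;
>
> // Hypothetical sketch only: a high-water mark stands in for the
> // probabilistic counter; other additive methods (add(int, E),
> // addAll(int, Collection), etc.) would be overridden the same way.
> public class CapacityEstimateArrayList<E> extends ArrayList<E> {
>   private int maxSize;
>
>   public CapacityEstimateArrayList(int initialCapacity) {
>     super(initialCapacity);
>     // The backing array starts out with initialCapacity slots.
>     this.maxSize = initialCapacity;
>   }
>
>   @Override
>   public boolean add(E e) {
>     boolean changed = super.add(e);
>     maxSize = Math.max(maxSize, size());
>     return changed;
>   }
>
>   @Override
>   public boolean addAll(Collection<? extends E> c) {
>     boolean changed = super.addAll(c);
>     maxSize = Math.max(maxSize, size());
>     return changed;
>   }
>
>   // Conservative estimate: the real capacity never drops below the largest
>   // size the list has reached (unless trimToSize() is called).
>   public int getCapacityEstimate() {
>     return maxSize;
>   }
> }
> {code}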
> 2. In ReplicationSource.removeNonReplicableEdits() we know the original size 
> of the array, and we provide some percentage threshold.  When that 
> threshold is met (say, 50% of the entries have been removed) we can call 
> kvs.trimToSize(), as in the sketch below.
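> A rough sketch of (2), assuming removeNonReplicableEdits() can reach the 
> WALEdit's KV ArrayList (the 50% threshold and the removal helper name are 
> illustrative, not existing code):
> {code}
> // 'kvs' is the WALEdit's ArrayList<KeyValue>.
> int originalSize = kvs.size();
> // ... existing loop that removes the non-locally-scoped KVs ...
> if (kvs.size() * 2 < originalSize) {
>   // More than half the entries are gone: shrink the backing array
>   // down to the surviving entries.
>   kvs.trimToSize();
> }
> {code}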
> 3. In the heapSize() method for WALEdit we could use reflection (please 
> don't shoot me for this) to grab the actual capacity of the list, doing 
> something like this:
> {code}
> public int getArrayListCapacity() {
>     try {
>       Field f = ArrayList.class.getDeclaredField("elementData");
>       f.setAccessible(true);
>       return ((Object[]) f.get(kvs)).length;
>     } catch (Exception e) {
>       log.warn("Exception in trying to get capacity on ArrayList", e);
>       return kvs.size();
>     }
> }
> {code}
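> One caveat for (3): "elementData" is a private implementation detail of the 
> JDK's ArrayList rather than a public contract, so the reflective lookup may 
> break on other or future JVMs; the fallback to kvs.size() in the catch 
> block is what keeps it safe.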
> I am partial to (1), using HyperLogLog and creating a 
> CapacityEstimateArrayList: it is reusable throughout the code for other 
> classes that implement HeapSize and contain ArrayLists.  The memory 
> footprint is very small and it is very fast.  The downside is that this is 
> an estimate, although we can configure the precision; we will most likely 
> always be conservative, since the estimateCapacity will always be less than 
> the actual capacity, but it will be close.  Putting the logic in 
> removeNonReplicableEdits will work, but it only solves the heapSize problem 
> in this particular scenario.  Solution 3 is slow and ugly, but it gives 
> us the exact answer.
> I would love to hear if anyone else has other ideas on how to remedy this 
> problem.  I have code for trunk and 0.94 for all 3 ideas and can provide a 
> patch if the community thinks any of these approaches is viable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
