[jira] [Commented] (ACCUMULO-1770) out of memory error on very long running tablet server

Josh Elser (JIRA) Fri, 11 Oct 2013 10:29:06 -0700

    [ https://issues.apache.org/jira/browse/ACCUMULO-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792838#comment-13792838 ]


Josh Elser commented on ACCUMULO-1770:
--------------------------------------

I re-ran my tiny, contrived test and definitely see excessive RSS usage. I 
haven't dug into it yet; I wanted to post these first.

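{code:java}
      // c is assumed to be an already-connected Connector, and table "foo" to exist.
      // Writes 2,500,000 rows x 10 families x 10 qualifiers = 250M entries, all with empty values.
      BatchWriter bw = c.createBatchWriter("foo", new BatchWriterConfig());
      for (int i = 0; i < 2500000; i++) {
        Mutation m = new Mutation(Integer.toString(i));
        for (int j = 0; j < 10; j++) {
          for (int k = 0; k < 10; k++) {
            m.put(Integer.toString(j), Integer.toString(k), "");
          }
        }
        bw.addMutation(m);
      }

      // Flush any buffered mutations and release the writer's resources.
      bw.close();
{code}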

I recorded the initial cold memory usage ("Start"). I then started the above 
code and recorded the usage at around 150M entries ("During"). Next, I waited 
for minor compaction to finish ("End Pre-MajC"). Finally, I issued a major 
compaction for the table ("End Post-MajC").
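
The comment does not say how the Virtual and Resident figures were collected. As one possible approach, assuming a Linux host, the VmSize and VmRSS fields of /proc/<pid>/status report the same two quantities in kB. The class and method names below (ProcMemory, fieldKb) are hypothetical and only illustrate the sampling step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ProcMemory {
  // Returns the numeric value (in kB) of a field such as "VmRSS" from the
  // lines of /proc/<pid>/status, or -1 if the field is not present.
  static long fieldKb(List<String> statusLines, String field) {
    for (String line : statusLines) {
      if (line.startsWith(field + ":")) {
        // A typical line looks like "VmRSS:\t  550236 kB"
        String[] parts = line.substring(field.length() + 1).trim().split("\\s+");
        return Long.parseLong(parts[0]);
      }
    }
    return -1;
  }

  public static void main(String[] args) throws IOException {
    // Pass the tablet server's pid as the first argument; defaults to this JVM.
    String pid = args.length > 0 ? args[0] : "self";
    List<String> lines = Files.readAllLines(Paths.get("/proc", pid, "status"));
    System.out.println("VmSize(kB)=" + fieldKb(lines, "VmSize")
        + " VmRSS(kB)=" + fieldKb(lines, "VmRSS"));
  }
}
```

Running it once per checkpoint (cold start, mid-ingest, after minor compaction, after major compaction) would give one row per measurement.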

||Time||Virtual (KB)||Resident (KB)||
|Start|26535192|550236|
|During|41551148|15791996|
|End Pre-MajC|42466608|16690456|
|End Post-MajC|40567092|14770068|

Virtual and Resident are in KB. I think I only had one or two minor compactions 
with 16G given to the memory maps. I also grabbed the output of 'pmap -x' for 
each of the timings in the table above.
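
To make the KB figures easier to compare against the 16G given to the memory maps, the sketch below converts the Resident column to GiB and prints the growth relative to the "Start" row. It only does arithmetic on the numbers in the table above; the class and method names (MemoryTable, kbToGiB, residentKb) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MemoryTable {
  // Convert KB to GiB (1 GiB = 1024 * 1024 KB).
  static double kbToGiB(long kb) {
    return kb / (1024.0 * 1024.0);
  }

  // Resident sizes in KB, copied from the table above, in measurement order.
  static Map<String, Long> residentKb() {
    Map<String, Long> m = new LinkedHashMap<>();
    m.put("Start", 550236L);
    m.put("During", 15791996L);
    m.put("End Pre-MajC", 16690456L);
    m.put("End Post-MajC", 14770068L);
    return m;
  }

  public static void main(String[] args) {
    Map<String, Long> rss = residentKb();
    long start = rss.get("Start");
    for (Map.Entry<String, Long> e : rss.entrySet()) {
      // Print the absolute resident size and the growth since the cold start.
      System.out.printf("%-14s %6.2f GiB (+%.2f GiB vs. Start)%n",
          e.getKey(), kbToGiB(e.getValue()), kbToGiB(e.getValue() - start));
    }
  }
}
```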

Perhaps the size of the value isn't the issue?

> out of memory error on very long running tablet server
> ------------------------------------------------------
>
>                 Key: ACCUMULO-1770
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1770
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>         Attachments: FragmentTest.java, memory-usage.png
>
>
> On a large cluster it was noticed that a few of the tablet servers had been 
> pushed into swap.  This didn't affect the performance of the server until it 
> ran out of memory, and the process was killed.  The gc reports in the debug 
> log showed the system had plenty of heap space for the JVM.  The number of 
> threads in the server was not excessive (dozens).  This cluster ingests some 
> large values (megabytes).  The tablet server had been up for a month prior to 
> running out of memory.  MALLOC_ARENA_MAX had already been set to 1.
> * Investigate the effect of fragmentation on memory usage for large value 
> inserts.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
