[ 
https://issues.apache.org/jira/browse/HBASE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970507#action_12970507
 ] 

stack commented on HBASE-3323:
------------------------------

So, on restart, the split completed (this was ten servers whose logs needed 
splitting; one split set was 33 logs... all in a 1G Master heap).

The logging is profuse but I'm grand w/ that.

The below looks like a version of HBASE-2471... is that so?  If so, good stuff.

{code}
2010-12-11 18:35:43,243 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: This region's directory 
doesn't exist: 
hdfs://sv2borg180:10000/hbase/TestTable/1d4a17311a6bc9f7b34f121bf121f42b. It is 
very likely that it was already split so it's safe to discard those edits.
{code}

On the patch, why flip parameter order -- i.e. moving conf from end of list to 
start?  People want to know!

What's this about?

{code}
+  public void internTableName(byte []tablename) {
+    assert Bytes.equals(tablename, this.tablename);
+    this.tablename = tablename;
+  }
{code}

We could call this method multiple times?  And it'd be an error if tablename was 
different on an invocation?  The method is public w/o doc.  It has to be public 
because it's called from a different package?
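For readers following along: as I read the patch, the intent is to canonicalize 
the table-name byte[] so that the many HLogKey instances created during a split 
share one array instead of each retaining its own duplicate copy. A minimal 
standalone sketch of that interning pattern (class and method names here are 
illustrative, not the actual HBase API):

```java
import java.util.Arrays;

// Sketch of byte[] interning: keep one canonical copy of the table name
// and point every log key at it, rather than each key holding its own
// duplicate array. Illustrative names only, not the real HBase classes.
class LogKeySketch {
    private byte[] tablename;

    LogKeySketch(byte[] tablename) {
        this.tablename = tablename;
    }

    // Swap our private copy for the caller's canonical array. The assert
    // guards that the contents are identical; only the reference changes,
    // so the duplicate array becomes garbage-collectable.
    public void internTableName(byte[] canonical) {
        assert Arrays.equals(canonical, this.tablename);
        this.tablename = canonical;
    }

    byte[] tableName() { return tablename; }
}

public class InternDemo {
    public static void main(String[] args) {
        byte[] canonical = "TestTable".getBytes();
        LogKeySketch a = new LogKeySketch("TestTable".getBytes());
        LogKeySketch b = new LogKeySketch("TestTable".getBytes());
        // Before interning: three distinct arrays with identical contents.
        a.internTableName(canonical);
        b.internTableName(canonical);
        // After interning: both keys share the single canonical array.
        System.out.println(a.tableName() == b.tableName()); // prints "true"
    }
}
```

With hundreds of thousands of edits per log file, collapsing those per-key 
copies down to one shared array is a real saving.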

Sorry... still not done w/ review.  Back later.

> OOME in master splitting logs
> -----------------------------
>
>                 Key: HBASE-3323
>                 URL: https://issues.apache.org/jira/browse/HBASE-3323
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.90.0
>
>         Attachments: hbase-3323.txt, hbase-3323.txt, hbase-3323.txt, sizes.png
>
>
> In testing a RS failure under heavy increment workload I ran into an OOME 
> when the master was splitting the logs.
> In this test case, I have exactly 136 bytes per log entry in all the logs, 
> and the logs are all around 66-74MB. With a batch size of 3 logs, this means 
> the master is loading about 500K-600K edits per log file. Each edit ends up 
> creating 3 byte[] objects, the references for which are each 8 bytes of RAM, 
> so we have 160 (136+8*3) bytes per edit used by the byte[]. For each edit we 
> also allocate a bunch of other objects: one HLog$Entry, one WALEdit, one 
> ArrayList, one LinkedList$Entry, one HLogKey, and one KeyValue. Overall this 
> works out to 400 bytes of overhead per edit. So, with the default settings on 
> this fairly average workload, the 1.5M log entries take about 770MB of RAM. 
> Since I had a few log files that were a bit larger (around 90MB) it exceeded 
> 1GB of RAM and I got an OOME.
> For one, the 400 bytes per edit overhead is pretty bad, and we could probably 
> be a lot more efficient. For two, we should actually account for this rather 
> than simply having a configurable "batch size" in the master.
> I think this is a blocker because I'm running with fairly default configs 
> here and just killing one RS made the cluster fall over due to master OOME.
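To make the arithmetic in the description concrete, a quick back-of-the-envelope 
in Java. The 136-byte payload, 3-reference, and ~400-byte-overhead figures are 
taken from the description above, not re-measured here; the result lands in the 
same ballpark as the ~770MB reported.

```java
// Back-of-the-envelope heap estimate for the split, using the figures
// from the issue description (not measured independently).
public class EditMemoryEstimate {
    public static void main(String[] args) {
        long bytesPerEntry = 136;  // payload bytes per log entry
        long refs = 3;             // byte[] references per edit
        long refSize = 8;          // bytes per reference on a 64-bit JVM
        long objectOverhead = 400; // per-edit overhead: HLog$Entry, WALEdit,
                                   // ArrayList, LinkedList$Entry, HLogKey, KeyValue

        long byteArrayCost = bytesPerEntry + refs * refSize; // 136 + 24 = 160
        long perEdit = byteArrayCost + objectOverhead;
        System.out.println(perEdit); // prints "560"

        long edits = 1_500_000;
        // Roughly 800MB for 1.5M edits, in line with the ~770MB observed.
        System.out.printf("%.0f MB%n", edits * perEdit / (1024.0 * 1024.0));
    }
}
```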

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
