[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

Duo Zhang (JIRA) Thu, 03 Nov 2016 23:34:55 -0700

    [ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15635437#comment-15635437 ]


Duo Zhang commented on HBASE-16890:
-----------------------------------

Update: I've set up a single-node HDFS cluster and run WALPE (WALPerformanceEvaluation) on the same machine.

{code:title=core-site.xml}
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
{code}

{code:title=hdfs-site.xml}
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/zhangduo/hadoop-2.7.3/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/zhangduo/hadoop-2.7.3/dn</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/home/zhangduo/hadoop-2.7.3/snn</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>/home/zhangduo/hadoop-2.7.3/snn</value>
  </property>
</configuration>
{code}
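For anyone reproducing the setup: both files are flat lists of <property> name/value pairs. Below is a minimal sketch that reads such a file with only the JDK's DOM parser. SiteXml is a hypothetical helper, not Hadoop's Configuration class, which additionally handles <final>, XInclude and ${var} substitution.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXml {
  // Parses a Hadoop-style *-site.xml string into an ordered name -> value map.
  // Simplified sketch: <final>, <description>, XInclude and ${var} expansion are ignored.
  public static Map<String, String> parse(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      Map<String, String> props = new LinkedHashMap<>();
      NodeList properties = doc.getElementsByTagName("property");
      for (int i = 0; i < properties.getLength(); i++) {
        Element p = (Element) properties.item(i);
        // Each <property> is expected to carry exactly one <name> and one <value>.
        String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
        String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
        props.put(name, value);
      }
      return props;
    } catch (Exception e) {
      throw new IllegalArgumentException("not a valid *-site.xml document", e);
    }
  }

  public static void main(String[] args) {
    String hdfsSite = "<configuration>"
        + "<property><name>dfs.replication</name><value>1</value></property>"
        + "<property><name>dfs.namenode.name.dir</name>"
        + "<value>/home/zhangduo/hadoop-2.7.3/nn</value></property>"
        + "</configuration>";
    Map<String, String> conf = parse(hdfsSite);
    // With a single DataNode, any replication factor above 1 would leave blocks under-replicated.
    System.out.println("dfs.replication = " + conf.get("dfs.replication"));
  }
}
```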

For WALPE, the important configs are:
{code:title=hbase-site.xml}
<configuration>
  <property>
    <name>hbase.regionserver.logroll.multiplier</name>
    <value>0.5f</value>
  </property>
  <property>
    <name>hbase.regionserver.logroll.period</name>
    <value>7200000</value>
  </property>
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>10000</value>
  </property>
  <property>
    <name>hbase.regionserver.hlog.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.regionserver.hlog.tolerable.lowreplication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.wal.provider</name>
    <value>filesystem</value>
  </property>
  <property>
    <name>hbase.regionserver.hlog.blocksize</name>
    <value>1073741824</value>
  </property>
  <property>
    <name>hbase.regionserver.wal.disruptor.event.count</name>
    <value>1024</value>
  </property>
</configuration>
{code}

I change 'hbase.wal.provider' to 'asyncfs' when testing AsyncFSWAL.
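To make the roll settings concrete: assuming the size-based roll threshold is blocksize * logroll.multiplier (my reading of the WAL code; WALPE does not print it), the values above roll a WAL file at 512 MiB, and the 2-hour time-based roll cannot fire during a run of roughly 90 to 120s. WalRollSettings is a hypothetical helper class used only for this arithmetic.

```java
public class WalRollSettings {
  // Assumption: roll threshold = blocksize * multiplier, truncated to a long.
  public static long rollSizeBytes(long blockSize, float multiplier) {
    return (long) (blockSize * multiplier);
  }

  public static void main(String[] args) {
    long blockSize = 1073741824L;                 // hbase.regionserver.hlog.blocksize (1 GiB)
    // Float.parseFloat accepts a trailing 'f' suffix, so "0.5f" parses as 0.5.
    float multiplier = Float.parseFloat("0.5f");  // hbase.regionserver.logroll.multiplier
    long periodMs = 7200000L;                     // hbase.regionserver.logroll.period (ms)

    long rollSize = rollSizeBytes(blockSize, multiplier);
    System.out.println("size-based roll at " + rollSize + " bytes (" + (rollSize >> 20) + " MiB)");
    System.out.println("time-based roll every " + periodMs / 3_600_000L + " h");
  }
}
```

Running it prints a 536870912-byte (512 MiB) roll size and a 2 h roll period.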

The machine has 2 x E5-2620 (2.4GHz, 24 cores) and 128GB of memory. The GC config for WALPE is '-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xmx10g -Xms10g -XX:+UseConcMarkSweepGC'.

The results are unchanged: FSHLog is still the slowest, and the three AsyncFSWAL variants perform almost the same.
{noformat}
./bin/hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation -threads 100 -iterations 25000 -qualifiers 25 -keySize 50 -valueSize 200 &>log

FSHLog Summary: threads=100, iterations=25000, syncInterval=0 took 120.654s 20720.408ops/s
AsyncFSWAL Summary: threads=100, iterations=25000, syncInterval=0 took 86.379s 28942.221ops/s
AsyncFSWAL-duo Summary: threads=100, iterations=25000, syncInterval=0 took 86.635s 28856.697ops/s
AsyncFSWAL-ram Summary: threads=100, iterations=25000, syncInterval=0 took 88.495s 28250.184ops/s
{noformat}
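The ops/s column is just threads * iterations / elapsed seconds (100 * 25000 = 2,500,000 operations; 2,500,000 / 120.654s is about 20720 ops/s), so the numbers can be cross-checked from the log. A small sketch that pulls the summary lines out of the log and prints each WAL's speedup over FSHLog; WalpeSummary is a hypothetical name, and the regex assumes the exact summary format shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WalpeSummary {
  // Matches lines like
  // "FSHLog Summary: threads=100, iterations=25000, syncInterval=0 took 120.654s 20720.408ops/s"
  private static final Pattern SUMMARY = Pattern.compile(
      "(\\S+) Summary: threads=\\d+, iterations=\\d+, syncInterval=\\d+ took ([0-9.]+)s ([0-9.]+)ops/s");

  /** Recomputes throughput as total operations divided by elapsed seconds. */
  public static double opsPerSecond(int threads, int iterations, double seconds) {
    return (double) threads * iterations / seconds;
  }

  /** Returns WAL label -> elapsed seconds, in log order. */
  public static Map<String, Double> parse(String log) {
    Map<String, Double> took = new LinkedHashMap<>();
    Matcher m = SUMMARY.matcher(log);
    while (m.find()) {
      took.put(m.group(1), Double.parseDouble(m.group(2)));
    }
    return took;
  }

  public static void main(String[] args) {
    // The four summary lines from the run above, unwrapped.
    String log = String.join("\n",
        "FSHLog Summary: threads=100, iterations=25000, syncInterval=0 took 120.654s 20720.408ops/s",
        "AsyncFSWAL Summary: threads=100, iterations=25000, syncInterval=0 took 86.379s 28942.221ops/s",
        "AsyncFSWAL-duo Summary: threads=100, iterations=25000, syncInterval=0 took 86.635s 28856.697ops/s",
        "AsyncFSWAL-ram Summary: threads=100, iterations=25000, syncInterval=0 took 88.495s 28250.184ops/s");
    Map<String, Double> took = parse(log);
    double baseline = took.get("FSHLog");
    for (Map.Entry<String, Double> e : took.entrySet()) {
      double ops = opsPerSecond(100, 25000, e.getValue());
      System.out.printf("%-15s %8.3fs %10.1f ops/s %5.2fx vs FSHLog%n",
          e.getKey(), e.getValue(), ops, baseline / e.getValue());
    }
  }
}
```

On the numbers above this works out to roughly 1.40x, 1.39x and 1.36x over FSHLog for AsyncFSWAL, AsyncFSWAL-duo and AsyncFSWAL-ram respectively.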

What are your configs, [~stack] [~ram_krish]? Do you use SSDs or some other newer hardware? Thanks.

> Analyze the performance of AsyncWAL and fix the same
> ----------------------------------------------------
>
>                 Key: HBASE-16890
>                 URL: https://issues.apache.org/jira/browse/HBASE-16890
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 (2).patch,
> AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_4.patch,
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch,
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch,
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 PM.png,
> Screen Shot 2016-10-25 at 7.39.48 PM.png, async.svg, classic.svg,
> contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in a single node cluster performs slower
> than the default WAL. This task is to analyze and see if we could fix it.
> See some discussions at the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
