[
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15635437#comment-15635437
]
Duo Zhang commented on HBASE-16890:
-----------------------------------
Update: I've set up a single-node HDFS cluster and run WALPE on the same
machine.
{code:title=core-site.xml}
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
{code}
{code:title=hdfs-site.xml}
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/zhangduo/hadoop-2.7.3/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/zhangduo/hadoop-2.7.3/dn</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/home/zhangduo/hadoop-2.7.3/snn</value>
</property>
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>/home/zhangduo/hadoop-2.7.3/snn</value>
</property>
</configuration>
{code}
And for WALPE, the important configs are:
{code:title=hbase-site.xml}
<property>
<name>hbase.regionserver.logroll.multiplier</name>
<value>0.5f</value>
</property>
<property>
<name>hbase.regionserver.logroll.period</name>
<value>7200000</value>
</property>
<property>
<name>hbase.regionserver.maxlogs</name>
<value>10000</value>
</property>
<property>
<name>hbase.regionserver.hlog.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.regionserver.hlog.tolerable.lowreplication</name>
<value>1</value>
</property>
<property>
<name>hbase.wal.provider</name>
<value>filesystem</value>
</property>
<property>
<name>hbase.regionserver.hlog.blocksize</name>
<value>1073741824</value>
</property>
<property>
<name>hbase.regionserver.wal.disruptor.event.count</name>
<value>1024</value>
</property>
{code}
Will change 'hbase.wal.provider' to 'asyncfs' when testing AsyncFSWAL.
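For reference, a minimal sketch of that one change, assuming every other property in the hbase-site.xml above stays the same for the AsyncFSWAL runs:
{code:title=hbase-site.xml}
<!-- Only the WAL provider differs from the FSHLog run above. -->
<property>
<name>hbase.wal.provider</name>
<value>asyncfs</value>
</property>
{code}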
The machine is 2 * E5-2620, 2.4GHz, 24 cores, 128GB memory. The GC config for
WALPE is '-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xmx10g
-Xms10g -XX:+UseConcMarkSweepGC'.
The result is unchanged: FSHLog is still the slowest, and the three
AsyncFSWAL variants are almost the same.
{noformat}
./bin/hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation -threads 100 -iterations 25000 -qualifiers 25 -keySize 50 -valueSize 200 &>log
FSHLog Summary: threads=100, iterations=25000, syncInterval=0 took 120.654s 20720.408ops/s
AsyncFSWAL Summary: threads=100, iterations=25000, syncInterval=0 took 86.379s 28942.221ops/s
AsyncFSWAL-duo Summary: threads=100, iterations=25000, syncInterval=0 took 86.635s 28856.697ops/s
AsyncFSWAL-ram Summary: threads=100, iterations=25000, syncInterval=0 took 88.495s 28250.184ops/s
{noformat}
What are your configs, [~stack] [~ram_krish]? Are you using SSDs or some other
newer hardware? Thanks.
> Analyze the performance of AsyncWAL and fix the same
> ----------------------------------------------------
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
> Issue Type: Sub-task
> Components: wal
> Affects Versions: 2.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch,
> AsyncWAL_disruptor_4.patch, HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch,
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch,
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, async.svg, classic.svg,
> contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.