[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

stack (JIRA) Fri, 05 Feb 2016 15:29:10 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135250#comment-15135250
 ]


stack commented on HBASE-14790:
-------------------------------

I gave it a try on a little cluster. Not all the metrics are hooked up and it 
looks like the patched version did 1/10th less operations. That said, the patch 
looks to be 2x faster on the simple WALPE test with default args.  it completes 
in half the time with half the CS, half the branch-misses, with a better 
instruction per cycle count. Patch is looking promising. Let me know when you'd 
like a bit of review [~Apache9]

In both cases, did this against a cluster of nine nodes:

{code}
$ perf stat ./hbase/bin/hbase --config ~/conf_hbase 
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation
{code}


Here is the end of the log from the UNPATCHED run (master branch as of now):

{code}
...
-- Histograms ------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
             count = 1000000
               min = 513398
               max = 2166155
              mean = 721937.32
            stddev = 97572.13
            median = 717876.00
              75% <= 775738.00
              95% <= 869750.00
              98% <= 912876.00
              99% <= 950262.00
            99.9% <= 1314831.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
             count = 1000008
               min = 1
               max = 1
              mean = 1.00
            stddev = 0.00
            median = 1.00
              75% <= 1.00
              95% <= 1.00
              98% <= 1.00
              99% <= 1.00
            99.9% <= 1.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
             count = 1000008
               min = 453139
               max = 1871512
              mean = 639040.49
            stddev = 88377.41
            median = 633963.00
              75% <= 689163.00
              95% <= 764942.00
              98% <= 805957.00
              99% <= 836714.00
            99.9% <= 1871512.00


     -- Meters 
----------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
             count = 557000000
         mean rate = 763472.57 events/second
     1-minute rate = 763802.93 events/second
     5-minute rate = 699050.48 events/second
    15-minute rate = 424725.42 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
             count = 1000008
         mean rate = 1370.69 events/second
     1-minute rate = 1371.30 events/second
     5-minute rate = 1255.04 events/second
    15-minute rate = 762.53 events/second


2016-02-05 15:08:25,746 INFO  [WALPerfEval.logRoller] regionserver.LogRoller: 
LogRoller exiting.

 Performance counter stats for './hbase/bin/hbase --config 
/home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation':

     322385.687852 task-clock                #    0.438 CPUs utilized
         8,158,772 context-switches          #    0.025 M/sec
           575,922 cpu-migrations            #    0.002 M/sec
         3,064,079 page-faults               #    0.010 M/sec
   474,704,238,311 cycles                    #    1.472 GHz
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   204,341,980,679 instructions              #    0.43  insns per cycle
    36,933,173,991 branches                  #  114.562 M/sec
       787,891,661 branch-misses             #    2.13% of all branches

     735.817239034 seconds time elapsed

{code}

Here is the end of the log from the PATCHED run:

{code}
...

-- Histograms ------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
             count = 901306
               min = 156502
               max = 364739
              mean = 231465.79
            stddev = 25166.33
            median = 227923.00
              75% <= 246449.00
              95% <= 274560.00
              98% <= 288228.00
              99% <= 307981.00
            99.9% <= 364739.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00

-- Meters ----------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
             count = 502038025
         mean rate = 2283690.25 events/second
     1-minute rate = 2268881.93 events/second
     5-minute rate = 1180418.70 events/second
    15-minute rate = 486667.36 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
             count = 0
         mean rate = 0.00 events/second
     1-minute rate = 0.00 events/second
     5-minute rate = 0.00 events/second
    15-minute rate = 0.00 events/second

 Performance counter stats for './hbase/bin/hbase --config 
/home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation':

     256123.078857 task-clock                #    1.030 CPUs utilized
         3,359,633 context-switches          #    0.013 M/sec
            65,083 cpu-migrations            #    0.254 K/sec
         3,144,010 page-faults               #    0.012 M/sec
   412,859,061,607 cycles                    #    1.612 GHz
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   199,715,055,614 instructions              #    0.48  insns per cycle
    36,232,946,697 branches                  #  141.467 M/sec
       461,875,115 branch-misses             #    1.27% of all branches

     248.692785230 seconds time elapsed
{code}


> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And the most important thing is that, it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency(such as HBASE-14004) when 
> using original DFSOutputStream due to its complicated logic. And the 
> complicated logic also force us to use some magical tricks to increase 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5 not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when 
> logging WAL. For correctness, and also for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

Reply via email to