[ https://issues.apache.org/jira/browse/HADOOP-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-15554:
---------------------------------
    Status: Patch Available  (was: Open)

Attached patch refactors out the config-parsing code to a new inner class with 
a bunch of smaller functions which are easier to compile. I also took the 
opportunity to make a few micro-optimizations like avoiding construction of the 
confSources array in the common case that the config file uses no "<source>" 
tags.

I tested the improvement by running:

{code}
# $CP must already hold the rest of the Hadoop classpath.
for x in $(seq 1 60); do
  java -XX:+CITime \
       -cp hadoop-common-project/hadoop-common/target/hadoop-common-3.2.0-SNAPSHOT.jar:$CP \
       org.apache.hadoop.examples.ExampleDriver pi 1 1 2>&1 \
    | grep 'Total comp'
done | tee /tmp/patch.txt
{code}

to measure the total compilation time in a simple LocalJobRunner MR job. I 
grepped out the times and ran a t-test using R:

{code}
data:  d.orig and d.patched
t = 36.511, df = 110.1, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.7329980 0.8171354
sample estimates:
mean of x mean of y
 3.508300  2.733233
{code}

So this saves about 775ms of CPU time spent by the JIT per run (95% 
confidence interval: 733-817ms).
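The non-integer df (110.1) in the R output indicates Welch's unequal-variance t-test, which is R's default for t.test. For anyone who wants to check the statistic without R, here is a minimal Python sketch of that calculation. The function name and the sample values are hypothetical and are not the measurements from this comment.

```python
import math
import statistics


def welch_t(x, y):
    """Return (t, df, mean difference) for Welch's unequal-variance t-test."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    # Squared standard error of each sample mean.
    se2_x = statistics.variance(x) / len(x)
    se2_y = statistics.variance(y) / len(y)
    t = (mean_x - mean_y) / math.sqrt(se2_x + se2_y)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    return t, df, mean_x - mean_y


if __name__ == "__main__":
    # Hypothetical compilation times in seconds, for illustration only.
    d_orig = [3.4, 3.5, 3.6]
    d_patched = [2.6, 2.7, 2.8]
    t, df, diff = welch_t(d_orig, d_patched)
    print(f"t = {t:.3f}, df = {df:.1f}, mean difference = {diff:.3f} s")
```

Feeding it the 60 values grepped from each run should give the same t and df as R reports. The confidence interval additionally needs a t-distribution quantile, which the Python standard library does not provide.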

To test throughput, I used the ConfTest.java program from HADOOP-14216.

{code}
orig:

duration: 20745 count: 3561000

real    0m21.104s
user    0m29.296s
sys     0m1.903s

patch:

duration: 21810 count: 3561000

real    0m22.304s
user    0m27.013s
sys     0m2.547s
{code}

So it seems around the same: user time drops by about 2.3s while real time 
rises by about 1.2s. Close enough to call "not a regression".
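To put the two ConfTest runs on a common scale, the figures can be converted to operations per second. This is a small sketch that assumes `duration` is in milliseconds (it matches the reported real time) and that `count` is the number of operations completed. Neither unit is stated in the output, and the helper name is hypothetical.

```python
def ops_per_second(count, duration_ms):
    """Throughput, assuming the duration is reported in milliseconds."""
    return count / (duration_ms / 1000.0)


orig = ops_per_second(3_561_000, 20_745)
patched = ops_per_second(3_561_000, 21_810)
change_pct = 100.0 * (patched - orig) / orig

if __name__ == "__main__":
    print(f"orig:    {orig:,.0f} ops/s")
    print(f"patched: {patched:,.0f} ops/s")
    print(f"change:  {change_pct:+.1f}%")
```

Under those assumptions the patched run is roughly 5% slower in wall-clock terms. This comes from a single run of each build, so it gives no indication of run-to-run variation.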

I also tried 'fs -ls hdfs://nn/' under 'perf stat -r10':

{code}
orig:

       5295.930635      task-clock (msec)         #    3.454 CPUs utilized            ( +-  3.56% )
            10,977      context-switches          #    0.002 M/sec                    ( +-  0.37% )
               613      cpu-migrations            #    0.116 K/sec                    ( +-  2.28% )
            86,804      page-faults               #    0.016 M/sec                    ( +-  0.12% )
    14,823,251,627      cycles                    #    2.799 GHz                      ( +-  3.61% )
    11,367,265,626      instructions              #    0.77  insn per cycle           ( +-  1.81% )
     2,503,093,507      branches                  #  472.645 M/sec                    ( +-  3.26% )
        67,066,880      branch-misses             #    2.68% of all branches          ( +-  0.23% )

       1.533354188 seconds time elapsed                                               ( +-  0.54% )

patch:

       5173.366209      task-clock (msec)         #    3.384 CPUs utilized            ( +-  3.60% )
            11,160      context-switches          #    0.002 M/sec                    ( +-  1.32% )
               630      cpu-migrations            #    0.122 K/sec                    ( +-  2.82% )
            87,732      page-faults               #    0.017 M/sec                    ( +-  0.18% )
    14,495,009,185      cycles                    #    2.802 GHz                      ( +-  3.55% )
    11,485,553,655      instructions              #    0.79  insn per cycle           ( +-  1.80% )
     2,487,385,519      branches                  #  480.806 M/sec                    ( +-  3.34% )
        68,583,976      branch-misses             #    2.76% of all branches          ( +-  0.25% )

       1.528788291 seconds time elapsed                                               ( +-  0.62% )
{code}

 and 'yarn application -list' on an RM running no applications:

{code}
orig:

       2150.752819      task-clock (msec)         #    2.101 CPUs utilized            ( +-  0.89% )
             9,179      context-switches          #    0.004 M/sec                    ( +-  0.66% )
               476      cpu-migrations            #    0.221 K/sec                    ( +-  3.20% )
            46,036      page-faults               #    0.021 M/sec                    ( +-  0.13% )
     5,928,445,661      cycles                    #    2.756 GHz                      ( +-  0.98% )
     6,382,601,882      instructions              #    1.08  insn per cycle           ( +-  0.61% )
     1,153,880,261      branches                  #  536.501 M/sec                    ( +-  0.60% )
        47,370,186      branch-misses             #    4.11% of all branches          ( +-  0.65% )

       1.023657616 seconds time elapsed                                               ( +-  0.59% )

patch:

       2106.716373      task-clock (msec)         #    2.091 CPUs utilized            ( +-  0.70% )
             9,113      context-switches          #    0.004 M/sec                    ( +-  0.62% )
               451      cpu-migrations            #    0.214 K/sec                    ( +-  1.46% )
            47,218      page-faults               #    0.022 M/sec                    ( +-  0.09% )
     5,769,853,936      cycles                    #    2.739 GHz                      ( +-  0.73% )
     6,320,641,188      instructions              #    1.10  insn per cycle           ( +-  0.31% )
     1,141,174,880      branches                  #  541.684 M/sec                    ( +-  0.31% )
        46,945,771      branch-misses             #    4.11% of all branches          ( +-  0.40% )

       1.007474613 seconds time elapsed                                               ( +-  0.50% )
{code}

So both commands show a slight saving in cycles: about 2.2% for 'fs -ls' 
(within its reported +-3.6% run-to-run variation) and about 2.7% for 'yarn 
application -list'.
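The cycle counts above can be compared directly against the variation that perf reports. The sketch below uses the cycle totals and "+-" percentages from the four runs shown. The helper name is hypothetical, and checking the change against the larger of the two reported variations is a rough sanity check, not a significance test.

```python
def relative_change_pct(before, after):
    """Percentage change from before to after (negative means fewer cycles)."""
    return 100.0 * (after - before) / before


# (orig cycles, patched cycles, orig +-%, patched +-%) from the perf stat output.
runs = {
    "fs -ls": (14_823_251_627, 14_495_009_185, 3.61, 3.55),
    "yarn application -list": (5_928_445_661, 5_769_853_936, 0.98, 0.73),
}

if __name__ == "__main__":
    for name, (orig, patched, orig_var, patched_var) in runs.items():
        change = relative_change_pct(orig, patched)
        noise = max(orig_var, patched_var)
        verdict = "within" if abs(change) <= noise else "outside"
        print(f"{name}: {change:+.2f}% cycles, {verdict} +-{noise}% variation")
```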

> Improve JIT performance for Configuration parsing
> -------------------------------------------------
>
>                 Key: HADOOP-15554
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15554
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: conf, performance
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: hadoop-15554.patch
>
>
> In investigating a performance regression for small tasks between Hadoop 2 
> and Hadoop 3, we found that the amount of time spent in JIT was significantly 
> higher. Using jitwatch we were able to determine that, due to a combination 
> of switching from DOM to SAX style parsing and just having more configuration 
> key/value pairs, Configuration.loadResource is now getting compiled with the 
> C2 compiler and taking quite some time. Breaking that very large function up 
> into several smaller ones and eliminating some redundant bits of code 
> improves the JIT performance measurably.



