[ 
https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258853#comment-16258853
 ] 

liyunzhang edited comment on HIVE-18080 at 11/20/17 6:38 AM:
-------------------------------------------------------------

[~gopalv]: I used the following commands with {{-prof perfasm}} to run 
VectorizedLogicBench#IfExprLongColumnLongColumnBench with AVX1 ({{-XX:UseAVX=1}}):
{code}
export JAVA_HOME=/home/zly/sr601/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/sr601/jdk-9.0.1/mylib
i=0
java -server -XX:UseAVX=1 -jar benchmarks.jar -prof perfasm org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx1.single.$i & export pid=$!
taskset -cp 1 $pid
wait $pid
{code}

The output, 
[log.logic.avx1.single.0|https://issues.apache.org/jira/secure/attachment/12898421/log.logic.avx1.single.0],
is attached. It contains the following warning:
{code}
PrintAssembly processed: 51105 total address lines.
Perf output processed (skipped 1.020 seconds):
 Column 1: cycles (0 events)
 Column 2: instructions (0 events)

....[Hottest Regions]...............................................................................
....................................................................................................
                  <totals>

....[Hottest Methods (after inlining)]..............................................................
....................................................................................................
                  <totals>

....[Distribution by Area]..........................................................................
....................................................................................................
                  <totals>

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
{code}
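
The warning reports zero sampled events for both counters. One possible first check, offered as an assumption and not a confirmed cause in this case, is whether the {{perf}} tool is on the PATH of the user running JMH and what the kernel's {{perf_event_paranoid}} setting is. A minimal sketch; the function name {{check_perf}} is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: report whether perf is available and show the
# kernel setting that limits perf sampling for unprivileged users.
check_perf() {
  if command -v perf >/dev/null 2>&1; then
    echo "perf: found at $(command -v perf)"
  else
    echo "perf: not found on PATH"
  fi
  if [ -r /proc/sys/kernel/perf_event_paranoid ]; then
    echo "perf_event_paranoid: $(cat /proc/sys/kernel/perf_event_paranoid)"
  else
    echo "perf_event_paranoid: not readable"
  fi
}

check_perf
```

Higher {{perf_event_paranoid}} values restrict what unprivileged users may sample, so the value is worth recording alongside the log.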


> Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18080
>                 URL: https://issues.apache.org/jira/browse/HIVE-18080
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>         Attachments: log.logic.avx1.single.0, log_logic.avx1.part
>
>
> I used a Xeon(R) Platinum 8180 CPU to test the performance of 
> [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name    : Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK 9, because JDK 9 enables AVX512.
> I used the Hive microbenchmarks (HIVE-10189) to evaluate the performance improvement.
> It seems performance improves by 20%+ in the cases in 
> {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}},
> except for 
> {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}}
> and {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}.
> On the Skylake CPU, the results for VectorizedLogicBench are as follows:
> || ||AVX2 us/op||AVX512 us/op||(AVX2-AVX512)/AVX2||
> |ColAndColBench|122510|87014|28.9%|
> |IfExprLongColumnLongColumnBench|1325759|1436073|-8.3%|
> |IfExprLongColumnRepeatingLongColumnBench|1397447|1480450|-5.9%|
> |IfExprRepeatingLongColumnLongColumnBench|1401164|1483062|-5.9%|
> |NotColBench|77042.83|51513.28|33%|
> There is a degradation in IfExprLongColumnLongColumnBench, 
> IfExprLongColumnRepeatingLongColumnBench and 
> IfExprRepeatingLongColumnLongColumnBench. I am confused about why these 
> IfExpr cases degrade.
> Here we use {{taskset -cp 1 $pid}} to run the benchmark on a single core to 
> avoid the impact of dynamic CPU frequency scaling.
> My script:
> {code}
> export JAVA_HOME=/home/zly/jdk-9.0.1/
> export PATH=$JAVA_HOME/bin:$PATH
> export LD_LIBRARY_PATH=/home/zly/jdk-9.0.1/mylib
> for i in 0 1 2; do
> java -server -XX:UseAVX=3 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx3.single.$i & export pid=$!
> taskset -cp 1 $pid
> wait $pid
> done
> for i in 0 1 2; do
> java -server -XX:UseAVX=2 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx2.single.$i & export pid=$!
> taskset -cp 1 $pid
> wait $pid
> done
> {code}
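
The last column of the quoted table is defined as (AVX2-AVX512)/AVX2, so a positive value means AVX512 is faster and a negative value means it is slower. A minimal sketch for recomputing that column from two {{avgt}} scores in us/op, assuming {{awk}} is available; the function name {{rel_change}} is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: relative change between two average-time scores.
# $1 = AVX2 score (us/op), $2 = AVX512 score (us/op).
# Prints (AVX2 - AVX512) / AVX2 as a percentage with one decimal place.
rel_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# IfExprLongColumnLongColumnBench scores taken from the quoted table.
rel_change 1325759 1436073
```

Because this rounds to one decimal place, the last digit can differ slightly from a figure that was truncated or rounded by hand.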



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
