[
https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263861#comment-16263861
]
liyunzhang commented on HIVE-18080:
-----------------------------------
I used VTune to look at the assembly generated for the following code (an if expression):
{code}
import java.util.Random;

/**
 * Created by lzhang66 on 11/13/2017.
 */
public class If {
  static long[] in1, in2, in3, out;

  public static void main(String[] args) {
    if (args.length == 3) {
      long warmIter = Long.parseLong(args[0]);
      System.out.println("warmIter num:" + warmIter);
      long iter = Long.parseLong(args[1]);
      System.out.println("iter num:" + iter);
      boolean enableHive10238 = Boolean.parseBoolean(args[2]);
      long startTime = System.currentTimeMillis();
      calc(warmIter, iter, enableHive10238);
      long endTime = System.currentTimeMillis();
      long totalTime = endTime - startTime;
      System.out.println("Total time:" + totalTime);
    } else {
      System.out.println("3 parameters needed. Like java If [warmIter] [iter] [enableHive10238]");
      System.exit(0);
    }
  }

  public static void calc(long warmIter, long iter, boolean enableHive10238) {
    in1 = new long[1026];
    in2 = new long[1026];
    in3 = new long[1026]; // never filled below, so it stays all zeros
    Random rand = new Random(435437646);
    for (int i = 0; i < in1.length; i++) {
      in1[i] = rand.nextLong();
    }
    for (int i = 0; i < in2.length; i++) {
      in2[i] = rand.nextLong();
    }
    // Warm-up runs so the JIT compiles the kernel before timing starts.
    for (int j = 0; j < warmIter; j++) {
      reduction_kernel(in1, in2, in1.length, enableHive10238);
    }
    // Timed runs.
    long start = System.currentTimeMillis();
    for (int j = 0; j < iter; j++) {
      reduction_kernel(in1, in2, in1.length, enableHive10238);
    }
    long elapsedTimeMillis = System.currentTimeMillis() - start;
    System.out.println("Iterations Per milli Second:" + (iter) / elapsedTimeMillis + " ipms");
  }

  private static void reduction_kernel(long[] in1, long[] in2, int length,
      boolean enableHive10238) {
    out = new long[1026];
    if (enableHive10238) {
      // Branch-free form (with the HIVE-10238 patch).
      for (int i1 = 0; i1 < in1.length; i1++) {
        out[i1] = (~(in1[i1] - 1L) & in2[i1]) | ((in1[i1] - 1L) & in3[i1]);
      }
    } else {
      // Plain if expression (without the HIVE-10238 patch).
      for (int i1 = 0; i1 < in1.length; i1++) {
        out[i1] = (in1[i1] - 1L) > 0 ? in2[i1] : in3[i1];
      }
    }
  }
}
{code}
Run {{java If 5000 50000 true}} to enable the HIVE-10238 patch and
{{java If 5000 50000 false}} to disable it. There are three parameters:
the first is the number of warm-up iterations, the second is the number of
timed iterations, and the third selects whether the HIVE-10238 patch is used.
In the attached picture, I saw AVX2 instructions for the if expression
( {code}(in1[i1] - 1L)>0? in2[i1]: in3[i1]{code}).
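To make the branch-free expression in the test easier to read, here is a minimal sketch of what it computes when the condition value is restricted to 0 or 1. That restriction, the class name {{BranchFreeSelect}} and the method name {{select}} are assumptions for illustration only; the test program above feeds random longs instead.

```java
class BranchFreeSelect {
  // Same expression as the enableHive10238 branch of reduction_kernel.
  // cond == 1: cond - 1 is 0, so ~(cond - 1) is all ones and a is kept.
  // cond == 0: cond - 1 is -1 (all ones), so b is kept.
  static long select(long cond, long a, long b) {
    return (~(cond - 1L) & a) | ((cond - 1L) & b);
  }

  public static void main(String[] args) {
    long a = 0x1234L;
    long b = 0x5678L;
    System.out.println(select(1L, a, b) == a);
    System.out.println(select(0L, a, b) == b);
  }
}
```

With a 0/1 condition the loop body has no branch, only subtract, not, and, or, which is the kind of code a JIT can map onto vector instructions.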
> Performance degradation on
> VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> ------------------------------------------------------------------------------------------------------
>
> Key: HIVE-18080
> URL: https://issues.apache.org/jira/browse/HIVE-18080
> Project: Hive
> Issue Type: Bug
> Reporter: liyunzhang
> Attachments: log.logic.avx1.single.0, log_logic.avx1.part
>
>
> I used a Xeon(R) Platinum 8180 CPU to test the performance of
> [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name : Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK 9, since JDK 9 enables AVX512.
> I used the Hive microbenchmarks (HIVE-10189) to evaluate the performance improvement.
> Performance appears to improve (20%+) for the cases in
> {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}},
> except for
> {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}}
> and
> {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}.
> On a Skylake CPU, the results for VectorizedLogicBench are as follows:
> || ||AVX2 us/op||AVX512 us/op || (AVX2-AVX512)/AVX2||
> |ColAndColBench|122510| 87014| 28.9%|
> |IfExprLongColumnLongColumnBench | 1325759| 1436073| -8.3% |
> |IfExprLongColumnRepeatingLongColumnBench|1397447|1480450| -5.9%|
> |IfExprRepeatingLongColumnLongColumnBench|1401164|1483062| -5.9% |
> |NotColBench|77042.83|51513.28| 33%|
> There is degradation in IfExprLongColumnLongColumnBench,
> IfExprLongColumnRepeatingLongColumnBench and
> IfExprRepeatingLongColumnLongColumnBench. I am confused about why these
> cases degrade.
> Here we use {{taskset -cp 1 $pid}} to run the benchmark on a single core to
> avoid the impact of dynamic CPU frequency scaling.
> My script:
> {code}
> export JAVA_HOME=/home/zly/jdk-9.0.1/
> export PATH=$JAVA_HOME/bin:$PATH
> export LD_LIBRARY_PATH=/home/zly/jdk-9.0.1/mylib
> for i in 0 1 2; do
> java -server -XX:UseAVX=3 -jar benchmarks.jar \
>   org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 \
>   -f 1 -bm avgt -tu us >log.logic.avx3.single.$i & export pid=$!
> taskset -cp 1 $pid
> wait $pid
> done
> for i in 0 1 2; do
> java -server -XX:UseAVX=2 -jar benchmarks.jar \
>   org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 \
>   -f 1 -bm avgt -tu us >log.logic.avx2.single.$i & export pid=$!
> taskset -cp 1 $pid
> wait $pid
> done
> {code}
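For reference, the last column of the results table in the quoted description is the relative change (AVX2-AVX512)/AVX2, where a positive value means AVX512 is faster. A minimal sketch of that arithmetic follows; the class name {{RelativeChange}} and the method name {{improvementPercent}} are hypothetical, and the inputs are the us/op values copied from the table.

```java
class RelativeChange {
  // (AVX2 - AVX512) / AVX2, expressed as a percentage.
  // Positive: AVX512 took less time per op. Negative: it took more.
  static double improvementPercent(double avx2UsPerOp, double avx512UsPerOp) {
    return (avx2UsPerOp - avx512UsPerOp) / avx2UsPerOp * 100.0;
  }

  public static void main(String[] args) {
    System.out.printf("ColAndColBench: %.1f%%%n",
        improvementPercent(122510, 87014));
    System.out.printf("IfExprLongColumnLongColumnBench: %.1f%%%n",
        improvementPercent(1325759, 1436073));
  }
}
```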
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)