[ 
https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263861#comment-16263861
 ] 

liyunzhang commented on HIVE-18080:
-----------------------------------

I used VTune to look at the assembly generated for the following code (the if expression):
{code}
import java.util.Random;

/**
 * Created by lzhang66 on 11/13/2017.
 */
public class If {
  static long[] in1, in2,in3,out;
  public static void main(String[] args){
    if( args.length == 3){
      long warmIter=Long.parseLong(args[0]);
      System.out.println("warmIter num:"+warmIter);
      long iter=Long.parseLong(args[1]);
      System.out.println("iter num:"+iter);
      boolean enableHive10238= Boolean.parseBoolean(args[2]);
      long startTime = System.currentTimeMillis();
      calc(warmIter, iter,enableHive10238);
      long endTime   = System.currentTimeMillis();
      long totalTime = endTime - startTime;
      System.out.println("Total time:"+totalTime);
    }else{
      System.out.println("3 parameters needed. Usage: java If [warmIter] [iter] [enableHive10238]");
      System.exit(0);
    }

  }

  public static void calc(long warmIter, long iter, boolean enableHive10238)
  {
    in1 = new long [1026];
    in2 = new long [1026];
    in3 = new long [1026];

    Random rand = new Random(435437646);
    for(int i=0; i<in1.length; i++)
    {
      in1[i] = rand.nextLong();
    }

    for(int i=0; i<in2.length; i++)
    {
      in2[i] = rand.nextLong();
    }

    for (int j = 0; j < warmIter; j++)
    {
       reduction_kernel(in1, in2, in1.length, enableHive10238);
    }

    long start = System.currentTimeMillis();
    for (int j = 0; j < iter; j++)
    {
      reduction_kernel(in1, in2, in1.length, enableHive10238);
    }

    long elapsedTimeMillis = System.currentTimeMillis()-start;
    System.out.println("Iterations Per milli Second:" + (iter)/elapsedTimeMillis+" ipms");
  }

  private static void reduction_kernel(long[] in1, long[] in2, int length, boolean enableHive10238) {
    out = new long[1026];
    if (enableHive10238) {
      for (int i1 = 0; i1 < in1.length; i1++) {
        out[i1] = (~(in1[i1] - 1L) & in2[i1]) |(( in1[i1] - 1L)& in3[i1]);
      }
    } else {
      for (int i1 = 0; i1 < in1.length; i1++) {
        out[i1] = (in1[i1] - 1L)>0? in2[i1]: in3[i1];
      }
    }
  }
}
{code}

Run {{java If 5000 50000 true}} to enable the HIVE-10238 patch and 
{{java If 5000 50000 false}} to disable it. There are three parameters: 
the first is the number of warm-up iterations, the second is the number of 
measured iterations, and the third selects whether the HIVE-10238 patch is used. 
In the attached picture, I saw AVX2 instructions for the if expression 
({{(in1[i1] - 1L)>0? in2[i1]: in3[i1]}}).
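For reference, here is a minimal sketch of why the branch-free expression used when {{enableHive10238}} is true can stand in for a conditional. It assumes the condition value is exactly 0 or 1; the class and method names ({{BranchFreeSelect}}, {{select}}, {{selectWithBranch}}) are hypothetical and only for illustration. In the benchmark above, {{in1}} holds arbitrary random longs, so the two branches of {{reduction_kernel}} are not expected to produce identical output there; they only exercise the two instruction patterns.

```java
public class BranchFreeSelect {
  // Branch-free form: returns a when cond == 1 and b when cond == 0.
  // Assumption: cond is exactly 0 or 1; any other value yields a bitwise mix of a and b.
  public static long select(long cond, long a, long b) {
    long mask = cond - 1L;            // cond == 1 -> 0L, cond == 0 -> -1L (all bits set)
    return (~mask & a) | (mask & b);
  }

  // Reference form using a conditional expression.
  public static long selectWithBranch(long cond, long a, long b) {
    return cond == 1L ? a : b;
  }

  public static void main(String[] args) {
    long a = 0x1234_5678_9ABC_DEF0L;
    long b = -42L;
    for (long cond : new long[] {0L, 1L}) {
      // Both forms should agree for cond in {0, 1}.
      System.out.println(cond + ": " + (select(cond, a, b) == selectWithBranch(cond, a, b)));
    }
  }
}
```

Because the branch-free form is plain arithmetic and bitwise operations on each element, a loop over it has no data-dependent control flow.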

> Performance degradation on 
> VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18080
>                 URL: https://issues.apache.org/jira/browse/HIVE-18080
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>         Attachments: log.logic.avx1.single.0, log_logic.avx1.part
>
>
> I used a Xeon(R) Platinum 8180 CPU to test the performance of 
> [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name    : Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK9, as JDK9 enables AVX512.
> I used the Hive microbenchmarks (HIVE-10189) to evaluate the performance improvement. 
> Performance seems to improve (20%+) in the cases in 
> {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}},
>  except 
> {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}}
>  and
> {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}.
> On this Skylake CPU, the results for {{VectorizedLogicBench}} are as follows:
> || ||AVX2 us/op||AVX512 us/op ||  (AVX2-AVX512)/AVX2||
> |ColAndColBench|122510| 87014| 28.9%|
> |IfExprLongColumnLongColumnBench | 1325759| 1436073| -8.3% |
> |IfExprLongColumnRepeatingLongColumnBench|1397447|1480450|  -5.9%|
> |IfExprRepeatingLongColumnLongColumnBench|1401164|1483062|  -5.9% |
> |NotColBench|77042.83|51513.28|  33%|
> There is a degradation in 
> IfExprLongColumnLongColumnBench, IfExprLongColumnRepeatingLongColumnBench and 
> IfExprRepeatingLongColumnLongColumnBench. It is unclear why these 
> IfExpr cases degrade.
> Here we use {{taskset -cp 1 $pid}} to run the benchmark on a single core to 
> avoid the impact of dynamic CPU frequency scaling.
> My script:
> {code}
> export JAVA_HOME=/home/zly/jdk-9.0.1/
> export PATH=$JAVA_HOME/bin:$PATH
> export LD_LIBRARY_PATH=/home/zly/jdk-9.0.1/mylib
> for i in 0 1 2; do
> java -server -XX:UseAVX=3 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx3.single.$i & export pid=$!
> taskset -cp 1 $pid
> wait $pid
> done
> for i in 0 1 2; do
> java -server -XX:UseAVX=2 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx2.single.$i & export pid=$!
> taskset -cp 1 $pid
> wait $pid
> done
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
