[jira] [Commented] (HIVE-17573) LLAP: JDK9 support fixes

Gopal V (JIRA) Wed, 03 Jan 2018 23:19:36 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310895#comment-16310895
 ]


Gopal V commented on HIVE-17573:
--------------------------------

[~kellyzly]: I'm using LLAP with ORC, loaded using the bin_flat TPC-H script in 
hive-testbench.

https://github.com/hortonworks/hive-testbench/tree/hdp26/ddl-tpch/bin_flat

The hardware is {{Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz}}, with 256 GB RAM 
and the NUMA organization shown in the {code} block further below.

The memory is split as 128 GB Xmx + 32 GB cache for 24 executors, inside a 
180 GB container, which can fit pretty much the entire Q6 data in cache at the 
1 TB scale.
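
As a rough sanity check on that split, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and dividing the heap evenly across executors is only an approximation (the executors share one JVM heap).

```java
// Sketch only: the figures come from the comment above; all names are hypothetical.
final class LlapMemoryBudget {

    // Memory left in the container after the Java heap (Xmx) and the LLAP cache.
    static long headroomGb(long containerGb, long heapGb, long cacheGb) {
        return containerGb - (heapGb + cacheGb);
    }

    // Approximate heap share per executor, assuming an even split.
    static double heapPerExecutorGb(long heapGb, int executors) {
        return (double) heapGb / executors;
    }

    public static void main(String[] args) {
        long containerGb = 180, heapGb = 128, cacheGb = 32;
        int executors = 24;
        System.out.println("headroom GB: " + headroomGb(containerGb, heapGb, cacheGb));
        System.out.printf("heap per executor GB: %.2f%n", heapPerExecutorGb(heapGb, executors));
    }
}
```

With the numbers quoted here that leaves 20 GB of container headroom and roughly 5.3 GB of heap per executor.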

If you have the text-cache enabled (this takes multiple flags), you might be 
able to get similar performance from the text data as well, but the significant 
ORC speedup comes from loading data into lineitem in a natural order (the 
production-like ingest results in one file per day).

{code}
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 131037 MB
node 0 free: 127359 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 131072 MB
node 1 free: 127987 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 
{code}

The setup uses a giant TLAB maximum so that the in-thread allocations go to the 
same NUMA zone.

{{-XX:TLABSize=128m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts 
-XX:MetaspaceSize=1024m}}
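
One way to double-check that these options were actually passed to a given JVM is to read them back from inside the process. The sketch below uses the standard {{RuntimeMXBean.getInputArguments()}} call; the class name and the filter are hypothetical, and it only reports what was on the command line, not the effective values the JVM settled on.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: hypothetical helper, not part of Hive or LLAP.
final class JvmFlagCheck {

    // Keeps only the -XX options that mention TLAB or NUMA.
    static List<String> tlabAndNumaFlags(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(a -> a.startsWith("-XX:"))
                .filter(a -> a.contains("TLAB") || a.contains("NUMA"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (excluding main-class arguments).
        List<String> inputArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(tlabAndNumaFlags(inputArgs));
    }
}
```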

JDK9 seems to wake up the producer-consumer pair on the same NUMA zone (the IO 
elevator allocates the array and passes it to the executor thread, and the 
executor passes it back instead of dropping the reference and leaving it to GC).
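
The hand-off pattern described above can be sketched roughly as follows. This is an illustration only, not LLAP's actual code: {{BufferRecycler}} and its methods are hypothetical names, and it says nothing about NUMA placement by itself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: hypothetical class showing a producer/consumer pair recycling arrays.
final class BufferRecycler {
    private final BlockingQueue<byte[]> free;   // buffers handed back by the consumer
    private final BlockingQueue<byte[]> filled; // buffers waiting for the consumer
    private final int bufferSize;

    BufferRecycler(int capacity, int bufferSize) {
        this.free = new ArrayBlockingQueue<>(capacity);
        this.filled = new ArrayBlockingQueue<>(capacity);
        this.bufferSize = bufferSize;
    }

    // Producer side: reuse a returned buffer if one is available, otherwise allocate.
    byte[] acquire() {
        byte[] buf = free.poll();
        return buf != null ? buf : new byte[bufferSize];
    }

    // Producer side: hand a filled buffer to the consumer (blocks while the queue is full).
    void publish(byte[] buf) {
        try {
            filled.put(buf);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Consumer side: wait for the next filled buffer.
    byte[] take() {
        try {
            return filled.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Consumer side: pass the buffer back; if the pool is full it is simply left to GC.
    void release(byte[] buf) {
        free.offer(buf);
    }

    public static void main(String[] args) {
        BufferRecycler recycler = new BufferRecycler(4, 1024);
        byte[] first = recycler.acquire();   // producer allocates
        recycler.publish(first);             // producer -> consumer
        recycler.release(recycler.take());   // consumer -> producer
        byte[] second = recycler.acquire();  // producer gets the same array back
        System.out.println(first == second);
    }
}
```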

I'm not sure there's any actual movement on JEP-157 which would probably help 
this thread-to-thread object passing much more.

bq. From which tool, you can get above conclusion?

https://github.com/t3rmin4t0r/perf-map-agent/blob/jitdump/jit-objdump.sh

That's the script which I use to attach GDB to a running JIT process and 
extract a JIT sample, with the additional CPU perf events.

Here's an example of the final report I gather from the JIT (this was sent to 
Intel JDK team as a perf report, to see if they could fix {{public String(byte 
ascii[], int hibyte, int offset, int count)}} to be faster for very small 
strings).

http://people.apache.org/~gopalv/perf-29529.tbz2

This is a perf event capture for Q6 on text data (instead of ORC), recorded with

{code}
perf record -ag -e cycles,instructions,branch-misses,LLC-prefetch-misses,cache-misses,LLC-store-misses,LLC-load-misses
{code}

and bundled with the JIT-generated assembly.

If you're on an x86_64 machine, then I guess run-report.sh should work.
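
For readers unfamiliar with the {{String(byte ascii[], int hibyte, int offset, int count)}} constructor mentioned above: it is a deprecated JDK constructor that widens each byte to a char, using {{hibyte}} as the upper 8 bits. A minimal sketch of what it does for small ASCII strings follows. The class and method names are hypothetical, and this says nothing about where or how Hive calls it.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: hypothetical class comparing the deprecated constructor with a charset decode.
final class SmallStringDecode {

    // Deprecated JDK constructor; with hibyte = 0 each byte becomes the char (b & 0xff).
    @SuppressWarnings("deprecation")
    static String viaHibyte(byte[] ascii, int offset, int count) {
        return new String(ascii, 0, offset, count);
    }

    // Same result through the charset-based constructor, using ISO-8859-1.
    static String viaCharset(byte[] ascii, int offset, int count) {
        return new String(ascii, offset, count, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = "1994-01-01".getBytes(StandardCharsets.US_ASCII);
        System.out.println(viaHibyte(raw, 0, 4));
        System.out.println(viaHibyte(raw, 0, 4).equals(viaCharset(raw, 0, 4)));
    }
}
```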


> LLAP: JDK9 support fixes
> ------------------------
>
>                 Key: HIVE-17573
>                 URL: https://issues.apache.org/jira/browse/HIVE-17573
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>
> The perf diff between JDK8 -> JDK9 seems to be significant.  
> TPC-H Q6 on JDK8 takes 32s on a single node + 1 Tb scale warehouse. 
> TPC-H Q6 on JDK9 takes 19s on the same host + same data.
> The performance difference seems to come from better JIT and better NUMA 
> handling.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
