[ https://issues.apache.org/jira/browse/IMPALA-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Zeyliger resolved IMPALA-5243.
-------------------------------------
Resolution: Fixed
{code}
commit 43ef80e4f1c93ea69883a4670c79ba10b0ed0432
Author: Philip Zeyliger
Date: Wed Sep 13 09:04:56 2017 -0700
IMPALA-5243: Speed up code gen for wide Avro tables.
HdfsAvroScanner::CodegenMaterializeTuple generates a function whose size
is linear in the number of columns. On 1000-column tables, codegen time is
significant. This commit roughly halves it for wide tables.
(Note that this had been much worse in recent history (<= Impala 2.9).)
It does so by breaking up MaterializeTuple() into multiple smaller
functions and calling them in order. When breaking up into
200-column chunks, there is a noticeable speed-up.
I've made the helper code for generating LLVM function prototypes
have a mutable function name, so that the builder can be re-used
multiple times.
I've checked by inspecting optimized LLVM that in the case where there's
only 1 helper function, code gets inlined so that there doesn't seem to
be an extra function.
I measured codegen time for various "step sizes." The case where there
are no helper functions takes about 2.7s. The best case was a step size
of about 200, with a codegen time of about 1.3s.
For the query "select count(int_col16) from
functional_avro.widetable_1000_cols",
codegen times as a function of step size are roughly as follows. This is
averaged across 5 executions, and rounded to 0.1s.
step  time (s)
  10  2.4
  50  2.5
  75  2.9
 100  3.0
 125  3.0
 150  1.4
 175  1.3
 200  1.3  <-- chosen step size
 225  1.5
 250  1.4
 300  1.6
 400  1.6
 500  1.8
1000  2.7
The raw data was generated like so, with some code that let me change the
step size at runtime:
$ (for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do
    for try in $(seq 5); do
      echo $step > /tmp/step_size.txt; echo -n "$step ";
      impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null |
        grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
        sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo;
    done; done) | tee out.txt
...
200 CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%)
CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns
ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K
OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
...
1000 CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%)
CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns
ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K
OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms
I have run the core tests with this change.
Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Reviewed-on: http://gerrit.cloudera.org:8080/8211
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
{code}
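The chunking idea in the commit message can be illustrated outside of LLVM. The following is only a minimal Python sketch, not Impala's implementation: chunk_ranges, make_helper and make_materialize_tuple are hypothetical names, and copying list elements stands in for whatever per-column work the generated code does. It shows one large per-column function being replaced by helpers of at most 200 columns each, called in order.

```python
from typing import Callable, List, Tuple

STEP_SIZE = 200  # chunk size reported as the chosen step size in the commit message


def chunk_ranges(num_columns: int, step: int = STEP_SIZE) -> List[Tuple[int, int]]:
    """Return half-open [start, end) column ranges covering at most `step` columns each."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [(start, min(start + step, num_columns))
            for start in range(0, num_columns, step)]


def make_helper(start: int, end: int) -> Callable[[list, list], None]:
    """Build a hypothetical helper that handles only columns [start, end)."""
    def helper(record: list, out_tuple: list) -> None:
        for i in range(start, end):
            # Stand-in for the real per-column materialization work.
            out_tuple[i] = record[i]
    return helper


def make_materialize_tuple(num_columns: int,
                           step: int = STEP_SIZE) -> Callable[[list], list]:
    """Compose the per-chunk helpers into one function that calls them in order."""
    helpers = [make_helper(s, e) for s, e in chunk_ranges(num_columns, step)]

    def materialize_tuple(record: list) -> list:
        out_tuple = [None] * num_columns
        for helper in helpers:
            helper(record, out_tuple)
        return out_tuple

    return materialize_tuple
```

For a 1000-column table and a step of 200 this produces five ranges, and a step equal to the column count produces a single helper, which corresponds to the "only 1 helper function" case mentioned in the commit message.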
> Slow codegen for wide Avro tables
> ---------------------------------
>
> Key: IMPALA-5243
> URL: https://issues.apache.org/jira/browse/IMPALA-5243
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.7.0, Impala 2.8.0
> Reporter: Alexander Behm
> Assignee: Philip Zeyliger
> Labels: codegen, performance, ramp-up
> Attachments: screenshot-1.png
>
>
> Codegen gets rather expensive when scanning wide Avro tables (>500 columns),
> regardless of how many columns are materialized by the query.
> {code}
> select count(int_col16) from functional_avro.widetable_250_cols;
> +------------------+
> | count(int_col16) |
> +------------------+
> | 10 |
> +------------------+
> Fetched 1 row(s) in 0.93s
> select count(int_col16) from functional_avro.widetable_500_cols;
> +------------------+
> | count(int_col16) |
> +------------------+
> | 10 |
> +------------------+
> Fetched 1 row(s) in 2.87s
> select count(int_col16) from widetable_1000_cols;
> +------------------+
> | count(int_col16) |
> +------------------+
> | 10 |
> +------------------+
> Fetched 1 row(s) in 10.58s
> {code}
> For the last query with 1000 columns, here's the codegen snippet from the
> query profile:
> {code}
> CodeGen:(Total: 10s115ms, non-child: 10s115ms, % non-child: 100.00%)
> - CodegenTime: 530.211us
> - CompileTime: 1s683ms
> - LoadTime: 0.000ns
> - ModuleBitcodeSize: 1.98 MB (2073044)
> - NumFunctions: 32 (32)
> - NumInstructions: 8.41K (8413)
> - OptimizationTime: 8s416ms
> - PeakMemoryUsage: 4.11 MB (4307456)
> - PrepareTime: 15.357ms
> {code}
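To compare counters such as Total and OptimizationTime numerically, the duration strings in the profile have to be converted to a common unit. The sketch below is a hypothetical helper, not part of Impala or impala-shell; it assumes durations are written as a sequence of number-plus-unit tokens using the units h, m, s, ms, us and ns, which covers the values shown in this message (for example "10s115ms" and "530.211us").

```python
import re

# Seconds per unit; the unit set is an assumption based on the values shown above.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|us|ns|h|m|s))+")


def parse_duration(text: str) -> float:
    """Convert a profile duration such as '10s115ms' into seconds."""
    text = text.strip()
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit]
               for value, unit in _TOKEN.findall(text))


def fraction_of_total(part: str, total: str) -> float:
    """Return part/total for two duration strings, e.g. OptimizationTime vs. Total."""
    return parse_duration(part) / parse_duration(total)
```

Applied to the profile above, fraction_of_total("8s416ms", "10s115ms") comes to roughly 0.83, so optimization accounts for most of the 10s codegen time in the reported case.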