[
https://issues.apache.org/jira/browse/IMPALA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454246#comment-17454246
]
Wenzhe Zhou commented on IMPALA-6590:
-------------------------------------
Collected CPU profiles while running the following script:
for i in 256 512 1024 2048 4096 8192 16384 32768; do
  (echo 'VALUES ('
   for x in $(seq $i); do echo "cast($x as string),"; done
   echo "NULL); profile;") | time impala-shell.sh -f /dev/stdin |& grep Analysis
done
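To check the generated statement without a running cluster, the statement-building part of the loop can be pulled out on its own. This is only a sketch: gen_values is a hypothetical helper name, not something from the Impala tree, and it reproduces just the echo/seq portion of the script.

```shell
# Hypothetical helper (an assumption for illustration): print a VALUES
# statement with N string-cast columns plus a trailing NULL, followed by
# the "profile;" command, exactly as the loop body of the script does.
gen_values() {
  n="$1"
  echo 'VALUES ('
  for x in $(seq "$n"); do
    echo "cast($x as string),"
  done
  echo 'NULL); profile;'
}

# Inspect a small statement before piping larger ones into impala-shell.sh.
gen_values 3
```

The output of gen_values $i can then be piped into impala-shell.sh -f /dev/stdin as in the original loop.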
The profiles show a lot of overhead in the JNI function
Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow(), which is
called for expr rewrites. About 30% of CPU time was spent in this function
while it was calling DeserializeThriftMsg().
Also measured the average time taken by the JNI function
Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow() when
running the script with different numbers of columns:
column number:       256  512  1024  2048  4096  8192  16384  32768
average_time_in_ns:  430  219   112    59    32    37     60     63
The time taken by
Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow() varies as
the number of columns increases. This also contributes to the non-linear
increase in analysis time as the number of columns increases.
> Disable expr rewrites and codegen for VALUES() statements
> ---------------------------------------------------------
>
> Key: IMPALA-6590
> URL: https://issues.apache.org/jira/browse/IMPALA-6590
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
> Reporter: Alexander Behm
> Assignee: Wenzhe Zhou
> Priority: Major
> Labels: perf, planner, ramp-up, regression
>
> The analysis of statements with big VALUES clauses like INSERT INTO <tbl>
> VALUES is slow due to expression rewrites like constant folding. The
> performance of such statements has regressed since the introduction of expr
> rewrites and constant folding in IMPALA-1788.
> We should skip expr rewrites for VALUES altogether since it mostly provides
> no benefit but can have a large overhead due to evaluation of expressions in
> the backend (constant folding). These expressions are ultimately evaluated
> and materialized in the backend anyway, so there's no point in folding them
> during analysis.
> Similarly, there is no point in doing codegen for these exprs in the backend
> union node.
> *Workaround*
> {code}
> SET ENABLE_EXPR_REWRITES=FALSE;
> SET DISABLE_CODEGEN=TRUE;
> {code}