[ 
https://issues.apache.org/jira/browse/IMPALA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454246#comment-17454246
 ] 

Wenzhe Zhou commented on IMPALA-6590:
-------------------------------------

Collected CPU profiles while running the following script:

for i in 256 512 1024 2048 4096 8192 16384 32768; do
  (echo 'VALUES ('; for x in $(seq $i); do echo "cast($x as string),"; done; echo "NULL); profile;") |
    time impala-shell.sh -f /dev/stdin |& grep Analysis
done

The profiles show that there is a lot of overhead in calling the JNI function 
Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow() for 
rewrites. About 30% of CPU time was spent in this function while it was 
calling DeserializeThriftMsg(). 
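
To make it easier to replay the same input with different query options, the statement generator from the script can be pulled into a small helper. This is only a sketch: gen_values_stmt is a hypothetical name, not part of Impala, and it prints exactly the statement the loop above builds, optionally preceded by extra lines such as SET statements.

```shell
# Hypothetical helper (not part of Impala): prints the same VALUES statement
# that the benchmark loop builds, i.e. N generated columns plus a trailing NULL.
# Any extra arguments are printed first, one per line (e.g. SET statements).
gen_values_stmt() {
  n="$1"
  shift
  for opt in "$@"; do
    echo "$opt"
  done
  echo 'VALUES ('
  for x in $(seq "$n"); do
    echo "cast($x as string),"
  done
  echo "NULL); profile;"
}
```

Assuming impala-shell.sh accepts SET statements in the file passed with -f, a run with rewrites disabled would be: gen_values_stmt 1024 'SET ENABLE_EXPR_REWRITES=FALSE;' | impala-shell.sh -f /dev/stdin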

Also measured the average time taken by the JNI function 
Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow() when 
running the script with different numbers of columns, as shown below:

      column number:       256  512  1024  2048  4096  8192  16384  32768
      average_time_in_ns:  430  219   112    59    32    37     60     63

The time taken by 
Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow() varies as 
the number of columns increases. This also contributes to the non-linear 
increase as the number of columns increases.

> Disable expr rewrites and codegen for VALUES() statements
> ---------------------------------------------------------
>
>                 Key: IMPALA-6590
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6590
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>            Reporter: Alexander Behm
>            Assignee: Wenzhe Zhou
>            Priority: Major
>              Labels: perf, planner, ramp-up, regression
>
> The analysis of statements with big VALUES clauses like INSERT INTO <tbl> 
> VALUES is slow due to expression rewrites like constant folding. The 
> performance of such statements has regressed since the introduction of expr 
> rewrites and constant folding in IMPALA-1788.
> We should skip expr rewrites for VALUES altogether since it mostly provides 
> no benefit but can have a large overhead due to evaluation of expressions in 
> the backend (constant folding). These expressions are ultimately evaluated 
> and materialized in the backend anyway, so there's no point in folding them 
> during analysis.
> Similarly, there is no point in doing codegen for these exprs in the backend 
> union node.
> *Workaround*
> {code}
> SET ENABLE_EXPR_REWRITES=FALSE;
> SET DISABLE_CODEGEN=TRUE;
> {code}



