[ https://issues.apache.org/jira/browse/IMPALA-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388014#comment-17388014 ]

pengdou1990 commented on IMPALA-10785:
--------------------------------------

I also tested the solution following [~stigahuang]'s suggestion, and the results look better than adding padding to the union and HDFS tuples. Besides, since STRING and VARCHAR have the same layout in a row batch, I think the pass-through check should succeed between STRING and VARCHAR types.
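To make the STRING/VARCHAR point concrete, here is a minimal sketch of a pass-through compatibility check. It is not Impala's frontend code: SlotType, Slot, isLayoutCompatible and canPassThrough are hypothetical names. It assumes, as argued above, that STRING and VARCHAR slots share the same row-batch layout, and that a nullability mismatch also breaks pass-through. Records need Java 16+.

{code:java}
import java.util.List;

/** Hypothetical sketch; not Impala's actual UnionNode logic. */
public class PassthroughCheck {
    enum SlotType { INT, BIGINT, STRING, VARCHAR, CHAR }

    /** Assumed slot descriptor: a type plus a nullability flag. */
    record Slot(SlotType type, boolean nullable) {}

    /** STRING and VARCHAR are treated as equivalent because (by assumption)
     *  both use the same string-value slot in the row batch. */
    static boolean isLayoutCompatible(SlotType a, SlotType b) {
        if (a == b) return true;
        boolean aStr = a == SlotType.STRING || a == SlotType.VARCHAR;
        boolean bStr = b == SlotType.STRING || b == SlotType.VARCHAR;
        return aStr && bStr;
    }

    /** A child passes through only if every slot matches the union's
     *  output slot in layout-compatible type and in nullability. */
    static boolean canPassThrough(List<Slot> child, List<Slot> union) {
        if (child.size() != union.size()) return false;
        for (int i = 0; i < child.size(); i++) {
            Slot c = child.get(i), u = union.get(i);
            if (!isLayoutCompatible(c.type(), u.type())) return false;
            if (c.nullable() != u.nullable()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Slot> union = List.of(new Slot(SlotType.INT, true), new Slot(SlotType.STRING, true));
        List<Slot> child = List.of(new Slot(SlotType.INT, true), new Slot(SlotType.VARCHAR, true));
        System.out.println(canPassThrough(child, union)); // true
    }
}
{code}

Keeping the nullability comparison in the same loop reflects the 3rd point of the issue: even with matching string slots, a NOT NULL slot on one side would still reject pass-through in this model.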
h3. Text Plan
{code:java}
Max Per-Host Resource Reservation: Memory=28.00MB Threads=3
Per-Host Resource Estimates: Memory=356MB
WARNING: The following tables are missing relevant table and/or column statistics.
tpcds_10000_parquet.customer_kudu, tpcds_10000_parquet.customer_parquet
Analyzed query: SELECT max(c_customer_sk), ndv(c_customer_id),
ndv(c_salutation), ndv(c_first_name), ndv(c_last_name) FROM (SELECT
c_customer_sk, c_customer_id, c_salutation, c_first_name, c_last_name FROM
tpcds_10000_parquet.customer_parquet UNION ALL SELECT c_customer_sk,
c_customer_id, c_salutation, c_first_name, c_last_name FROM
tpcds_10000_parquet.customer_kudu) t

F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
|  output exprs: max(c_customer_sk), ndv(c_customer_id), ndv(c_salutation), ndv(c_first_name), ndv(c_last_name)
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
05:AGGREGATE [FINALIZE]
|  output: max:merge(c_customer_sk), ndv:merge(c_customer_id), ndv:merge(c_salutation), ndv:merge(c_first_name), ndv:merge(c_last_name)
|  mem-estimate=16.00KB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=5 row-size=36B cardinality=1
|  in pipelines: 05(GETNEXT), 03(OPEN)
|
04:EXCHANGE [UNPARTITIONED]
|  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
|  tuple-ids=4 row-size=36B cardinality=1
|  in pipelines: 03(GETNEXT)
|
F02:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=352.02MB mem-reservation=24.00MB thread-reservation=2
03:AGGREGATE
|  output: max(c_customer_sk), ndv(c_customer_id), ndv(c_salutation), ndv(c_first_name), ndv(c_last_name)
|  mem-estimate=16.00KB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=4 row-size=36B cardinality=1
|  in pipelines: 03(GETNEXT), 01(OPEN), 02(OPEN)
|
00:UNION
|  pass-through-operands: all
|  mem-estimate=0B mem-reservation=0B thread-reservation=0
|  tuple-ids=2 row-size=52B cardinality=23.34M
|  in pipelines: 01(GETNEXT), 02(GETNEXT)
|
|--02:SCAN KUDU [tpcds_10000_parquet.customer_kudu]
|     mem-estimate=7.50MB mem-reservation=0B thread-reservation=1
|     tuple-ids=1 row-size=52B cardinality=unavailable
|     in pipelines: 02(GETNEXT)
|
01:SCAN HDFS [tpcds_10000_parquet.customer_parquet, RANDOM]
   HDFS partitions=1/1 files=3 size=609.01MB
   stored statistics:
     table: rows=unavailable size=unavailable
     columns: unavailable
   extrapolated-rows=disabled max-scan-range-rows=unavailable
   mem-estimate=352.00MB mem-reservation=24.00MB thread-reservation=1
   tuple-ids=0 row-size=52B cardinality=23.34M
   in pipelines: 01(GETNEXT)
{code}
 
h3. Summary
{code:java}
Operator              #Hosts  #Inst   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------------------------------
F03:ROOT                   1      1    2.000ms    2.000ms                       4.01 MB        4.00 MB
05:AGGREGATE               1      1    0.000ns    0.000ns       1           1  16.00 KB       16.00 KB  FINALIZE
04:EXCHANGE                1      1    0.000ns    0.000ns       3           1  32.00 KB       16.00 KB  UNPARTITIONED
F02:EXCHANGE SENDER        3      3    0.000ns    0.000ns                       24.00 B              0
03:AGGREGATE               3      3  442.010ms  511.012ms       3           1   1.28 MB       16.00 KB
00:UNION                   3      3    0.000ns    0.000ns  24.51M      23.34M   4.00 KB              0
|--02:SCAN KUDU            3      3  818.686ms  906.022ms  12.26M          -1   3.81 MB        7.50 MB  tpcds_10000_parquet.customer_kudu
01:SCAN HDFS               3      3  147.670ms  169.004ms  12.26M      23.34M  70.31 MB      352.00 MB  tpcds_10000_parquet.customer_parquet
{code}

> when union kudu table and hdfs table, union passthrough does not take effect
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-10785
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10785
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: pengdou1990
>            Assignee: pengdou1990
>            Priority: Major
>
> IMPALA-3586 already added union pass-through support, which brings big performance improvements for UNION, but pass-through still does not take effect for a union between an HDFS table and a Kudu table. Several issues cause this:
>  # In the Kudu scan node's output TupleDescriptor a string slot is 16B, while in the HDFS scan node's output TupleDescriptor a string slot is 12B, causing a tuple memory layout mismatch.
>  # In the Kudu scan node's output TupleDescriptor a string slot is 16B, while in the Union node's output TupleDescriptor a string slot is 12B, causing a tuple memory layout mismatch.
>  # In the Kudu scan node the row key slot is NOT NULL, while in the HDFS scan node the NOT NULL property can't be obtained from the metadata, causing a tuple memory layout mismatch.
> I have resolved the 1st and 2nd points; how should I handle the 3rd?
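The three mismatches in the description can be illustrated with a simplified layout model. This is a hypothetical sketch, not Impala's TupleDescriptor.computeMemLayout: it assumes slots are packed in declaration order, each nullable slot takes the next null-indicator bit, and NOT NULL slots take none. The column names come from the query above; Slot, layout, byteSize and customer are illustrative names.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical tuple layout model. */
public class TupleLayout {
    /** Column name, fixed slot size in bytes, nullability. */
    record Slot(String name, int sizeBytes, boolean nullable) {}

    /** Describes each slot as name@offset/size/null-bit; NOT NULL slots get bit -1. */
    static List<String> layout(List<Slot> slots) {
        List<String> out = new ArrayList<>();
        int offset = 0, nullBit = 0;
        for (Slot s : slots) {
            int bit = s.nullable() ? nullBit++ : -1;
            out.add(s.name() + "@" + offset + "/" + s.sizeBytes() + "B/bit" + bit);
            offset += s.sizeBytes();
        }
        return out;
    }

    /** Slot bytes plus null-indicator bytes, rounded up to whole bytes. */
    static int byteSize(List<Slot> slots) {
        int data = 0, nullable = 0;
        for (Slot s : slots) {
            data += s.sizeBytes();
            if (s.nullable()) nullable++;
        }
        return data + (nullable + 7) / 8;
    }

    /** The five columns from the query: an INT key plus four string columns. */
    static List<Slot> customer(int stringBytes, boolean keyNullable) {
        return List.of(
            new Slot("c_customer_sk", 4, keyNullable),
            new Slot("c_customer_id", stringBytes, true),
            new Slot("c_salutation", stringBytes, true),
            new Slot("c_first_name", stringBytes, true),
            new Slot("c_last_name", stringBytes, true));
    }

    public static void main(String[] args) {
        List<Slot> hdfs = customer(12, true);        // NOT NULL unknown from metadata
        List<Slot> kuduBefore = customer(16, false); // 16B strings, NOT NULL key
        List<Slot> kuduFixed = customer(12, false);  // points 1 and 2 resolved
        System.out.println(byteSize(hdfs));          // 53
        System.out.println(byteSize(kuduBefore));    // 69
        System.out.println(byteSize(kuduFixed));     // 53
        System.out.println(layout(hdfs).equals(layout(kuduFixed))); // false
    }
}
{code}

Under these assumptions, fixing the string slot size gives both sides the same total size, yet the layouts still differ because the NOT NULL key on the Kudu side shifts every null-indicator bit. That suggests why the 3rd point alone can still block pass-through.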



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
