Hi, I find In impala, If I have a table T and a column attributes which
defained as Map<string, string>, I have to write SQL like this :


select attributes.value, T.name from  T join T.attributes where attributes
.key = 'play'.

However, We query this table in hive like this :

select T.attributes['play'], T.name from T

And I find the execution of impala is Slow caused of the JOIN.
If I execute :

select count(1) from T;

It takes 0.5 seconds and If I add join attributes, It will take 8s +.

the summary like this :

Operator                    #Hosts   Avg Time   Max Time    #Rows
Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-------------------------------------------------------------------------------------------------------------------------------
07:AGGREGATE                     1   85.020ms   85.020ms        1
     1  112.00 KB        -1.00 B  FINALIZE
06:EXCHANGE                      1  404.928us  404.928us       25
     1          0        -1.00 B  UNPARTITIONED
05:AGGREGATE                    25  104.354ms  141.573ms       25
     1    8.35 MB       10.00 MB
01:SUBPLAN                      25    1s218ms    1s813ms  201.54M
668.96B    8.25 MB              0
|--04:NESTED LOOP JOIN          25    3s250ms    4s852ms        0
    10   24.00 KB        16.00 B  CROSS JOIN
|  |--02:SINGULAR ROW SRC       25    0.000ns    0.000ns        0
     1          0              0
|  03:UNNEST                    25  688.110ms    1s033ms        0
    10          0              0  uda_events.attributes
00:SCAN HDFS                    25  112.396ms  206.405ms  201.61M
66.90B  107.20 MB       48.00 MB  uda.uda_events


Operator       #Hosts   Avg Time   Max Time    #Rows  Est. #Rows
Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------
03:AGGREGATE        1   87.908ms   87.908ms        1           1
112.00 KB        -1.00 B  FINALIZE
02:EXCHANGE         1  321.296us  321.296us       25           1
   0        -1.00 B  UNPARTITIONED
01:AGGREGATE       25   74.613ms   81.567ms       25           1
12.00 KB       10.00 MB
00:SCAN HDFS       25   91.512ms  135.441ms  208.47M      66.90B
3.09 MB              0  uda.uda_events


I have two question :
1、 Why Impala implements map column like this( via join) ?
2、 How to speed up the speed the query like this?

Thanks very much.

Reply via email to