Hi, I find In impala, If I have a table T and a column attributes which
defained as Map<string, string>, I have to write SQL like this :
select attributes.value, T.name from T join T.attributes where attributes
.key = 'play'.
However, We query this table in hive like this :
select T.attributes['play'], T.name from T
And I find the execution of impala is Slow caused of the JOIN.
If I execute :
select count(1) from T;
It takes 0.5 seconds and If I add join attributes, It will take 8s +.
the summary like this :
Operator #Hosts Avg Time Max Time #Rows
Est. #Rows Peak Mem Est. Peak Mem Detail
-------------------------------------------------------------------------------------------------------------------------------
07:AGGREGATE 1 85.020ms 85.020ms 1
1 112.00 KB -1.00 B FINALIZE
06:EXCHANGE 1 404.928us 404.928us 25
1 0 -1.00 B UNPARTITIONED
05:AGGREGATE 25 104.354ms 141.573ms 25
1 8.35 MB 10.00 MB
01:SUBPLAN 25 1s218ms 1s813ms 201.54M
668.96B 8.25 MB 0
|--04:NESTED LOOP JOIN 25 3s250ms 4s852ms 0
10 24.00 KB 16.00 B CROSS JOIN
| |--02:SINGULAR ROW SRC 25 0.000ns 0.000ns 0
1 0 0
| 03:UNNEST 25 688.110ms 1s033ms 0
10 0 0 uda_events.attributes
00:SCAN HDFS 25 112.396ms 206.405ms 201.61M
66.90B 107.20 MB 48.00 MB uda.uda_events
Operator #Hosts Avg Time Max Time #Rows Est. #Rows
Peak Mem Est. Peak Mem Detail
-----------------------------------------------------------------------------------------------------------
03:AGGREGATE 1 87.908ms 87.908ms 1 1
112.00 KB -1.00 B FINALIZE
02:EXCHANGE 1 321.296us 321.296us 25 1
0 -1.00 B UNPARTITIONED
01:AGGREGATE 25 74.613ms 81.567ms 25 1
12.00 KB 10.00 MB
00:SCAN HDFS 25 91.512ms 135.441ms 208.47M 66.90B
3.09 MB 0 uda.uda_events
I have two question :
1、 Why Impala implements map column like this( via join) ?
2、 How to speed up the speed the query like this?
Thanks very much.