[jira] [Commented] (DRILL-2290) Very slow performance for a query involving nested map

Chun Chang (JIRA) Tue, 24 Feb 2015 10:53:27 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335232#comment-14335232
 ]


Chun Chang commented on DRILL-2290:
-----------------------------------

The dataset can be downloaded from
https://s3.amazonaws.com/apache-drill/files/complex.json.gz

Query plan:

{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select b.id, 
a.ooa[1].fl.f1, b.oooi, a.ooof.oa.oab.oabc from `complex.json` a inner join 
`complex.json` b on a.ooa[1].fl.f1=b.ooa[1].fl.f1 order by b.id limit 20;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      Project(id=[$0], EXPR$1=[$1], oooi=[$2], EXPR$3=[$3])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[20])
00-04            SingleMergeExchange(sort0=[0 ASC])
01-01              SelectionVectorRemover
01-02                TopN(limit=[20])
01-03                  HashToRandomExchange(dist0=[[$0]])
02-01                    Project(id=[$3], EXPR$1=[$1], oooi=[$4], EXPR$3=[$2])
02-02                      HashJoin(condition=[=($0, $5)], joinType=[inner])
02-04                        HashToRandomExchange(dist0=[[$0]])
03-01                          Project($f5=[ITEM(ITEM(ITEM($1, 1), 'fl'), 'f1')], ITEM=[ITEM(ITEM(ITEM($1, 1), 'fl'), 'f1')], ITEM2=[ITEM(ITEM(ITEM($0, 'oa'), 'oab'), 'oabc')])
03-02                            Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`ooa`[1].`fl`.`f1`, `ooof`.`oa`.`oab`.`oabc`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
02-03                        Project(id=[$0], oooi=[$1], $f50=[$2])
02-05                          HashToRandomExchange(dist0=[[$2]])
04-01                            Project(id=[$2], oooi=[$1], $f5=[ITEM(ITEM(ITEM($0, 1), 'fl'), 'f1')])
04-02                              Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`id`, `oooi`, `ooa`[1].`fl`.`f1`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
{code}
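
For readers of the plan: the join key on both sides is the nested expression ITEM(ITEM(ITEM(ooa, 1), 'fl'), 'f1'), which is how the plan spells ooa[1].fl.f1. The following is only a rough Python sketch of what that expression evaluates per record, not Drill's implementation. The helper names (item, join_key) and the sample record are hypothetical, and the real layout of the rows in complex.json is assumed rather than taken from the file.

```python
from functools import reduce


def item(container, key):
    """Mimic a single ITEM(container, key) step.

    An integer key indexes into a list, a string key looks up a map entry.
    Anything missing or mistyped yields None (standing in for SQL NULL).
    """
    if isinstance(container, list) and isinstance(key, int):
        return container[key] if 0 <= key < len(container) else None
    if isinstance(container, dict) and isinstance(key, str):
        return container.get(key)
    return None


def join_key(record):
    """ITEM(ITEM(ITEM(ooa, 1), 'fl'), 'f1'), i.e. ooa[1].fl.f1."""
    return reduce(item, [1, "fl", "f1"], record.get("ooa"))


# Hypothetical record: the shape is an assumption made for illustration only.
sample = {"id": 1, "ooa": [{}, {"fl": {"f1": 1.6789}}]}

print(join_key(sample))
print(join_key({"id": 2}))  # no ooa field at all, so the key is None
```

Both scans project this same three-level path before the HashJoin, so each input record goes through the lookup once on each side of the join.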

> Very slow performance for a query involving nested map
> ------------------------------------------------------
>
>                 Key: DRILL-2290
>                 URL: https://issues.apache.org/jira/browse/DRILL-2290
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 0.8.0
>            Reporter: Chun Chang
>            Assignee: Daniel Barclay (Drill)
>
> #Thu Feb 19 18:40:10 EST 2015
> git.commit.id.abbrev=1ceddff
> This query took 17 minutes to complete, which is too long. I suspect the
> slowdown appeared after the fix dealing with nested maps.
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select b.id, a.ooa[1].fl.f1, 
> b.oooi, a.ooof.oa.oab.oabc from `complex.json` a inner join `complex.json` b 
> on a.ooa[1].fl.f1=b.ooa[1].fl.f1 order by b.id limit 20;
> +------------+------------+------------+------------+
> |     id     |   EXPR$1   |    oooi    |   EXPR$3   |
> +------------+------------+------------+------------+
> | 1          | 1.6789     | {"oa":{"oab":{"oabc":1}}} | 1.5678     |
> | 3          | 3.6789     | {"oa":{"oab":{"oabc":3}}} | 3.5678     |
> | 4          | 4.6789     | {"oa":{"oab":{"oabc":4}}} | 4.5678     |
> | 5          | 5.6789     | {"oa":{"oab":{"oabc":5}}} | 5.5678     |
> | 7          | 7.6789     | {"oa":{"oab":{"oabc":7}}} | 7.5678     |
> | 9          | 9.6789     | {"oa":{"oab":{"oabc":9}}} | 9.5678     |
> | 10         | 10.6789    | {"oa":{"oab":{"oabc":10}}} | 10.5678    |
> | 11         | 11.6789    | {"oa":{"oab":{"oabc":11}}} | 11.5678    |
> | 13         | 13.6789    | {"oa":{"oab":{"oabc":13}}} | 13.5678    |
> | 14         | 14.6789    | {"oa":{"oab":{"oabc":14}}} | 14.5678    |
> | 15         | 15.6789    | {"oa":{"oab":{"oabc":15}}} | 15.5678    |
> | 16         | 16.6789    | {"oa":{"oab":{"oabc":16}}} | 16.5678    |
> | 17         | 17.6789    | {"oa":{"oab":{"oabc":17}}} | 17.5678    |
> | 18         | 18.6789    | {"oa":{"oab":{"oabc":18}}} | 18.5678    |
> | 19         | 19.6789    | {"oa":{"oab":{"oabc":19}}} | 19.5678    |
> | 20         | 20.6789    | {"oa":{"oab":{"oabc":20}}} | 20.5678    |
> | 21         | 21.6789    | {"oa":{"oab":{"oabc":21}}} | 21.5678    |
> | 22         | 22.6789    | {"oa":{"oab":{"oabc":22}}} | 22.5678    |
> | 24         | 24.6789    | {"oa":{"oab":{"oabc":24}}} | 24.5678    |
> | 25         | 25.6789    | {"oa":{"oab":{"oabc":25}}} | 25.5678    |
> +------------+------------+------------+------------+
> 20 rows selected (1020.036 seconds)
> {code}
> The query deals with just under 1 million records, so it should not be
> that slow.
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select count(*) from (select 
> b.id, a.ooa[1].fl.f1, b.oooi, a.ooof.oa.oab.oabc from `complex.json` a inner 
> join `complex.json` b on a.ooa[1].fl.f1=b.ooa[1].fl.f1) c;
> +------------+
> |   EXPR$0   |
> +------------+
> | 900190     |
> +------------+
> 1 row selected (700.516 seconds)
> {code}
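
To put the timings quoted above in perspective, here is a small back-of-the-envelope sketch. The three figures are copied from the sqlline output in the issue description. The variable and function names are made up for illustration, and the calculation assumes wall-clock time is a fair proxy for join cost.

```python
# Figures copied from the sqlline output in the issue description.
order_by_seconds = 1020.036  # ORDER BY ... LIMIT 20 query
count_seconds = 700.516      # count(*) over the same self-join
joined_rows = 900190         # row count reported by the count(*) query


def rows_per_second(rows, seconds):
    """Average throughput over the whole query, in rows per second."""
    return rows / seconds


print(f"ORDER BY query: {order_by_seconds / 60:.1f} minutes")
print(f"count(*) join: {rows_per_second(joined_rows, count_seconds):.0f} rows/s")
```

Under those assumptions the self-join alone averages on the order of a thousand rows per second, which matches the reporter's point that a join of under 1 million records should not take this long.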



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
