[
https://issues.apache.org/jira/browse/DRILL-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335232#comment-14335232
]
Chun Chang commented on DRILL-2290:
-----------------------------------
The dataset can be downloaded from
https://s3.amazonaws.com/apache-drill/files/complex.json.gz
Query plan:
{code}
0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select b.id,
a.ooa[1].fl.f1, b.oooi, a.ooof.oa.oab.oabc from `complex.json` a inner join
`complex.json` b on a.ooa[1].fl.f1=b.ooa[1].fl.f1 order by b.id limit 20;
+------------+------------+
| text | json |
+------------+------------+
| 00-00    Screen
00-01      Project(id=[$0], EXPR$1=[$1], oooi=[$2], EXPR$3=[$3])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[20])
00-04            SingleMergeExchange(sort0=[0 ASC])
01-01              SelectionVectorRemover
01-02                TopN(limit=[20])
01-03                  HashToRandomExchange(dist0=[[$0]])
02-01                    Project(id=[$3], EXPR$1=[$1], oooi=[$4], EXPR$3=[$2])
02-02                      HashJoin(condition=[=($0, $5)], joinType=[inner])
02-04                        HashToRandomExchange(dist0=[[$0]])
03-01                          Project($f5=[ITEM(ITEM(ITEM($1, 1), 'fl'), 'f1')], ITEM=[ITEM(ITEM(ITEM($1, 1), 'fl'), 'f1')], ITEM2=[ITEM(ITEM(ITEM($0, 'oa'), 'oab'), 'oabc')])
03-02                            Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`ooa`[1].`fl`.`f1`, `ooof`.`oa`.`oab`.`oabc`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
02-03                        Project(id=[$0], oooi=[$1], $f50=[$2])
02-05                          HashToRandomExchange(dist0=[[$2]])
04-01                            Project(id=[$2], oooi=[$1], $f5=[ITEM(ITEM(ITEM($0, 1), 'fl'), 'f1')])
04-02                              Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`id`, `oooi`, `ooa`[1].`fl`.`f1`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
{code}
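For reference, the HashJoin at 02-02 joins on the nested value that the Project operators at 03-01 and 04-01 extract with ITEM(ITEM(ITEM(ooa, 1), 'fl'), 'f1'). Below is a minimal Python sketch of that logical operation only. The record layout is an assumption inferred from the column paths in the plan, the helper names (nested_key, hash_join, make_record) and the sample records are hypothetical, and none of this reflects how Drill actually implements the join.

```python
# Illustrative only: a plain-Python hash join on a nested key, not Drill code.
# The record layout is an assumption inferred from the column paths in the plan.

def nested_key(rec):
    """Mimic ITEM(ITEM(ITEM(ooa, 1), 'fl'), 'f1'); return None if the path is missing."""
    try:
        return rec["ooa"][1]["fl"]["f1"]
    except (KeyError, IndexError, TypeError):
        return None


def hash_join(left, right):
    """Inner join on nested_key: build a hash table on `right`, probe with `left`."""
    table = {}
    for b in right:
        k = nested_key(b)
        if k is not None:  # SQL equality never matches NULL
            table.setdefault(k, []).append(b)

    out = []
    for a in left:
        k = nested_key(a)
        matches = table.get(k, []) if k is not None else []
        for b in matches:
            out.append({
                "id": b["id"],
                "EXPR$1": k,
                "oooi": b.get("oooi"),
                "EXPR$3": a.get("ooof", {}).get("oa", {}).get("oab", {}).get("oabc"),
            })
    return out


def make_record(i):
    # Hypothetical record; values follow the pattern visible in the reported output.
    return {
        "id": i,
        "ooa": [{}, {"fl": {"f1": i + 0.6789}}],
        "oooi": {"oa": {"oab": {"oabc": i}}},
        "ooof": {"oa": {"oab": {"oabc": i + 0.5678}}},
    }


# One record deliberately lacks the nested path and therefore never joins.
records = [make_record(i) for i in (1, 3, 4)] + [{"id": 2}]

# Equivalent of "order by b.id limit 20" applied to the join result.
rows = sorted(hash_join(records, records), key=lambda r: r["id"])[:20]
```

Each side is scanned once in this sketch, so the logical work grows roughly linearly with the input when the join keys are mostly distinct.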
> Very slow performance for a query involving nested map
> ------------------------------------------------------
>
> Key: DRILL-2290
> URL: https://issues.apache.org/jira/browse/DRILL-2290
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Data Types
> Affects Versions: 0.8.0
> Reporter: Chun Chang
> Assignee: Daniel Barclay (Drill)
>
> #Thu Feb 19 18:40:10 EST 2015
> git.commit.id.abbrev=1ceddff
> This query took 17 minutes to complete, which is too long. I think this started
> happening after the fix dealing with nested maps.
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select b.id, a.ooa[1].fl.f1,
> b.oooi, a.ooof.oa.oab.oabc from `complex.json` a inner join `complex.json` b
> on a.ooa[1].fl.f1=b.ooa[1].fl.f1 order by b.id limit 20;
> +------------+------------+------------+------------+
> | id | EXPR$1 | oooi | EXPR$3 |
> +------------+------------+------------+------------+
> | 1 | 1.6789 | {"oa":{"oab":{"oabc":1}}} | 1.5678 |
> | 3 | 3.6789 | {"oa":{"oab":{"oabc":3}}} | 3.5678 |
> | 4 | 4.6789 | {"oa":{"oab":{"oabc":4}}} | 4.5678 |
> | 5 | 5.6789 | {"oa":{"oab":{"oabc":5}}} | 5.5678 |
> | 7 | 7.6789 | {"oa":{"oab":{"oabc":7}}} | 7.5678 |
> | 9 | 9.6789 | {"oa":{"oab":{"oabc":9}}} | 9.5678 |
> | 10 | 10.6789 | {"oa":{"oab":{"oabc":10}}} | 10.5678 |
> | 11 | 11.6789 | {"oa":{"oab":{"oabc":11}}} | 11.5678 |
> | 13 | 13.6789 | {"oa":{"oab":{"oabc":13}}} | 13.5678 |
> | 14 | 14.6789 | {"oa":{"oab":{"oabc":14}}} | 14.5678 |
> | 15 | 15.6789 | {"oa":{"oab":{"oabc":15}}} | 15.5678 |
> | 16 | 16.6789 | {"oa":{"oab":{"oabc":16}}} | 16.5678 |
> | 17 | 17.6789 | {"oa":{"oab":{"oabc":17}}} | 17.5678 |
> | 18 | 18.6789 | {"oa":{"oab":{"oabc":18}}} | 18.5678 |
> | 19 | 19.6789 | {"oa":{"oab":{"oabc":19}}} | 19.5678 |
> | 20 | 20.6789 | {"oa":{"oab":{"oabc":20}}} | 20.5678 |
> | 21 | 21.6789 | {"oa":{"oab":{"oabc":21}}} | 21.5678 |
> | 22 | 22.6789 | {"oa":{"oab":{"oabc":22}}} | 22.5678 |
> | 24 | 24.6789 | {"oa":{"oab":{"oabc":24}}} | 24.5678 |
> | 25 | 25.6789 | {"oa":{"oab":{"oabc":25}}} | 25.5678 |
> +------------+------------+------------+------------+
> 20 rows selected (1020.036 seconds)
> {code}
> The query deals with just under 1 million records, so it should not be
> that slow.
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select count(*) from (select
> b.id, a.ooa[1].fl.f1, b.oooi, a.ooof.oa.oab.oabc from `complex.json` a inner
> join `complex.json` b on a.ooa[1].fl.f1=b.ooa[1].fl.f1) c;
> +------------+
> | EXPR$0 |
> +------------+
> | 900190 |
> +------------+
> 1 row selected (700.516 seconds)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)