[ 
https://issues.apache.org/jira/browse/DRILL-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194268#comment-14194268
 ] 

Tomer Shiran commented on DRILL-1629:
-------------------------------------

Note that this does work when both datasets are coming from DFS:

{code}
0: jdbc:drill:zk=local> SELECT u.user_id, u.name, count(*) reviews FROM 
dfs.yelp.`yelp_academic_dataset_user.json` u, 
dfs.yelp.`yelp_academic_dataset_review.json` r WHERE u.user_id = r.user_id 
GROUP BY u.user_id, u.name ORDER BY reviews DESC LIMIT 10;
+------------+------------+------------+
|  user_id   |    name    |  reviews   |
+------------+------------+------------+
| kGgAARL2UmvCcTRfiscjug | J          | 1399       |
| ikm0UCahtK34LbLCEw4YTw | Rand       | 1137       |
| Iu3Jo9ROp2IWC9FwtWOaUQ | Norm       | 1046       |
| glRXVWWD6x1EZKfjJawTOg | Jade       | 1013       |
| PV5voYSD43Cn_3gHmxG7DA | David      | 895        |
| fczQCSmaWF78toLEmb0Zsw | Gabi       | 872        |
| lHHwLi_YZuDSfdlSShFkug | Nelson     | 816        |
| ia1nTRAQEaFWv0cwADeK7g | Emily      | 811        |
| 3gIfcQq5KxAegwCPXc83cQ | Jennifer   | 790        |
| DrWLhrK8WMZf7Jb-Oqc7ww | Brad       | 765        |
+------------+------------+------------+
10 rows selected (54.847 seconds)
0: jdbc:drill:zk=local> 
{code}

And here's the explain plan:

{code}
0: jdbc:drill:zk=local> EXPLAIN PLAN FOR SELECT u.user_id, u.name, count(*) 
reviews FROM dfs.yelp.`yelp_academic_dataset_user.json` u, 
dfs.yelp.`yelp_academic_dataset_review.json` r WHERE u.user_id = r.user_id 
GROUP BY u.user_id, u.name ORDER BY reviews DESC LIMIT 10;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      Project(user_id=[$0], name=[$1], reviews=[$2])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            SingleMergeExchange(sort0=[2 DESC])
01-01              SelectionVectorRemover
01-02                TopN(limit=[10])
01-03                  HashToRandomExchange(dist0=[[$2]])
02-01                    HashAgg(group=[{0, 1}], reviews=[COUNT()])
02-02                      Project(user_id=[$0], name=[$1])
02-03                        HashJoin(condition=[=($0, $2)], joinType=[inner])
02-05                          HashToRandomExchange(dist0=[[$0]])
03-01                            Scan(groupscan=[EasyGroupScan 
[selectionRoot=/Users/tshiran/Development/yelp/yelp_academic_dataset_user.json, 
numFiles=1, columns = [SchemaPath [`user_id`], SchemaPath [`name`]]]])
02-04                          Project(user_id0=[$0])
02-06                            HashToRandomExchange(dist0=[[$0]])
04-01                              Scan(groupscan=[Ea |
+------------+------------+
1 row selected (0.415 seconds)
{code}

> Cast issue during query
> -----------------------
>
>                 Key: DRILL-1629
>                 URL: https://issues.apache.org/jira/browse/DRILL-1629
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Tomer Shiran
>
> Imported Yelp user.json file into a local MongoDB instance (via mongoimport). 
> Ran the following query to find the names of the users who had the most 
> reviews:
> {code}
> 0: jdbc:drill:zk=local> SELECT u.user_id, u.name, count(*) reviews FROM 
> mongo.yelp.users u, dfs.yelp.`yelp_academic_dataset_review.json` r WHERE 
> u.user_id = r.user_id GROUP BY u.user_id, u.name ORDER BY reviews DESC LIMIT 
> 10;
> Query failed: Failure while running fragment. Failure finding function that 
> runtime code generation expected.  Signature: compare_to( 
> VARCHAR:OPTIONALINT:OPTIONAL,  ) returns INT:REQUIRED 
> [c796f1fb-b73a-49e8-8b4a-ed973d89b7d3]
> Error: exception while executing query: Failure while trying to get next 
> result batch. (state=,code=0)
> {code}
> Here's the explain plan:
> {code}
> 0: jdbc:drill:zk=local> EXPLAIN PLAN FOR SELECT u.user_id, u.name, count(*) 
> reviews FROM mongo.yelp.users u, dfs.yelp.`yelp_academic_dataset_review.json` 
> r WHERE u.user_id = r.user_id GROUP BY u.user_id, u.name ORDER BY reviews 
> DESC LIMIT 10;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(user_id=[$0], name=[$1], reviews=[$2])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[10])
> 00-04            SingleMergeExchange(sort0=[2 DESC])
> 01-01              SelectionVectorRemover
> 01-02                TopN(limit=[10])
> 01-03                  HashToRandomExchange(dist0=[[$2]])
> 02-01                    HashAgg(group=[{0, 1}], reviews=[COUNT()])
> 02-02                      Project(user_id=[$0], name=[$1])
> 02-03                        HashJoin(condition=[=($0, $2)], joinType=[inner])
> 02-05                          HashToRandomExchange(dist0=[[$0]])
> 03-01                            Scan(groupscan=[MongoGroupScan 
> [MongoScanSpec=MongoScanSpec [dbName=yelp, collectionName=users, 
> filters=null], columns=[SchemaPath [`user_id`], SchemaPath [`name`]]]])
> 02-04                          HashToRandomExchange(dist0=[[$0]])
> 04-01                            Project(T8¦¦*=[$0])
> 04-02                              Project(T8¦¦*=[$0], T8¦¦user_id=[$ |
> +------------+------------+
> 1 row selected (0.42 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to