[
https://issues.apache.org/jira/browse/DRILL-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099245#comment-14099245
]
Jason Altekruse commented on DRILL-922:
---------------------------------------
I cannot pinpoint the commit that fixed the issue, but this query now
returns the correct results.
sqlline version 1.1.6
0: jdbc:drill:zk=local> select sum(cast(columns[0] as bigint)) from dfs.`/tmp/jira.csv`;
+------------+
| EXPR$0     |
+------------+
| 36         |
+------------+
1 row selected (1.62 seconds)
0: jdbc:drill:zk=local> select sum(cast(columns[0] as bigint)) from dfs.`/tmp/jira.csv` group by columns[1];
+------------+
| EXPR$0     |
+------------+
| 8          |
| 2          |
| 11         |
| 4          |
| 5          |
| 6          |
+------------+
6 rows selected (0.679 seconds)
0: jdbc:drill:zk=local> select sum(cast(columns[0] as bigint)), columns[1] from dfs.`/tmp/jira.csv` group by columns[1];
+------------+------------+
| EXPR$0     | EXPR$1     |
+------------+------------+
| 8          | a          |
| 2          | b          |
| 11         | ab         |
| 4          | null       |
| 5          | abc        |
| 6          | c          |
+------------+------------+
6 rows selected (0.561 seconds)
0: jdbc:drill:zk=local>
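
The grouped sums above can be cross-checked independently of Drill. The
following is only a minimal sketch, assuming the eight rows listed for
jira.csv in the original report; the names JIRA_CSV and grouped_sums are
made up for illustration, and the missing value in the row "4," is kept as
an empty string rather than null.

```python
import csv
import io

# Sample rows copied from the jira.csv listing in the original report.
JIRA_CSV = "1,a\n2,b\n3,ab\n4,\n5,abc\n6,c\n7,a\n8,ab\n"


def grouped_sums(text):
    """Sum column 0 (as int) grouped by column 1, in first-seen key order."""
    sums = {}
    for row in csv.reader(io.StringIO(text)):
        # A row such as "4," parses to ['4', '']; the empty string is its key.
        key = row[1] if len(row) > 1 else ""
        sums[key] = sums.get(key, 0) + int(row[0])
    return sums


totals = grouped_sums(JIRA_CSV)
for key, total in totals.items():
    print(total, repr(key))
print("overall:", sum(totals.values()))
```

The per-group totals should agree with the EXPR$0 column of the group by
queries, and the overall total with the ungrouped sum.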
> group by fails with csv file
> ----------------------------
>
> Key: DRILL-922
> URL: https://issues.apache.org/jira/browse/DRILL-922
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Operators
> Reporter: Chun Chang
> Assignee: Jason Altekruse
> Fix For: 0.5.0
>
>
> #Fri Jun 06 10:06:50 PDT 2014
> git.commit.id.abbrev=3db1d5a
> Group by fails on CSV data. It works on Parquet. For example, I
> have the following CSV data:
> [root@qa-node120 ~]# cat jira.csv
> 1,a
> 2,b
> 3,ab
> 4,
> 5,abc
> 6,c
> 7,a
> 8,ab
> The following query without group by works:
> 0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from `jira.csv`;
> +------------+
> | EXPR$0     |
> +------------+
> | 36         |
> +------------+
> But if I add group by, it fails:
> 0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from `jira.csv` group by columns[1];
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while
> running query.[error_id: "f1ffdac3-f454-4ba9-95db-374658db3654"
> endpoint {
> address: "qa-node117.qa.lab"
> user_port: 31010
> control_port: 31011
> data_port: 31012
> }
> error_type: 0
> message: "Failure while running fragment. < NumberFormatException:[ ]"
> ]
> Error: exception while executing query (state=,code=0)
> But if I add a where clause on columns[0], then it works:
> 0: jdbc:drill:schema=dfs> select columns[1], sum(cast(columns[0] as bigint)) from `jira.csv` where columns[0] <= 8 group by columns[1];
> +------------+------------+
> | EXPR$0     | EXPR$1     |
> +------------+------------+
> | b          | 2          |
> | c          | 6          |
> | a          | 8          |
> |            | 4          |
> | ab         | 11         |
> | abc        | 5          |
> +------------+------------+
> It seems to me that the group by scanner does not know where the end of
> the column is.