[ https://issues.apache.org/jira/browse/DRILL-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacques Nadeau updated DRILL-922:
---------------------------------

    Assignee: Jason Altekruse  (was: Aditya Kishore)

> group by fails with csv file
> ----------------------------
>
>                 Key: DRILL-922
>                 URL: https://issues.apache.org/jira/browse/DRILL-922
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Operators
>            Reporter: Chun Chang
>            Assignee: Jason Altekruse
>             Fix For: 0.5.0
>
>
> #Fri Jun 06 10:06:50 PDT 2014
> git.commit.id.abbrev=3db1d5a
> Group by fails with csv type of data. It works with parquet. For example, I
> have the following csv data:
> [root@qa-node120 ~]# cat jira.csv
> 1,a
> 2,b
> 3,ab
> 4,
> 5,abc
> 6,c
> 7,a
> 8,ab
> The following query without group by works:
> 0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from
> `jira.csv`;
> +------------+
> | EXPR$0     |
> +------------+
> | 36         |
> +------------+
> But if I add group by, it fails:
> 0: jdbc:drill:schema=dfs> select sum(cast(columns[0] as bigint)) from
> `jira.csv` group by columns[1];
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while
> running query.[error_id: "f1ffdac3-f454-4ba9-95db-374658db3654"
> endpoint {
>   address: "qa-node117.qa.lab"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
> }
> error_type: 0
> message: "Failure while running fragment. < NumberFormatException:[ ]"
> ]
> Error: exception while executing query (state=,code=0)
> But if I add a row limit, then it works:
> 0: jdbc:drill:schema=dfs> select columns[1], sum(cast(columns[0] as bigint))
> from `jira.csv` where columns[0] <= 8 group by columns[1];
> +------------+------------+
> | EXPR$0     | EXPR$1     |
> +------------+------------+
> | b          | 2          |
> | c          | 6          |
> | a          | 8          |
> |            | 4          |
> | ab         | 11         |
> | abc        | 5          |
> +------------+------------+
> It seems to me that group by scanner does not know where the end of the
> column is.



--
This message was sent by Atlassian JIRA
(v6.2#6252)