[jira] [Created] (DRILL-1401) join on csv files does not work

Chun Chang (JIRA) Wed, 10 Sep 2014 16:33:19 -0700

Chun Chang created DRILL-1401:
---------------------------------

             Summary: join on csv files does not work
                 Key: DRILL-1401
                 URL: https://issues.apache.org/jira/browse/DRILL-1401
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 0.5.0
            Reporter: Chun Chang



#Wed Sep 10 13:35:05 PDT 2014
git.commit.id.abbrev=686eb9e

join (inner,full outer,left,right) does not work if the join is directly 
applied to csv files. It will produce either null values on the projected 
columns or NumberFormatException depends on joining condition applied on which 
columns (first or second columns). If you create views on the csv files, then 
join works on the views.

For example, the following join give null values:

0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as int), 
cast(`aggregate_100r.csv`.columns[1] as int), cast(`join_100r.csv`.columns[1] 
as int) from `aggregate_100r.csv` inner join `join_100r.csv` on 
cast(`aggregate_100r.csv`.columns[0] as int) = cast(`join_100r.csv`.columns[0] 
as int);
+------------+------------+------------+
|   EXPR$0   |   EXPR$1   |   EXPR$2   |
+------------+------------+------------+
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 0          | null       | null       |
| 1          | null       | null       |
| 1          | null       | null       |
| 1          | null       | null       |
| 1          | null       | null       |

The following give NumberFormatException:

0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as int), 
cast(`aggregate_100r.csv`.columns[1] as int), cast(`join_100r.csv`.columns[1] 
as int) from `aggregate_100r.csv` inner join `join_100r.csv` on 
cast(`aggregate_100r.csv`.columns[1] as int) = cast(`join_100r.csv`.columns[1] 
as int);
Query failed: Failure while running fragment.  
[3f67299e-f312-445b-8d6b-74984a820f0c]

Error: exception while executing query: Failure while trying to get next result 
batch. (state=,code=0)

The following with views works:

0: jdbc:drill:schema=dfs> select `aggregate_100r_v`.c0, `aggregate_100r_v`.c1, 
`join_100r_v`.c1 from `aggregate_100r_v` inner join `join_100r_v` on 
`aggregate_100r_v`.c0 = `join_100r_v`.c0;
+------------+------------+------------+
|     c0     |     c1     |    c10     |
+------------+------------+------------+
| 0          | 0          | 0          |
| 0          | 0          | 2          |
| 0          | 0          | 1          |
| 0          | 0          | 0          |
| 0          | 0          | 0          |
| 0          | 0          | 2          |
| 0          | 0          | 1          |
| 0          | 0          | 0          |
| 0          | 1          | 0          |
| 0          | 1          | 2          |
| 0          | 1          | 1          |
| 0          | 1          | 0          |
| 0          | 2          | 0          |
| 0          | 2          | 2          |
| 0          | 2          | 1          |
| 0          | 2          | 0          |
| 1          | 1          | 1          |
| 1          | 1          | 2          |
| 1          | 1          | 1          |
| 1          | 1          | 0          |



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (DRILL-1401) join on csv files does not work

Reply via email to