[jira] [Commented] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

John Omernik (JIRA) Tue, 01 Dec 2015 05:59:31 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033713#comment-15033713
 ]


John Omernik commented on DRILL-4145:
-------------------------------------

I just tested apps1-bad.csv on a MapRFS-based DFS plugin. (Perhaps we can 
focus on S3 here.) When I ran the same query as you, I had no issues at all. 
I am running the Developer release of MapR Drill (based on the 1.3 release 
from Apache), so other than some additions for MapR Tables, we should be on 
the same code base (if you are running 1.3).

This is interesting to me, though, because when I ran the query, instead of 
interpreting the fields as your setup did, mine returned a single "columns" 
field containing an array. Thus my "limit 1" query data started out like this:

| ["FIELD_1","FIELD_2","FIELD_3","....

I.e., in your query, Drill parsed the header line into field names; in mine, it 
returned everything as an array. I bring this up because I am curious about 
the differences in our setups. If we are both running 1.3, it should return the 
same result, right? Can you share the formats section of your s3 plugin? I 
tried to use "extractHeader": true on mine but got the same result, so I am 
curious about your configuration there.
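
To make the two result shapes concrete, here is a minimal Python sketch. It is 
not Drill code: the read_text helper and the sample data are made up purely for 
illustration, assuming that header extraction means "use the first line as 
column names" and that the alternative is one "columns" array per line.

```python
import csv
import io

# Hypothetical sample data; not taken from apps1.csv.
SAMPLE = "FIELD_1,FIELD_2,FIELD_3\n1,2,3\n"


def read_text(data, extract_header):
    """Illustrative stand-in for a text reader; not Drill's implementation."""
    rows = list(csv.reader(io.StringIO(data)))
    if extract_header:
        # The first line supplies the column names; the rest are data rows.
        header, body = rows[0], rows[1:]
        return [dict(zip(header, row)) for row in body]
    # No header extraction: every line, including the header line, becomes
    # a row holding a single "columns" array.
    return [{"columns": row} for row in rows]


with_header = read_text(SAMPLE, extract_header=True)
without_header = read_text(SAMPLE, extract_header=False)
print(with_header[0])
print(without_header[0])
```

In the second case the first row is the header line itself, which matches what 
my "limit 1" query returned.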

I want to get to the point where we can home in on the S3 difference and rule 
out configuration or version differences.
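
As one way to look at the configuration side, the following Python sketch (not 
part of Drill; the shared_extensions helper is hypothetical) lists file 
extensions that more than one format claims. The input is condensed from the 
"formats" section quoted in the issue description below, leaving out formats 
with no "extensions" list. Whether such an overlap has anything to do with the 
exception is an open question, not something established here.

```python
import json
from collections import defaultdict

# Condensed from the "formats" section quoted in the issue description;
# formats without an "extensions" list are omitted.
FORMATS_JSON = """
{
  "psv":  {"type": "text", "extensions": ["tbl"], "delimiter": "|"},
  "csv":  {"type": "text", "extensions": ["csv"],
           "extractHeader": true, "delimiter": ","},
  "tsv":  {"type": "text", "extensions": ["tsv"], "delimiter": "\\t"},
  "sequencefile": {"type": "sequencefile", "extensions": ["seq"]},
  "csvh": {"type": "text", "extensions": ["csvh", "csv"],
           "extractHeader": true, "delimiter": ","}
}
"""


def shared_extensions(formats):
    """Map each extension claimed by more than one format to those formats."""
    owners = defaultdict(list)
    for name, cfg in formats.items():
        for ext in cfg.get("extensions", []):
            owners[ext].append(name)
    return {ext: names for ext, names in owners.items() if len(names) > 1}


overlaps = shared_extensions(json.loads(FORMATS_JSON))
print(overlaps)
```

For the quoted configuration this reports that both the "csv" and "csvh" 
formats claim the "csv" extension.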

Additionally, can you run select * from sys.version and share the commit_time 
and build_time on yours? That may be helpful for me as well. I have a commit 
time of 20.11.2015 @ 01:34:54 UTC and a build time of 21.11.2015 @ 05:21:04 
UTC. Are you using the official release or a snapshot from GitHub?

Thanks!


> IndexOutOfBoundsException raised during select * query on S3 csv file
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4145
>                 URL: https://issues.apache.org/jira/browse/DRILL-4145
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.3.0
>         Environment: Drill 1.3.0 on a 3 node distributed-mode cluster on AWS.
> Data files on S3.
> S3 storage plugin configuration:
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://<bucket-name-was-here>",
>   "workspaces": {
>     "root": {
>       "location": "/",
>       "writable": false,
>       "defaultInputFormat": null
>     },
>     "views": {
>       "location": "/processed",
>       "writable": true,
>       "defaultInputFormat": null
>     },
>     "tmp": {
>       "location": "/tmp",
>       "writable": true,
>       "defaultInputFormat": null
>     }
>   },
>   "formats": {
>     "psv": {
>       "type": "text",
>       "extensions": [
>         "tbl"
>       ],
>       "delimiter": "|"
>     },
>     "csv": {
>       "type": "text",
>       "extensions": [
>         "csv"
>       ],
>       "extractHeader": true,
>       "delimiter": ","
>     },
>     "tsv": {
>       "type": "text",
>       "extensions": [
>         "tsv"
>       ],
>       "delimiter": "\t"
>     },
>     "parquet": {
>       "type": "parquet"
>     },
>     "json": {
>       "type": "json"
>     },
>     "avro": {
>       "type": "avro"
>     },
>     "sequencefile": {
>       "type": "sequencefile",
>       "extensions": [
>         "seq"
>       ]
>     },
>     "csvh": {
>       "type": "text",
>       "extensions": [
>         "csvh",
>         "csv"
>       ],
>       "extractHeader": true,
>       "delimiter": ","
>     }
>   }
> }
>            Reporter: Peter McTaggart
>         Attachments: apps1-bad.csv, apps1.csv
>
>
> When trying to query (via sqlline or WebUI) a .csv file I am getting an 
> IndexOutOfBoundsException:
> {noformat} 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1-bad.csv` limit 1;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on ip-XXXXX.compute.internal:31010] (state=,code=0)
> 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
> +----------+----------------------+----------+----------+----------+------------+----------+------------+----------+--------------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
> | FIELD_1  |       FIELD_2        | FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6   | FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |       FIELD_12       | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | FIELD_18  | FIELD_19  |       FIELD_20       | FIELD_21  | FIELD_22  | FIELD_23  | FIELD_24  | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | FIELD_29  | FIELD_30  | FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | FIELD_35  |
> +----------+----------------------+----------+----------+----------+------------+----------+------------+----------+--------------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
> | 489517   | 27/10/2015 02:05:27  | 261      | 1130232  | 0        | 925630488  | 0        | 925630488  | -1       | 19531580547  | 00000000  | 27/10/2015 02:00:00  |           | 30        | 300       | 0         | 0         | 00000000  | 00000000  | 27/10/2015 02:05:27  | 0         | 1         | 0         | 35.0      |           |           |           | 505       | 872.0     |           | aBc       |           |           |           |           |
> +----------+----------------------+----------+----------+----------+------------+----------+------------+----------+--------------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
> 1 row selected (1.094 seconds)
> 0: jdbc:drill:>  {noformat}
> Good file: apps1.csv, and 
> Bad file: apps1-bad.csv  attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
