[
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035589#comment-15035589
]
Steven Phillips commented on DRILL-4145:
----------------------------------------
The bug occurs when the last field of a line is an empty string.
When the parser sees the pattern <field delimiter><line delimiter>, it calls
the "endEmptyField()" method of the TextOutput. This was fine with the
RepeatedVarCharOutput, because calling this method added an empty string
element to the array. In the FieldVarCharOutput, however, ending a field
does nothing unless the field has been started first.
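
The behaviour described above can be illustrated with a minimal, self-contained
sketch. Every name in it (EmptyLastFieldSketch, Output, ArrayOutput,
FieldOutput, parse, and the startBeforeEmptyEnd flag) is hypothetical and is
not taken from the Drill code base. The sketch only models the callback
sequence; it does not reproduce the IndexOutOfBoundsException, and the
"start the field before ending it" variant is an assumed possible remedy, not
the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Drill's actual classes.
final class EmptyLastFieldSketch {

    /** Callbacks a parser makes while reading one record. */
    interface Output {
        void startField();
        void append(char c);
        void endField();
        void endEmptyField();
        List<String> finishRecord();
    }

    /** Array-style output: ending a field always adds an element, even an empty one. */
    static final class ArrayOutput implements Output {
        private final List<String> values = new ArrayList<>();
        private final StringBuilder current = new StringBuilder();

        public void startField() { current.setLength(0); }
        public void append(char c) { current.append(c); }
        public void endField() {
            values.add(current.toString());
            current.setLength(0);
        }
        public void endEmptyField() { endField(); }
        public List<String> finishRecord() {
            List<String> record = new ArrayList<>(values);
            values.clear();
            return record;
        }
    }

    /** Per-field output: endField() is ignored unless startField() ran first. */
    static final class FieldOutput implements Output {
        private final boolean startBeforeEmptyEnd; // true = assumed remedy
        private final List<String> values = new ArrayList<>();
        private final StringBuilder current = new StringBuilder();
        private boolean started;

        FieldOutput(boolean startBeforeEmptyEnd) {
            this.startBeforeEmptyEnd = startBeforeEmptyEnd;
        }

        public void startField() { current.setLength(0); started = true; }
        public void append(char c) { current.append(c); }
        public void endField() {
            if (!started) { return; } // nothing to end: the field is silently dropped
            values.add(current.toString());
            started = false;
        }
        public void endEmptyField() {
            if (startBeforeEmptyEnd) { startField(); }
            endField();
        }
        public List<String> finishRecord() {
            List<String> record = new ArrayList<>(values);
            values.clear();
            return record;
        }
    }

    /** Parses one line, given without its line delimiter. */
    static List<String> parse(String line, char delimiter, Output out) {
        boolean inField = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == delimiter) {
                if (!inField) { out.startField(); } // empty field in the middle of a line
                out.endField();
                inField = false;
            } else {
                if (!inField) { out.startField(); inField = true; }
                out.append(c);
            }
        }
        if (inField) {
            out.endField();
        } else {
            out.endEmptyField(); // <field delimiter><line delimiter>, or an empty line
        }
        return out.finishRecord();
    }

    public static void main(String[] args) {
        String line = "a,b,"; // the last field is an empty string
        System.out.println(parse(line, ',', new ArrayOutput()).size());      // 3
        System.out.println(parse(line, ',', new FieldOutput(false)).size()); // 2: last field lost
        System.out.println(parse(line, ',', new FieldOutput(true)).size());  // 3
    }
}
```

In this sketch the per-field output returns one value fewer than the line
contains whenever the line ends with a field delimiter, so the record and its
columns no longer line up.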
> IndexOutOfBoundsException raised during select * query on S3 csv file
> ---------------------------------------------------------------------
>
> Key: DRILL-4145
> URL: https://issues.apache.org/jira/browse/DRILL-4145
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.3.0
> Environment: Drill 1.3.0 on a 3 node distributed-mode cluster on AWS.
> Data files on S3.
> S3 storage plugin configuration:
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://<bucket-name-was-here>",
>   "workspaces": {
>     "root": {
>       "location": "/",
>       "writable": false,
>       "defaultInputFormat": null
>     },
>     "views": {
>       "location": "/processed",
>       "writable": true,
>       "defaultInputFormat": null
>     },
>     "tmp": {
>       "location": "/tmp",
>       "writable": true,
>       "defaultInputFormat": null
>     }
>   },
>   "formats": {
>     "psv": {
>       "type": "text",
>       "extensions": [
>         "tbl"
>       ],
>       "delimiter": "|"
>     },
>     "csv": {
>       "type": "text",
>       "extensions": [
>         "csv"
>       ],
>       "extractHeader": true,
>       "delimiter": ","
>     },
>     "tsv": {
>       "type": "text",
>       "extensions": [
>         "tsv"
>       ],
>       "delimiter": "\t"
>     },
>     "parquet": {
>       "type": "parquet"
>     },
>     "json": {
>       "type": "json"
>     },
>     "avro": {
>       "type": "avro"
>     },
>     "sequencefile": {
>       "type": "sequencefile",
>       "extensions": [
>         "seq"
>       ]
>     },
>     "csvh": {
>       "type": "text",
>       "extensions": [
>         "csvh",
>         "csv"
>       ],
>       "extractHeader": true,
>       "delimiter": ","
>     }
>   }
> }
> Reporter: Peter McTaggart
> Attachments: apps1-bad.csv, apps1.csv
>
>
> When trying to query a .csv file (via sqlline or the WebUI), I get an
> IndexOutOfBoundsException:
> {noformat}
> 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1-bad.csv` limit 1;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on ip-XXXXX.compute.internal:31010] (state=,code=0)
> 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
> +----------+----------------------+----------+----------+----------+------------+----------+------------+----------+--------------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
> | FIELD_1  | FIELD_2              | FIELD_3  | FIELD_4  | FIELD_5  | FIELD_6    | FIELD_7  | FIELD_8    | FIELD_9  | FIELD_10     | FIELD_11  | FIELD_12             | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | FIELD_18  | FIELD_19  | FIELD_20             | FIELD_21  | FIELD_22  | FIELD_23  | FIELD_24  | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | FIELD_29  | FIELD_30  | FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | FIELD_35  |
> +----------+----------------------+----------+----------+----------+------------+----------+------------+----------+--------------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
> | 489517   | 27/10/2015 02:05:27  | 261      | 1130232  | 0        | 925630488  | 0        | 925630488  | -1       | 19531580547  | 00000000  | 27/10/2015 02:00:00  |           | 30        | 300       | 0         | 0         | 00000000  | 00000000  | 27/10/2015 02:05:27  | 0         | 1         | 0         | 35.0      |           |           |           | 505       | 872.0     |           | aBc       |           |           |           |           |
> +----------+----------------------+----------+----------+----------+------------+----------+------------+----------+--------------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
> 1 row selected (1.094 seconds)
> 0: jdbc:drill:>
> {noformat}
> The good file (apps1.csv) and the bad file (apps1-bad.csv) are attached.