[ 
https://issues.apache.org/jira/browse/DRILL-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031724#comment-16031724
 ] 

Paul Rogers commented on DRILL-5557:
------------------------------------

I've seen something similar; maybe my experience can help pin down this problem.

DRILL-5470 describes a user experience with bizarre string lengths when reading 
CSV data, probably due to vector corruption.

DRILL-5487 describes a case where a truncated last row in a CSV file leads to 
vector corruption. In that case, just one row was missing and we got some 
strange behavior. If more rows are missing, it might mean we get the error seen 
here.

Drill has the ability to "back-fill" values when reading files, such as JSON, 
that may omit columns in some records. The back-filling works only for some 
types. Back-filling is *not* done at the end of a batch. This may be the cause 
of the issue here.
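
To make the suspicion concrete, here is a toy model of a variable-width column: a data buffer plus an offsets list, with a back-fill step for skipped rows. This is only a sketch in Python, not Drill's actual vector code; the class and method names (`ToyVarCharVector`, `_fill_empties`, `finish`) are made up for illustration, and it assumes rows are written in increasing order.

```python
class ToyVarCharVector:
    """Toy variable-width vector (illustration only, not Drill's code).

    offsets[i] .. offsets[i + 1] delimit the bytes of row i in `data`.
    """

    def __init__(self):
        self.data = bytearray()
        self.offsets = [0]

    def _fill_empties(self, row):
        # Back-fill: give each skipped row an empty value by
        # repeating the last offset until `row` entries exist.
        while len(self.offsets) - 1 < row:
            self.offsets.append(self.offsets[-1])

    def set(self, row, value):
        # Assumes rows arrive in increasing order.
        self._fill_empties(row)
        self.data += value.encode("utf-8")
        self.offsets.append(len(self.data))

    def finish(self, row_count):
        # End-of-batch back-fill for rows missing at the tail.
        self._fill_empties(row_count)

    def get(self, row):
        start, end = self.offsets[row], self.offsets[row + 1]
        return self.data[start:end].decode("utf-8")


# Column "b" in a four-row batch where only row 0 has a value.
b = ToyVarCharVector()
b.set(0, "20")
try:
    b.get(2)            # offsets list is too short for row 2
except IndexError:
    print("row 2 is unreadable without an end-of-batch fill")
b.finish(4)
print(repr(b.get(2)))   # row 2 now reads as an empty string
```

In this toy, a gap in the middle of a batch is repaired by the next `set()`, but a gap at the end is repaired only if something calls `finish()`. Whether that is what actually goes wrong for your data is still a guess.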

Normally, CSV files have the same columns in every row. I wonder: in your data 
file, do you have "missing" columns at the end of the file, like this:

{code}
a, b, c
10, 20, 30
11
12
{code}

Or, do you have one file, with, say, three columns and some other files with 
only two? (That is, does the number of columns change from file to file?)
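
If it helps, both cases can be checked outside of Drill with a few lines of Python. This is a rough sketch, nothing Drill-specific: the function names (`ragged_rows`, `column_counts`) are made up for illustration, and it assumes the first line of each file is a header and that the standard `csv` module splits your rows the way you expect.

```python
import csv
import io


def ragged_rows(lines):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return []
    expected = len(header)
    bad = []
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


def column_counts(files):
    """Map each name to its header's column count; `files` is {name: iterable of lines}."""
    counts = {}
    for name, lines in files.items():
        header = next(csv.reader(lines), None)
        counts[name] = len(header) if header is not None else 0
    return counts


# The example data from above: two short rows at the end of the file.
sample = io.StringIO("a, b, c\n10, 20, 30\n11\n12\n")
print(ragged_rows(sample))  # [(3, 1), (4, 1)]
```

For real files, pass an open file object, for example `with open(path, newline="") as f: ragged_rows(f)`. If `column_counts` reports different values across the files in one directory, that would point to the second case.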

Would be good to finally nail down this issue...

> java.lang.IndexOutOfBoundsException: writerIndex: 
> --------------------------------------------------
>
>                 Key: DRILL-5557
>                 URL: https://issues.apache.org/jira/browse/DRILL-5557
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: renlu
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
