[jira] [Commented] (DRILL-5451) Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long

Arina Ielchiieva (Jira) Fri, 11 Oct 2019 03:36:48 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949355#comment-16949355
 ]


Arina Ielchiieva commented on DRILL-5451:
-----------------------------------------

For the CSV case, the issue is fixed by the introduction of the new V3 text 
reader, which has been the default text reader since 1.17.0.

> Query on csv file w/ header fails with an exception when non existing column 
> is requested if file is over 4096 lines long
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5451
>                 URL: https://issues.apache.org/jira/browse/DRILL-5451
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>         Environment: Tested on CentOs 7 and Ubuntu
>            Reporter: Paul Wilson
>            Assignee: Paul Rogers
>            Priority: Major
>         Attachments: 4097_lines.csvh
>
>
> When querying a text (csv) file with extractHeader set to true, selecting a 
> nonexistent column works as expected (returns an "empty" value) when the file 
> has 4096 lines or fewer (1 header plus 4095 data), but results in an 
> IndexOutOfBoundsException when the file has 4097 lines or more.
> With Storage config:
> {code:javascript}
> "csvh": {
>       "type": "text",
>       "extensions": [
>         "csvh"
>       ],
>       "extractHeader": true,
>       "delimiter": ","
>     }
> {code}
> In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the 
> last line removed.
> Results:
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
> +----------+------------------------+
> | line_no  |    line_description    |
> +----------+------------------------+
> | 2        | this is line number 2  |
> | 3        | this is line number 3  |
> +----------+------------------------+
> 2 rows selected (2.455 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from 
> dfs.`/test/4096_lines.csvh` LIMIT 2;
> +----------+---------------------+
> | line_no  | non_existent_field  |
> +----------+---------------------+
> | 2        |                     |
> | 3        |                     |
> +----------+---------------------+
> 2 rows selected (2.248 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from 
> dfs.`/test/4097_lines.csvh` LIMIT 2;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
> (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
>   (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: 
> range(0, 16384))
>     io.netty.buffer.DrillBuf.checkIndexD():123
>     io.netty.buffer.DrillBuf.chk():147
>     io.netty.buffer.DrillBuf.getInt():520
>     org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
>     org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
>     org.apache.drill.exec.physical.impl.ScanBatch.next():234
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:zk=local> 
> {noformat}
> This seems similar to the issue fixed in 
> [DRILL-4108|https://issues.apache.org/jira/browse/DRILL-4108] but it now only 
> manifests for longer files.
> I also see a similar result (i.e. it works for <= 4096 lines, but throws an 
> IndexOutOfBoundsException for > 4096 lines) for a 
> {noformat} SELECT count(*) ...{noformat} from these files.
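
To reproduce the two cases without the attachment, a pair of test files can be generated along the following lines. This is only a sketch: the column names and row text are taken from the query output quoted above, while the exact contents of the attached 4097_lines.csvh, the helper names write_csvh and count_lines, and the temporary output directory are assumptions made for illustration.

```python
import os
import tempfile


def write_csvh(path, total_lines):
    """Write one header line plus (total_lines - 1) data lines."""
    with open(path, "w", newline="") as f:
        f.write("line_no,line_description\n")
        # The quoted output starts at line_no 2, so the header counts as line 1.
        for n in range(2, total_lines + 1):
            f.write(f"{n},this is line number {n}\n")


def count_lines(path):
    """Count the physical lines in a file."""
    with open(path) as f:
        return sum(1 for _ in f)


# Hypothetical output location; copy the files to wherever dfs can read them.
out_dir = tempfile.mkdtemp()
paths = {}
for total in (4096, 4097):
    paths[total] = os.path.join(out_dir, f"{total}_lines.csvh")
    write_csvh(paths[total], total)
    print(paths[total], count_lines(paths[total]))
```

The two files differ only in the final line, matching the description, so any difference in query behaviour should come from the row count alone.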


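One possible reading of the numbers in the error message, offered as an interpretation rather than a confirmed diagnosis: the failing read is a 4-byte UInt4 access at byte index 16384 in a buffer of 16384 bytes, which holds exactly 4096 four-byte entries. If setValueCount(n) reads the offset entry at index n (an assumption based on the stack trace), then a batch of 4095 data rows stays in bounds while a batch of 4096 does not. The constant and function names below are hypothetical.

```python
OFFSET_WIDTH = 4      # bytes per UInt4 entry ("length: 4" in the error)
BUFFER_BYTES = 16384  # upper bound of "range(0, 16384)" in the error


def offset_read_in_bounds(value_count):
    """True if a 4-byte read of entry `value_count` fits inside the buffer."""
    start = value_count * OFFSET_WIDTH
    return start + OFFSET_WIDTH <= BUFFER_BYTES


print(BUFFER_BYTES // OFFSET_WIDTH)  # 4096 entries fit in the buffer
print(offset_read_in_bounds(4095))   # True:  bytes 16380..16383
print(offset_read_in_bounds(4096))   # False: bytes 16384..16387
```

This lines up with the reported threshold: 4096 lines give 4095 data rows after the header, and 4097 lines give 4096.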

--
This message was sent by Atlassian Jira
(v8.3.4#803005)