[ https://issues.apache.org/jira/browse/DRILL-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949355#comment-16949355 ]
Arina Ielchiieva commented on DRILL-5451:
-----------------------------------------

For the CSV case, the issue is fixed by the introduction of the new V3 text reader, which has been the default text reader since 1.17.0.

> Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5451
>                 URL: https://issues.apache.org/jira/browse/DRILL-5451
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>         Environment: Tested on CentOs 7 and Ubuntu
>            Reporter: Paul Wilson
>            Assignee: Paul Rogers
>            Priority: Major
>         Attachments: 4097_lines.csvh
>
>
> When querying a text (csv) file with extractHeader set to true, selecting a
> non-existent column works as expected (returns an "empty" value) when the file
> has 4096 lines or fewer (1 header plus 4095 data), but results in an
> IndexOutOfBoundsException when the file has 4097 lines or more.
> With Storage config:
> {code:javascript}
> "csvh": {
>   "type": "text",
>   "extensions": [
>     "csvh"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> }
> {code}
> In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the
> last line removed.
> Results:
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
> +----------+------------------------+
> | line_no  |    line_description    |
> +----------+------------------------+
> | 2        | this is line number 2  |
> | 3        | this is line number 3  |
> +----------+------------------------+
> 2 rows selected (2.455 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4096_lines.csvh` LIMIT 2;
> +----------+---------------------+
> | line_no  | non_existent_field  |
> +----------+---------------------+
> | 2        |                     |
> | 3        |                     |
> +----------+---------------------+
> 2 rows selected (2.248 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4097_lines.csvh` LIMIT 2;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
>   (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: range(0, 16384))
>     io.netty.buffer.DrillBuf.checkIndexD():123
>     io.netty.buffer.DrillBuf.chk():147
>     io.netty.buffer.DrillBuf.getInt():520
>     org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
>     org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
>     org.apache.drill.exec.physical.impl.ScanBatch.next():234
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:zk=local>
> {noformat}
> This seems similar to the issue fixed in
> [DRILL-4108|https://issues.apache.org/jira/browse/DRILL-4108] but it now only
> manifests for longer files.
> I also see a similar result (i.e. it works for <= 4096 lines, IOBE for >4096
> lines) for a {noformat} SELECT count(*) ...{noformat} from these files.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
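
For anyone who wants to re-check the boundary on an affected version, the two test files described in the report could be regenerated with something like the sketch below. The helper name `write_csvh` is hypothetical, and the row contents are only inferred from the query output quoted above (header `line_no,line_description`, data rows numbered from 2); the attached 4097_lines.csvh may differ in details.

```python
import csv


def write_csvh(path, total_lines):
    """Write a CSV file with 1 header line plus (total_lines - 1) data lines.

    Assumption: rows mirror the sample output in the report, where the first
    data row is numbered 2 because the header counts as line 1.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["line_no", "line_description"])
        for n in range(2, total_lines + 1):
            writer.writerow([n, f"this is line number {n}"])
```

Calling `write_csvh("4096_lines.csvh", 4096)` and `write_csvh("4097_lines.csvh", 4097)` should give a pair of files that differ only by the final line, which is the condition the report describes.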