[
https://issues.apache.org/jira/browse/DRILL-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arina Ielchiieva resolved DRILL-5451.
-------------------------------------
Resolution: Fixed
> Query on csv file w/ header fails with an exception when non existing column
> is requested if file is over 4096 lines long
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-5451
> URL: https://issues.apache.org/jira/browse/DRILL-5451
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.10.0
> Environment: Tested on CentOs 7 and Ubuntu
> Reporter: Paul Wilson
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
> Attachments: 4097_lines.csvh
>
>
> When querying a text (csv) file with extractHeader set to true, selecting a
> non-existent column works as expected (returns an "empty" value) when the file
> has 4096 lines or fewer (1 header plus up to 4095 data rows), but results in an
> IndexOutOfBoundsException when the file has 4097 lines or more.
> With Storage config:
> {code:javascript}
> "csvh": {
>   "type": "text",
>   "extensions": [
>     "csvh"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> }
> {code}
> In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the
> last line removed.
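To reproduce this without the attachment, the two files can probably be regenerated along the following lines. This is only a sketch: the row format is inferred from the query output quoted below (header `line_no,line_description`, then rows of the form `N,this is line number N` starting at N = 2), the class name `CsvhFixture` and the temporary output directory are hypothetical, and the attached 4097_lines.csvh may differ in detail.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill: regenerates the two test files.
class CsvhFixture {

    // Builds totalLines lines: one header plus (totalLines - 1) data rows.
    // Row format is an assumption inferred from the quoted query output.
    static List<String> lines(int totalLines) {
        List<String> out = new ArrayList<>(totalLines);
        out.add("line_no,line_description");
        for (int n = 2; n <= totalLines; n++) {
            out.add(n + ",this is line number " + n);
        }
        return out;
    }

    // Writes the generated lines to the given path as UTF-8 text.
    static Path write(Path target, int totalLines) throws IOException {
        return Files.write(target, lines(totalLines), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumed location: a fresh temp directory; copy the files to
        // wherever the dfs workspace points (the report uses /test).
        Path dir = Files.createTempDirectory("drill-5451");
        System.out.println(write(dir.resolve("4096_lines.csvh"), 4096));
        System.out.println(write(dir.resolve("4097_lines.csvh"), 4097));
    }
}
```

The 4096-line file should then query cleanly and the 4097-line file should hit the failure on an affected version, assuming the generated content is close enough to the original attachment.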
> Results:
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
> +----------+------------------------+
> | line_no | line_description |
> +----------+------------------------+
> | 2 | this is line number 2 |
> | 3 | this is line number 3 |
> +----------+------------------------+
> 2 rows selected (2.455 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4096_lines.csvh` LIMIT 2;
> +----------+---------------------+
> | line_no | non_existent_field |
> +----------+---------------------+
> | 2 | |
> | 3 | |
> +----------+---------------------+
> 2 rows selected (2.248 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4097_lines.csvh` LIMIT 2;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
> (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: range(0, 16384))
> io.netty.buffer.DrillBuf.checkIndexD():123
> io.netty.buffer.DrillBuf.chk():147
> io.netty.buffer.DrillBuf.getInt():520
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
> org.apache.drill.exec.physical.impl.ScanBatch.next():234
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:zk=local>
> {noformat}
> This seems similar to the issue fixed in
> [DRILL-4108|https://issues.apache.org/jira/browse/DRILL-4108] but it now only
> manifests for longer files.
> I also see a similar result (i.e. it works for <= 4096 lines and throws an
> IndexOutOfBoundsException for > 4096 lines) for a
> {noformat}SELECT count(*) ...{noformat} query on these files.
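The numbers in the error message line up with the 4096-line threshold. The failing call is `UInt4Vector$Accessor.get()`, which reads 4-byte entries, and the buffer is reported as 16384 bytes, so it holds 16384 / 4 = 4096 entries (indices 0 to 4095). Reading entry 4096 would start at byte 16384, which is exactly the reported index. The sketch below illustrates only that arithmetic with a plain `java.nio.ByteBuffer`. It is not Drill's `DrillBuf` code, and the class name `OffsetBoundary` is hypothetical. That the read is on the VarChar offset vector is an inference from the stack trace, not something the report states.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the boundary arithmetic; not Drill code.
class OffsetBoundary {
    static final int ENTRY_WIDTH = 4;       // bytes per UInt4 entry
    static final int BUFFER_BYTES = 16384;  // buffer size from the error message

    // Number of 4-byte entries that fit in the buffer.
    static int capacityInEntries() {
        return BUFFER_BYTES / ENTRY_WIDTH;
    }

    // True if a 4-byte read of entry `index` stays inside the buffer.
    static boolean readable(int index) {
        ByteBuffer buf = ByteBuffer.allocate(BUFFER_BYTES);
        try {
            buf.getInt(index * ENTRY_WIDTH); // absolute read, bounds-checked
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("entries:        " + capacityInEntries());
        System.out.println("read at 4095 ok: " + readable(4095));
        System.out.println("read at 4096 ok: " + readable(4096));
    }
}
```

This only shows why the failure appears at 4096 rather than some other row count. It says nothing about why the read at that index happens in the first place.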