[
https://issues.apache.org/jira/browse/DRILL-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184673#comment-16184673
]
Matthew Mucker edited comment on DRILL-5451 at 10/3/17 4:00 PM:
----------------------------------------------------------------
I have encountered a similar (presumably the same) issue querying a Parquet
file. I lack the skills to know if this should be filed as a separate bug, and
I am unable to publicly post my file (although I will provide it to the devs
upon request).
Query that fails:
{code:sql}
select internalsessionid,
       flatten(playbacksegments['array'])['playbackstarttimestamp'] b
from MyTable
limit 4097;
{code}
The query succeeds with a limit of 4096 and fails at a limit of 4097.
Stack trace:
{noformat}
(java.lang.IndexOutOfBoundsException) index: -1, length: 1 (expected: range(0, 4096))
io.netty.buffer.DrillBuf.checkIndexD():123
io.netty.buffer.DrillBuf.chk():147
io.netty.buffer.DrillBuf.getByte():794
org.apache.drill.exec.vector.BitVector.splitAndTransferTo():301
org.apache.drill.exec.vector.NullableBitVector.splitAndTransferTo():297
org.apache.drill.exec.vector.NullableBitVector$TransferImpl.splitAndTransfer():323
org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.splitAndTransfer():388
org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.splitAndTransfer():232
org.apache.drill.exec.vector.complex.RepeatedMapVector$SingleMapTransferPair.splitAndTransfer():293
org.apache.drill.exec.test.generated.FlattenerGen58.flattenRecords():205
org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.handleRemainder():194
org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext():117
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
{noformat}
> Query on CSV file with header fails with an exception when a non-existent
> column is requested and the file is over 4096 lines long
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-5451
> URL: https://issues.apache.org/jira/browse/DRILL-5451
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.10.0
> Environment: Tested on CentOs 7 and Ubuntu
> Reporter: Paul Wilson
> Assignee: Paul Rogers
> Attachments: 4097_lines.csvh
>
>
> When querying a text (CSV) file with extractHeader set to true, selecting a
> non-existent column works as expected (returns an empty value) when the file
> has 4096 lines or fewer (1 header plus 4095 data), but results in an
> IndexOutOfBoundsException when the file has 4097 lines or more.
> With Storage config:
> {code:javascript}
> "csvh": {
>   "type": "text",
>   "extensions": [
>     "csvh"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> }
> {code}
> In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the
> last line removed.
> Results:
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
> +----------+------------------------+
> | line_no | line_description |
> +----------+------------------------+
> | 2 | this is line number 2 |
> | 3 | this is line number 3 |
> +----------+------------------------+
> 2 rows selected (2.455 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from
> dfs.`/test/4096_lines.csvh` LIMIT 2;
> +----------+---------------------+
> | line_no | non_existent_field |
> +----------+---------------------+
> | 2 | |
> | 3 | |
> +----------+---------------------+
> 2 rows selected (2.248 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from
> dfs.`/test/4097_lines.csvh` LIMIT 2;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4
> (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
> (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected:
> range(0, 16384))
> io.netty.buffer.DrillBuf.checkIndexD():123
> io.netty.buffer.DrillBuf.chk():147
> io.netty.buffer.DrillBuf.getInt():520
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
> org.apache.drill.exec.physical.impl.ScanBatch.next():234
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:zk=local>
> {noformat}
> This seems similar to the issue fixed in
> [DRILL-4108|https://issues.apache.org/jira/browse/DRILL-4108] but it now only
> manifests for longer files.
> I also see a similar result (i.e. it works for <= 4096 lines, and throws an
> IndexOutOfBoundsException for > 4096 lines) for a {{SELECT count(*) ...}} on
> these files.
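A test file equivalent to the 4097_lines.csvh attachment described above can be regenerated with a short sketch like the following (the column names line_no and line_description, and the numbering starting at 2 after the header line, are taken from the query output in the report; the exact file name and path are assumptions):

```python
# Sketch: regenerate a CSV-with-header test file matching the description:
# line 1 is the header, data lines run from line_no 2 up to the total line count.
def write_csvh(path, total_lines):
    with open(path, "w") as f:
        f.write("line_no,line_description\n")        # header (line 1)
        for n in range(2, total_lines + 1):          # data (lines 2..total_lines)
            f.write(f"{n},this is line number {n}\n")

write_csvh("4097_lines.csvh", 4097)  # 1 header + 4096 data lines: triggers the IOBE
write_csvh("4096_lines.csvh", 4096)  # 1 header + 4095 data lines: works as expected
```

Querying a non-existent column from the 4097-line file through the csvh storage plugin should then reproduce the reported IndexOutOfBoundsException on Drill 1.10.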
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)