[
https://issues.apache.org/jira/browse/DRILL-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184673#comment-16184673
]
Matthew Mucker edited comment on DRILL-5451 at 10/3/17 4:00 PM:
----------------------------------------------------------------
I have encountered a similar (presumably the same) issue querying a Parquet
file. I lack the skills to know if this should be filed as a separate bug, and
I am unable to publicly post my file (although I will provide it to the devs
upon request).
Query that fails:
{code:sql}
select internalsessionid,
       flatten(playbacksegments['array'])['playbackstarttimestamp'] b
from MyTable
limit 4097;
{code}
The query succeeds with a limit of 4096 and fails at a limit of 4097.
Stack trace:
{noformat}
(java.lang.IndexOutOfBoundsException) index: -1, length: 1 (expected: range(0, 4096))
io.netty.buffer.DrillBuf.checkIndexD():123
io.netty.buffer.DrillBuf.chk():147
io.netty.buffer.DrillBuf.getByte():794
org.apache.drill.exec.vector.BitVector.splitAndTransferTo():301
org.apache.drill.exec.vector.NullableBitVector.splitAndTransferTo():297
org.apache.drill.exec.vector.NullableBitVector$TransferImpl.splitAndTransfer():323
org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.splitAndTransfer():388
org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.splitAndTransfer():232
org.apache.drill.exec.vector.complex.RepeatedMapVector$SingleMapTransferPair.splitAndTransfer():293
org.apache.drill.exec.test.generated.FlattenerGen58.flattenRecords():205
org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.handleRemainder():194
org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext():117
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
{noformat}
> Query on CSV file with header fails with an exception when a non-existent
> column is requested and the file is over 4096 lines long
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-5451
> URL: https://issues.apache.org/jira/browse/DRILL-5451
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.10.0
> Environment: Tested on CentOs 7 and Ubuntu
> Reporter: Paul Wilson
> Assignee: Paul Rogers
> Attachments: 4097_lines.csvh
>
>
> When querying a text (CSV) file with extractHeader set to true, selecting a
> non-existent column works as expected (returns an empty value) when the file
> has 4096 lines or fewer (1 header plus 4095 data), but results in an
> IndexOutOfBoundsException when the file has 4097 lines or more.
> With Storage config:
> {code:javascript}
> "csvh": {
>   "type": "text",
>   "extensions": [
>     "csvh"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> }
> {code}
> In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the
> last line removed.
> Results:
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
> +----------+------------------------+
> | line_no | line_description |
> +----------+------------------------+
> | 2 | this is line number 2 |
> | 3 | this is line number 3 |
> +----------+------------------------+
> 2 rows selected (2.455 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from
> dfs.`/test/4096_lines.csvh` LIMIT 2;
> +----------+---------------------+
> | line_no | non_existent_field |
> +----------+---------------------+
> | 2 | |
> | 3 | |
> +----------+---------------------+
> 2 rows selected (2.248 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from
> dfs.`/test/4097_lines.csvh` LIMIT 2;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4
> (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
> (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected:
> range(0, 16384))
> io.netty.buffer.DrillBuf.checkIndexD():123
> io.netty.buffer.DrillBuf.chk():147
> io.netty.buffer.DrillBuf.getInt():520
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
> org.apache.drill.exec.physical.impl.ScanBatch.next():234
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:zk=local>
> {noformat}
> This seems similar to the issue fixed in
> [DRILL-4108|https://issues.apache.org/jira/browse/DRILL-4108] but it now only
> manifests for longer files.
> I also see a similar result (i.e. it works for <= 4096 lines, and throws an
> IndexOutOfBoundsException for > 4096 lines) for a {{SELECT count(*) ...}} on
> these files.
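A test file equivalent to the 4097_lines.csvh attachment described above can be regenerated with a short sketch like the following (the column names line_no and line_description, and the numbering starting at 2 after the header line, are taken from the query output in the report; the exact file name and path are assumptions):

```python
# Sketch: regenerate a CSV-with-header test file matching the description:
# line 1 is the header, data lines run from line_no 2 up to the total line count.
def write_csvh(path, total_lines):
    with open(path, "w") as f:
        f.write("line_no,line_description\n")        # header (line 1)
        for n in range(2, total_lines + 1):          # data (lines 2..total_lines)
            f.write(f"{n},this is line number {n}\n")

write_csvh("4097_lines.csvh", 4097)  # 1 header + 4096 data lines: triggers the IOBE
write_csvh("4096_lines.csvh", 4096)  # 1 header + 4095 data lines: works as expected
```

Querying a non-existent column from the 4097-line file through the csvh storage plugin should then reproduce the reported IndexOutOfBoundsException on Drill 1.10.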
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)