[ https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194878#comment-15194878 ]

Deneche A. Hakim commented on DRILL-4317:
-----------------------------------------

I was finally able to reproduce the issue on my laptop. It's data related and 
is exposed after the file is split during execution. The issue doesn't show up 
if you run the query on the local filesystem (as the file is not split) or on 
MapR FS (as the block size is different and the file ends up being too small to 
be split).

Will share more information once I figure out the root cause of this issue.
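For context on why splitting matters, the number of input splits follows from the file size relative to the filesystem block size. A minimal sketch of that arithmetic (the 128MB block size below is an illustrative assumption, not taken from the reporter's cluster config, and "one split per block" is a simplification of how the text reader divides work):

```python
def input_splits(file_size, block_size):
    """Rough split count: one split per filesystem block (a simplification)."""
    return -(-file_size // block_size)  # ceiling division

FILE_SIZE = 650 * 1024 * 1024  # ~650MB source CSV from the report

# Assumed 128MB blocks: the file spans several blocks, so it gets split
# and the bug surfaces.
print(input_splits(FILE_SIZE, 128 * 1024 * 1024))  # -> 6

# Local filesystem: the whole file is read as one unit, so no split.
print(input_splits(FILE_SIZE, FILE_SIZE))  # -> 1
```

With a sufficiently large block size the whole file fits in one block, which matches the observation that the error only appears when the file is actually split.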

> Exceptions on SELECT and CTAS with large CSV files
> --------------------------------------------------
>
>                 Key: DRILL-4317
>                 URL: https://issues.apache.org/jira/browse/DRILL-4317
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu
>            Reporter: Matt Keranen
>            Assignee: Deneche A. Hakim
>
> Selecting from a CSV file or running a CTAS into Parquet generates exceptions.
> Source file is ~650MB, a table of 4 key columns followed by 39 numeric data 
> columns, otherwise a fairly simple format. Example:
> {noformat}
> 2015-10-17 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
> 2015-10-17 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
> 2015-10-17 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
> 2015-10-17 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
> {noformat}
> A "SELECT from `/path/to/file.csv`" runs for tens of minutes and eventually results in:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: range(0, 547681))
>         at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
>         at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
>         at io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
>         at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
>         at org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
>         at org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
>         at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
>         at org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
>         at sqlline.Rows$Row.<init>(Rows.java:157)
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
>         at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>         at sqlline.SqlLine.print(SqlLine.java:1593)
>         at sqlline.Commands.execute(Commands.java:852)
>         at sqlline.Commands.sql(Commands.java:751)
>         at sqlline.SqlLine.dispatch(SqlLine.java:746)
>         at sqlline.SqlLine.begin(SqlLine.java:621)
>         at sqlline.SqlLine.start(SqlLine.java:375)
>         at sqlline.SqlLine.main(SqlLine.java:268)
> {noformat}
> A CTAS on the same file with storage as Parquet results in:
> {noformat}
> Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
> Fragment 1:2
> [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010]
>   (java.lang.IllegalArgumentException) length: -260 (expected: >= 0)
>     io.netty.buffer.AbstractByteBuf.checkIndex():1131
>     io.netty.buffer.PooledUnsafeDirectByteBuf.nioBuffer():344
>     io.netty.buffer.WrappedByteBuf.nioBuffer():727
>     io.netty.buffer.UnsafeDirectLittleEndian.nioBuffer():26
>     io.netty.buffer.DrillBuf.nioBuffer():356
>     org.apache.drill.exec.store.ParquetOutputRecordWriter$VarCharParquetConverter.writeField():1842
>     org.apache.drill.exec.store.EventBasedRecordWriter.write():62
>     org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():106
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745 (state=,code=0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
