[
https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197208#comment-15197208
]
Deneche A. Hakim commented on DRILL-4317:
-----------------------------------------
I found a bug in TextInput.updateLengthBasedOnConstraint() that shows up when
Drill splits CSV files. In most cases it works fine, but when the split line
ends with an empty value AND one of the previous rows in the same last batch
contains a value in the last column, we see the exception described above.
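
For readers unfamiliar with split reading, here is a minimal sketch of the
general idea, assuming a simple ownership rule (a line belongs to the split
that contains its first byte). The class and method names are hypothetical;
this is not Drill's TextInput code, only an illustration of a reader finishing
a line that crosses its split boundary and ends with an empty value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Drill's TextInput implementation.
final class SplitSketch {

    // Returns the records owned by the byte range [start, end) of data.
    // Assumed rule: a line belongs to the split that contains its first byte.
    static List<String[]> readSplit(byte[] data, int start, int end) {
        int pos = start;
        // If the split starts mid-line, skip ahead to the next line start;
        // the previous split reads that straddling line to its end.
        while (pos > 0 && pos < data.length && data[pos - 1] != '\n') {
            pos++;
        }
        List<String[]> rows = new ArrayList<>();
        while (pos < end && pos < data.length) {
            int eol = pos;
            while (eol < data.length && data[eol] != '\n') {
                eol++; // may run past 'end' to finish the straddling line
            }
            String line = new String(data, pos, eol - pos, StandardCharsets.UTF_8);
            // limit -1 keeps trailing empty fields such as the one in "d,e,"
            rows.add(line.split(",", -1));
            pos = eol + 1;
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] data = "a,b,c\nd,e,\nf,g,h\n".getBytes(StandardCharsets.UTF_8);
        List<String[]> first = readSplit(data, 0, 8);   // boundary falls inside "d,e,"
        List<String[]> second = readSplit(data, 8, data.length);
        System.out.println(first.size() + " + " + second.size() + " rows");
        System.out.println("last field of straddling row is empty: "
                + first.get(1)[2].isEmpty());
    }
}
```

The point of the sketch is the boundary case: the reader of the first split
must read past its nominal end to complete the row, and the trailing empty
field still has to be counted as a column.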
> Exceptions on SELECT and CTAS with large CSV files
> --------------------------------------------------
>
> Key: DRILL-4317
> URL: https://issues.apache.org/jira/browse/DRILL-4317
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.4.0, 1.5.0
> Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu
> Reporter: Matt Keranen
> Assignee: Deneche A. Hakim
>
> Selecting from a CSV file or running a CTAS into Parquet generates exceptions.
> Source file is ~650MB, a table of 4 key columns followed by 39 numeric data
> columns, otherwise a fairly simple format. Example:
> {noformat}
> 2015-10-17 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
> 2015-10-17 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
> 2015-10-17 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
> 2015-10-17 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
> {noformat}
> A "SELECT from `/path/to/file.csv`" runs for tens of minutes and eventually
> results in:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: range(0, 547681))
> at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
> at io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
> at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
> at io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
> at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
> at org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
> at org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
> at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
> at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
> at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
> at org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
> at org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
> at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
> at org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
> at sqlline.Rows$Row.<init>(Rows.java:157)
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
> at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1593)
> at sqlline.Commands.execute(Commands.java:852)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:746)
> at sqlline.SqlLine.begin(SqlLine.java:621)
> at sqlline.SqlLine.start(SqlLine.java:375)
> at sqlline.SqlLine.main(SqlLine.java:268)
> {noformat}
> A CTAS on the same file with storage as Parquet results in:
> {noformat}
> Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
> Fragment 1:2
> [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010]
> (java.lang.IllegalArgumentException) length: -260 (expected: >= 0)
> io.netty.buffer.AbstractByteBuf.checkIndex():1131
> io.netty.buffer.PooledUnsafeDirectByteBuf.nioBuffer():344
> io.netty.buffer.WrappedByteBuf.nioBuffer():727
> io.netty.buffer.UnsafeDirectLittleEndian.nioBuffer():26
> io.netty.buffer.DrillBuf.nioBuffer():356
> org.apache.drill.exec.store.ParquetOutputRecordWriter$VarCharParquetConverter.writeField():1842
> org.apache.drill.exec.store.EventBasedRecordWriter.write():62
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():106
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745 (state=,code=0)
> {noformat}
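
Both quoted traces fail while reading a VarChar value (an out-of-range index
in one, a negative length in the other). As a hedged illustration of how such
messages can arise in general, the sketch below models a variable-width column
as one data buffer plus an offsets array and shows what inconsistent offsets
lead to. The class name and layout are assumptions made for the example; this
is not Drill's VarCharVector code and does not claim to reproduce the actual
root cause.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of an offset-based variable-width column.
final class OffsetColumnSketch {

    // Value i occupies data[offsets[i] .. offsets[i + 1]).
    static String get(byte[] data, int[] offsets, int i) {
        int start = offsets[i];
        int length = offsets[i + 1] - start;
        if (length < 0) {
            // An end offset smaller than the start offset yields a negative length.
            throw new IllegalArgumentException("length: " + length + " (expected: >= 0)");
        }
        if (start < 0 || start + length > data.length) {
            // An end offset past the buffer yields an out-of-range read.
            throw new IndexOutOfBoundsException(
                "index: " + start + ", length: " + length
                + " (expected: range(0, " + data.length + "))");
        }
        return new String(data, start, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "abcde".getBytes(StandardCharsets.UTF_8);
        int[] good = {0, 2, 2, 5};            // "ab", "" (empty value), "cde"
        System.out.println("[" + get(data, good, 1) + "]");
        int[] bad = {0, 4, 2};                // second value ends before it starts
        try {
            get(data, bad, 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model an empty value is simply two equal consecutive offsets, so a
mistake in accounting for a trailing empty field is enough to make later
offsets disagree with the data buffer.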
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)