Aman Sinha created DRILL-2334:
---------------------------------
Summary: Text record reader should fail gracefully when encountering bad records
Key: DRILL-2334
URL: https://issues.apache.org/jira/browse/DRILL-2334
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Text & CSV
Reporter: Aman Sinha
Assignee: Hanifi Gunes
The attached file contains one bad record. Running a simple count(*) query on this
file fails with an IndexOutOfBoundsException (IOBE) and/or possibly a schema change
exception.
The hex dump of the file shows a long run of zero bytes (the '*' below stands for
the elided lines of zeros):
{code}
00001c0 3a 35 35 2e 35 30 35 35 30 00 00 00 00 00 00 00
00001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
02a01c0 00 00 00 00 00 00 00 00 00 35 35 35 0a 35 35 35
{code}
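The zero run in the dump appears to span about 0x2a0000 (2,752,512) bytes, which is consistent with the 2,752,540-byte length reported in the log further down. A minimal sketch for locating such records in a file before querying it is shown here. It is not part of Drill: the function name `find_oversized_records` is hypothetical, the 65536-byte default is taken from the range in the exception message, and newline-delimited records are assumed from the `0a` byte in the dump.

```python
def find_oversized_records(path, limit=65536, chunk_size=1 << 20):
    """Yield (start_offset, length, nul_count) for each record longer than `limit`.

    Assumes records are delimited by a single newline byte (0x0a).
    The file is streamed in chunks, so a multi-megabyte record is never
    held in memory as a whole.
    """
    start = 0    # file offset where the current record begins
    length = 0   # bytes seen so far in the current record
    nuls = 0     # NUL (0x00) bytes seen so far in the current record
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            pos = 0
            while pos < len(chunk):
                end = chunk.find(b"\n", pos)
                piece_end = len(chunk) if end == -1 else end
                length += piece_end - pos
                nuls += chunk.count(b"\x00", pos, piece_end)
                if end == -1:
                    break  # record continues in the next chunk
                if length > limit:
                    yield start, length, nuls
                start += length + 1  # skip past the record and its newline
                length = 0
                nuls = 0
                pos = end + 1
    if length > limit:  # last record with no trailing newline
        yield start, length, nuls
```

Calling `list(find_oversized_records("badRecords2.dat"))` would then list the offset, length, and NUL count of every record that exceeds the limit, which helps confirm whether a single record is responsible.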
{code}
0: jdbc:drill:zk=local> select count(*) from `badRecords2.dat`;
+------------+
| EXPR$0 |
+------------+
Query failed: RemoteRpcException: Failure while running fragment., You tried to
do a batch data read operation when you were in a state of STOP. You can only
do this type of operation when you are in a state of OK or OK_NEW_SCHEMA.
{code}
The log file also shows a related IndexOutOfBoundsException (IOBE):
{code}
18:49:00.003 [2b1024e4-5639-b4ec-392e-8d5879c3d4db:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - Failed to read the batch. Stopping...
java.lang.IndexOutOfBoundsException: index: 374, length: 2752540 (expected: range(0, 65536))
    at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1143) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final]
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:272) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final]
    at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:390) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final]
    at io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:25) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
    at io.netty.buffer.DrillBuf.setBytes(DrillBuf.java:651) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
    at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:481) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.vector.RepeatedVarCharVector$Mutator.addSafe(RepeatedVarCharVector.java:451) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.store.text.DrillTextRecordReader.next(DrillTextRecordReader.java:172) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:165) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
{code}
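To illustrate what "failing gracefully" could look like, here is a rough sketch of a reader that checks each record's length against the buffer limit and either raises an error naming the record and its byte offset or skips the record. This is an illustration of the requested behavior only, not Drill's text reader: `BadRecordError`, `read_records`, and the `skip_bad` flag are hypothetical names, and the 65536-byte limit is taken from the exception message above. For brevity it reads each line fully into memory, which a real reader would likely want to avoid.

```python
import io


class BadRecordError(Exception):
    """Raised when a record is too long to read safely; carries its position."""

    def __init__(self, record_number, offset, length, limit):
        super().__init__(
            f"record {record_number} at byte offset {offset} is {length} bytes, "
            f"exceeding the {limit}-byte limit"
        )
        self.record_number = record_number
        self.offset = offset
        self.length = length


def read_records(stream, limit=65536, skip_bad=False):
    """Yield newline-delimited records from a binary stream.

    A record longer than `limit` bytes is either skipped (skip_bad=True)
    or reported with a BadRecordError that names the record and offset.
    """
    offset = 0  # byte offset of the current record within the stream
    for number, line in enumerate(stream, start=1):
        record = line.rstrip(b"\n")
        if len(record) > limit:
            if not skip_bad:
                raise BadRecordError(number, offset, len(record), limit)
            # skip_bad: drop the record and continue with the next one
        else:
            yield record
        offset += len(line)
```

With `skip_bad=False` the caller gets a message that points at the offending record instead of a low-level buffer exception. With `skip_bad=True` a query such as count(*) could still complete over the remaining records.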