[ https://issues.apache.org/jira/browse/DRILL-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289785#comment-14289785 ]
Jason Altekruse commented on DRILL-2031:
----------------------------------------
It is definitely possible to an extent. The comment in the patch describes the
old strategy, which did bulk copies until it hit a condition where that became a
problem and had to fall back to copying a byte at a time with a shift. In this
case it just isn't a large enough performance bottleneck to justify debugging
the complex code that was there. A significant amount of time has already been
invested in the Parquet reader, and for now we need to prioritize correctness.
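A minimal sketch of the two copy paths described above, assuming bit-packed little-endian buffers held in plain byte arrays. This is not the actual BitReader code, and the fallback below copies one bit at a time for brevity rather than a byte at a time with a shift:
{code}
/**
 * Hypothetical illustration of the old strategy: bulk-copy whole bytes while
 * both sides are byte-aligned, otherwise fall back to a bit-level copy.
 */
public final class BitCopySketch {

  /**
   * Copies {@code bitCount} bits from src (starting at srcBitOffset) into dst
   * (starting at dstBitOffset). Bit i of a buffer lives at bit (i % 8) of
   * byte (i / 8).
   */
  static void copyBits(byte[] src, long srcBitOffset,
                       byte[] dst, long dstBitOffset, long bitCount) {
    if ((srcBitOffset & 7) == 0 && (dstBitOffset & 7) == 0) {
      // Fast path: both offsets are byte-aligned, so whole bytes can be bulk-copied.
      int fullBytes = (int) (bitCount >>> 3);
      System.arraycopy(src, (int) (srcBitOffset >>> 3),
                       dst, (int) (dstBitOffset >>> 3), fullBytes);
      // Any trailing partial byte still needs the bit-level path.
      copyBitByBit(src, srcBitOffset + ((long) fullBytes << 3),
                   dst, dstBitOffset + ((long) fullBytes << 3), bitCount & 7);
    } else {
      // Slow path: unaligned, so every bit has to be shifted into place.
      copyBitByBit(src, srcBitOffset, dst, dstBitOffset, bitCount);
    }
  }

  private static void copyBitByBit(byte[] src, long srcBit,
                                   byte[] dst, long dstBit, long bitCount) {
    for (long i = 0; i < bitCount; i++, srcBit++, dstBit++) {
      int bit = (src[(int) (srcBit >>> 3)] >>> (int) (srcBit & 7)) & 1;
      int dstByte = (int) (dstBit >>> 3);
      int dstShift = (int) (dstBit & 7);
      // Clear the target bit, then OR in the source bit.
      dst[dstByte] = (byte) ((dst[dstByte] & ~(1 << dstShift)) | (bit << dstShift));
    }
  }
}
{code}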
> IndexOutOfBoundsException when reading a wide parquet table with boolean columns
> --------------------------------------------------------------------------------
>
> Key: DRILL-2031
> URL: https://issues.apache.org/jira/browse/DRILL-2031
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 0.7.0
> Reporter: Aman Sinha
> Assignee: Parth Chandra
> Priority: Critical
> Attachments: DRILL-2031-Parquet-bit-reader-fix.patch, wide1.sql
>
>
> I created a wide table with 128 Lineitem columns plus 6 additional boolean
> columns, for a total of 134 columns, via a CTAS script (see attached SQL). The
> source data is from TPCH scale factor 1 (a smaller scale factor may not
> reproduce the problem). The table was created successfully, but reading from it
> throws an IndexOutOfBoundsException; see the stack trace below. It appears to
> occur for the boolean columns.
> {code}
> 0: jdbc:drill:zk=local> select * from wide1 where 1=0;
> java.lang.IndexOutOfBoundsException: srcIndex: 97792
>     io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:255) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final]
>     io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:378) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final]
>     io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:25) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
>     io.netty.buffer.DrillBuf.setBytes(DrillBuf.java:645) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
>     io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:850) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final]
>     org.apache.drill.exec.store.parquet.columnreaders.BitReader.readField(BitReader.java:54) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readValues(ColumnReader.java:120) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPageData(ColumnReader.java:169) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize(ColumnReader.java:146) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages(ColumnReader.java:107) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:367) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:413) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:158) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> {code}