[ https://issues.apache.org/jira/browse/DRILL-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289785#comment-14289785 ]

Jason Altekruse commented on DRILL-2031:
----------------------------------------

It is definitely possible to an extent. The comment in the patch describes the 
old strategy, which did bulk copies until it hit a condition where that was no 
longer safe and had to fall back on copying a byte at a time with a shift. In 
this case it just isn't a large enough performance bottleneck to justify 
debugging the complex code that was there. A significant amount of time has 
been invested in the Parquet reader, and we need to prioritize correctness for 
now.
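For readers unfamiliar with the trade-off being described, here is a minimal sketch of the two copy paths for bit-packed (LSB-first, Parquet-style) boolean data. This is not Drill's actual BitReader code; the class and method names are made up for illustration. When the read position is byte-aligned, whole bytes can be bulk-copied; otherwise each output byte must be assembled from two adjacent source bytes with a shift, which is the slow path mentioned above.

```java
// Hypothetical illustration of bulk copy vs. shift fallback for
// bit-packed booleans; not the actual Drill implementation.
public class BitCopySketch {

    // Copy bitCount packed bits from src, starting at srcBitOffset,
    // into a freshly allocated byte array.
    public static byte[] copyBits(byte[] src, int srcBitOffset, int bitCount) {
        int outLen = (bitCount + 7) / 8;
        byte[] dst = new byte[outLen];
        int shift = srcBitOffset & 7;      // bit offset within the first byte
        int srcByte = srcBitOffset >>> 3;  // index of the first source byte
        if (shift == 0) {
            // Fast path: read position is byte-aligned, bulk copy whole bytes.
            System.arraycopy(src, srcByte, dst, 0, outLen);
        } else {
            // Slow path: each output byte merges the high bits of one source
            // byte with the low bits of the next, shifted into place.
            for (int i = 0; i < outLen; i++) {
                int lo = (src[srcByte + i] & 0xFF) >>> shift;
                int hi = (srcByte + i + 1 < src.length)
                        ? (src[srcByte + i + 1] & 0xFF) << (8 - shift)
                        : 0;
                dst[i] = (byte) (lo | hi);
            }
        }
        // Zero out unused trailing bits in the last byte.
        int tail = bitCount & 7;
        if (tail != 0) {
            dst[outLen - 1] &= (byte) ((1 << tail) - 1);
        }
        return dst;
    }
}
```

The bug class the issue describes typically lives in the bookkeeping around these paths: an index that drifts when switching between the aligned bulk copy and the shifted fallback, which is why the patch favors the simpler, always-correct path.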

> IndexOutOfBoundException when reading a wide parquet table with boolean 
> columns
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-2031
>                 URL: https://issues.apache.org/jira/browse/DRILL-2031
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 0.7.0
>            Reporter: Aman Sinha
>            Assignee: Parth Chandra
>            Priority: Critical
>         Attachments: DRILL-2031-Parquet-bit-reader-fix.patch, wide1.sql
>
>
> I created a wide table with 128 Lineitem columns plus 6 additional boolean 
> columns for a total of 134 columns via a CTAS script (see attached SQL).  The 
> source data is from TPCH scale factor 1 (smaller scale factor may not 
> reproduce the problem). The creation of the table was Ok.  Reading from the 
> table gives an IOBE.  See stack below.  It seems to occur for the boolean 
> columns.  
> {code}
> 0: jdbc:drill:zk=local> select * from wide1 where 1=0;
> java.lang.IndexOutOfBoundsException: srcIndex: 97792
>       io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:255) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final]
>       io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:378) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final]
>       io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:25) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
>       io.netty.buffer.DrillBuf.setBytes(DrillBuf.java:645) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
>       io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:850) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final]
>       org.apache.drill.exec.store.parquet.columnreaders.BitReader.readField(BitReader.java:54) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readValues(ColumnReader.java:120) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPageData(ColumnReader.java:169) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize(ColumnReader.java:146) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages(ColumnReader.java:107) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:367) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:413) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>       org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:158) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)