[ 
https://issues.apache.org/jira/browse/HIVE-29585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087057#comment-18087057
 ] 

Kokila N commented on HIVE-29585:
---------------------------------

*Failing test cases:*
 * Iceberg + LLAP + vectorization + ORC + LZ4 compression
 ** Workaround: disable vectorization (set hive.vectorized.execution.enabled=false)
 * LLAP + ORC + LZ4

*Root Cause:*

When the *LLAP cache* is enabled, data is read from disk the first time it is 
accessed. LLAP then stores the file metadata in the cache so that subsequent 
queries on the same data execute faster. LLAP keeps this metadata in *direct 
memory buffers*, which live outside the Java heap and are therefore *not 
backed by an array.* 

So, when there is a *cache hit* for a query, we read the ORC stripe footer from 
the cache as a direct buffer 
([getStripeFooterFromCacheOrDisk|https://github.com/apache/hive/blob/709d06bce95df7dc66c63f90ce99aadf8f24f489/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L728])
 and pass it to ORC for decompression 
([createORCStripeMetadataObject|https://github.com/apache/hive/blob/709d06bce95df7dc66c63f90ce99aadf8f24f489/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L720]).
 The ORC LZ4 codec only knows how to *decompress a Java heap buffer*, which is 
backed by an array 
([https://github.com/apache/orc/blob/dd3ec892123e42d6dfe3e0db3da40fd36d62c46a/java/core/src/java/org/apache/orc/impl/AircompressorCodec.java#L94]).
 So, when it tries to access the direct buffer as an array, we get 
{*}UnsupportedOperationException{*}.
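
The failure mode can be reproduced with plain java.nio, outside Hive and ORC. The sketch below is only an illustration: the class DirectBufferDemo and the helpers arrayThrows and toHeapBuffer are hypothetical names, not part of Hive or ORC, and copying into a heap buffer is just one possible way to hand array-backed data to a codec, not necessarily the fix chosen for this issue. It assumes a standard JDK, where buffers from ByteBuffer.allocateDirect report hasArray() == false.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; not part of Hive or ORC.
class DirectBufferDemo {

    // Returns true if calling array() on the buffer throws
    // UnsupportedOperationException, as in the stack trace of this issue.
    static boolean arrayThrows(ByteBuffer buf) {
        try {
            buf.array();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Hypothetical helper: returns the buffer itself if it exposes a backing
    // array, otherwise a heap-backed copy of its remaining bytes.
    static ByteBuffer toHeapBuffer(ByteBuffer src) {
        if (src.hasArray()) {
            return src;
        }
        ByteBuffer copy = ByteBuffer.allocate(src.remaining());
        // duplicate() shares the content but has its own position and limit,
        // so the caller's buffer is left untouched.
        copy.put(src.duplicate());
        copy.flip();
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {1, 2, 3, 4});
        direct.flip();

        System.out.println("direct.hasArray() = " + direct.hasArray());
        System.out.println("array() throws    = " + arrayThrows(direct));

        ByteBuffer heap = toHeapBuffer(direct);
        System.out.println("copy.hasArray()   = " + heap.hasArray());
        System.out.println("copy remaining    = " + heap.remaining());
    }
}
```

Checking hasArray() before calling array() avoids the exception, at the cost of one extra copy whenever the buffer comes from off-heap memory.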

> ORC LZ4: OrcEncodedDataReader stripe footer fails on direct ByteBuffers from 
> LLAP cache / ZCR
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-29585
>                 URL: https://issues.apache.org/jira/browse/HIVE-29585
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Kokila N
>            Assignee: Kokila N
>            Priority: Major
>
> Query:
> {code:java}
> CREATE TABLE IF NOT EXISTS ice_orc_test (
>     id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
>     'write.format.default'='orc',
>     'format-version'='2',
>     'write.orc.compression-codec'='lz4'
> );
> -- Error on the 4th try
> INSERT INTO ice_orc_test
> SELECT if(isnull(MAX(id)), 0, MAX(id)) + 1, uuid(), uuid()
> FROM ice_orc_test; {code}
> Error:
> {code:java}
> Caused by: java.lang.UnsupportedOperationException
>         at java.base/java.nio.ByteBuffer.array(ByteBuffer.java:1505)
>         at org.apache.orc.impl.AircompressorCodec.decompress(AircompressorCodec.java:94)
>         at org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:521)
>         at org.apache.orc.impl.InStream$CompressedStream.ensureUncompressed(InStream.java:548)
>         at org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:535)
>         at com.google.protobuf.CodedInputStream$StreamDecoder.read(CodedInputStream.java:2036)
>         at com.google.protobuf.CodedInputStream$StreamDecoder.tryRefillBuffer(CodedInputStream.java:2777)
>         at com.google.protobuf.CodedInputStream$StreamDecoder.isAtEnd(CodedInputStream.java:2700)
>         at com.google.protobuf.CodedInputStream$StreamDecoder.readTag(CodedInputStream.java:2063)
>         at org.apache.orc.OrcProto$StripeFooter.<init>(OrcProto.java:19300)
>         at org.apache.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:20956)
>         at org.apache.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:20950)
>         at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:63)
>         at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:68)
>         at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>         at com.google.protobuf.GeneratedMessageV3.parseWithIOException(GeneratedMessageV3.java:353)
>         at org.apache.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:19736)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.buildStripeFooter(OrcEncodedDataReader.java:691)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getStripeFooterFromCacheOrDisk(OrcEncodedDataReader.java:740)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.readStripesMetadata(OrcEncodedDataReader.java:707)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:360)
>  {code}



