tprelle commented on pull request #130:
URL: https://github.com/apache/tez/pull/130#issuecomment-857795873


   Hi @abstractdog, thanks for looking into it.
   Here is the issue I hit in the IFile reader:
   <pre><code>
    
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:312)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:277)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InternalError: Could not decompress data. Buffer length is too small.
        at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
        at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:235)
        at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:92)
        at java.io.DataInputStream.readByte(DataInputStream.java:265)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readKeyValueLength(IFile.java:935)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.positionToNextRecord(IFile.java:965)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readRawKey(IFile.java:1006)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.nextRawKey(IFile.java:987)
        at org.apache.tez.runtime.library.common.sort.impl.TezMerger$Segment.nextRawKey(TezMerger.java:317)
        at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.merge(TezMerger.java:777)
        at org.apache.tez.runtime.library.common.sort.impl.TezMerger.merge(TezMerger.java:206)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:1298)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:666)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:308)
        ... 8 more
            </code></pre>
   I was not able to reproduce it in a unit test, but I hit it when running this type of query on a large dataset:
   <pre><code>
   WITH cte_setting AS (
     SELECT
       a,
       ARRAY(
         NAMED_STRUCT(
           "b",
           "c",
           "d",
           MAX(
             STRUCT(
               date,
               CASE WHEN e IS NOT NULL
               AND e <> '' THEN e END
             )
           ).col2
         ),
         NAMED_STRUCT(
           "b",
           "c",
           "d",
           MAX(
             STRUCT(
               date,
               CASE WHEN "f" IS NOT NULL
               AND "f" <> '' THEN "f" END
             )
           ).col2
         )
       ) AS arrayOption
     FROM
       table
     GROUP BY
       id
   )
   SELECT
     id,
     t.col.b AS b,
     t.col.b AS b
   FROM
     cte_setting LATERAL VIEW explode(arrayOption) t
   LIMIT
     1000
   </code></pre>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

