tprelle commented on pull request #130:
URL: https://github.com/apache/tez/pull/130#issuecomment-857795873
Hi @abstractdog, thanks for looking into it.
Here is the stack trace for the issue I hit in the IFile reader:
<pre><code>
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:312)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:277)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InternalError: Could not decompress data. Buffer length is too small.
    at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:235)
    at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:92)
    at java.io.DataInputStream.readByte(DataInputStream.java:265)
    at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
    at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
    at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readKeyValueLength(IFile.java:935)
    at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.positionToNextRecord(IFile.java:965)
    at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readRawKey(IFile.java:1006)
    at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.nextRawKey(IFile.java:987)
    at org.apache.tez.runtime.library.common.sort.impl.TezMerger$Segment.nextRawKey(TezMerger.java:317)
    at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.merge(TezMerger.java:777)
    at org.apache.tez.runtime.library.common.sort.impl.TezMerger.merge(TezMerger.java:206)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:1298)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:666)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:308)
    ... 8 more
</code></pre>
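From what I can tell, the InternalError is thrown by the native SnappyDecompressor when its output buffer is smaller than the uncompressed size of the block it is asked to decode, which can happen when the writer and reader sides end up with different codec buffer sizes. Below is only a sketch of that kind of mismatch, not a failing test: the buffer sizes are made up for illustration, it assumes the native-library SnappyCodec behaviour, and it needs the Hadoop native snappy library on java.library.path.
<pre><code>
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.SnappyCodec;

public class SnappyBufferMismatch {
  public static void main(String[] args) throws Exception {
    // Hard-to-compress payload so the compressed blocks stay large.
    byte[] data = new byte[512 * 1024];
    new Random(42).nextBytes(data);

    // Writer side: large codec buffer => blocks that decode to up to ~256 KB.
    Configuration writeConf = new Configuration();
    writeConf.setInt("io.compression.codec.snappy.buffersize", 256 * 1024);
    SnappyCodec writeCodec = new SnappyCodec();
    writeCodec.setConf(writeConf);

    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    try (CompressionOutputStream out = writeCodec.createOutputStream(compressed)) {
      out.write(data);
    }

    // Reader side: a much smaller buffer than the writer used.
    Configuration readConf = new Configuration();
    readConf.setInt("io.compression.codec.snappy.buffersize", 4 * 1024);
    SnappyCodec readCodec = new SnappyCodec();
    readCodec.setConf(readConf);

    byte[] roundTrip = new byte[data.length];
    try (CompressionInputStream in =
        readCodec.createInputStream(new ByteArrayInputStream(compressed.toByteArray()))) {
      int off = 0, n;
      while (off < roundTrip.length
          && (n = in.read(roundTrip, off, roundTrip.length - off)) > 0) {
        off += n;
      }
    }
    // Expected (assumption): java.lang.InternalError:
    // Could not decompress data. Buffer length is too small.
  }
}
</code></pre>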
I was not able to reproduce it in a unit test, but I hit it reliably by running this type of query on a large dataset:
<pre><code>
WITH cte_setting AS (
  SELECT
    id,
    ARRAY(
      NAMED_STRUCT(
        "b",
        "c",
        "d",
        MAX(
          STRUCT(
            date,
            CASE WHEN e IS NOT NULL
            AND e <> '' THEN e END
          )
        ).col2
      ),
      NAMED_STRUCT(
        "b",
        "c",
        "d",
        MAX(
          STRUCT(
            date,
            CASE WHEN f IS NOT NULL
            AND f <> '' THEN f END
          )
        ).col2
      )
    ) AS arrayOption
  FROM
    table
  GROUP BY
    id
)
SELECT
  id,
  t.col.b AS b,
  t.col.d AS d
FROM
  cte_setting LATERAL VIEW explode(arrayOption) t
LIMIT
  1000
</code></pre>
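The GROUP BY is what sends this through the ordered-grouped shuffle where the final merge above runs. For reference, the frames in the trace (readVInt -> readKeyValueLength -> positionToNextRecord) are just the IFile record framing being read on top of the decompressor stream, roughly like the simplified sketch below. This is not the actual Tez code (the real reader is IFile$Reader in tez-runtime-library, with extra EOF/RLE handling); it only shows why a short or corrupt decode surfaces exactly at the readVInt calls.
<pre><code>
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.io.WritableUtils;

public class IFileFramingSketch {
  // Simplified view of one IFile record:
  //   &lt;key-len vint&gt;&lt;value-len vint&gt;&lt;key bytes&gt;&lt;value bytes&gt;
  // terminated by a -1/-1 marker. The InputStream here would be the
  // BlockDecompressorStream from the trace.
  public static void scan(InputStream decompressorStream) throws IOException {
    DataInputStream in = new DataInputStream(decompressorStream);
    while (true) {
      int keyLen = WritableUtils.readVInt(in); // IFile$Reader.readKeyValueLength in the trace
      int valLen = WritableUtils.readVInt(in);
      if (keyLen == -1 &amp;&amp; valLen == -1) {      // EOF marker
        return;
      }
      byte[] key = new byte[keyLen];
      in.readFully(key);                       // key bytes
      byte[] value = new byte[valLen];
      in.readFully(value);                     // value bytes
    }
  }
}
</code></pre>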