[ https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866973#comment-16866973 ]

M. Le Bihan edited comment on SPARK-18105 at 6/18/19 7:10 PM:
--------------------------------------------------------------

My workaround eventually didn't hold, and I fell back into the bug again.

I attempted to upgrade from the spark-xxx_2.11 artifacts to the spark-xxx_2.12 (Scala 2.12) ones (the dependency change is sketched just after the trace below), but received this kind of stack trace:

{code:log}
2019-06-18 20:43:54.747  INFO 1539 --- [er for task 547] 
o.a.s.s.ShuffleBlockFetcherIterator      : Started 0 remote fetches in 0 ms
2019-06-18 20:43:59.015 ERROR 1539 --- [er for task 547] 
org.apache.spark.executor.Executor       : Exception in task 93.0 in stage 4.2 
(TID 547)

java.lang.NullPointerException: null
    at 
org.apache.spark.rdd.PairRDDFunctions.$anonfun$mapValues$3(PairRDDFunctions.scala:757)
 ~[spark-core_2.12-2.4.3.jar!/:2.4.3]
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) 
~[scala-library-2.12.8.jar!/:na]
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484) 
~[scala-library-2.12.8.jar!/:na]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490) 
~[scala-library-2.12.8.jar!/:na]
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) 
~[scala-library-2.12.8.jar!/:na]
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) 
~[scala-library-2.12.8.jar!/:na]
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) 
~[scala-library-2.12.8.jar!/:na]
    at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
 ~[spark-core_2.12-2.4.3.jar!/:2.4.3]
    at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
 ~[spark-core_2.12-2.4.3.jar!/:2.4.3]
    at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) 
~[spark-core_2.12-2.4.3.jar!/:2.4.3]
    at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) 
~[spark-core_2.12-2.4.3.jar!/:2.4.3]
    at org.apache.spark.scheduler.Task.run(Task.scala:121) 
~[spark-core_2.12-2.4.3.jar!/:2.4.3]
    at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
 ~[spark-core_2.12-2.4.3.jar!/:2.4.3]
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) 
~[spark-core_2.12-2.4.3.jar!/:2.4.3]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) 
~[spark-core_2.12-2.4.3.jar!/:2.4.3]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_212]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_212]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]

{code}
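
For what it's worth, the upgrade amounted to switching the Scala suffix on the Spark artifacts. A minimal sketch in sbt form (assuming an sbt build; the spark-core artifact name and 2.4.3 version are simply the ones visible in the jar names of the trace above, my real build may differ):

{code:scala}
// Hypothetical sbt sketch: move the Spark artifacts from the Scala 2.11 builds
// to the Scala 2.12 builds. %% appends the scalaVersion suffix automatically.
scalaVersion := "2.12.8"

libraryDependencies ++= Seq(
  // previously "org.apache.spark" % "spark-core_2.11" % "2.4.3"
  "org.apache.spark" %% "spark-core" % "2.4.3"
)
{code}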

This issue SPARK-18105 prevents me from using Spark at all.
{color:#654982}Spark +cannot+ handle the large files (1 GB, 5 GB, 10 GB) that I ask it to join for me.{color} When will this be corrected?! It should be raised to urgent.

With roughly 300 open and in-progress (but stalling) issues, Spark is becoming less and less reliable every day.
I'm about to send a message to the dev list to ask whether developers can stop implementing new features until they have fixed the issues in the features they have already written.

As it stands today, Spark cannot be used at all.
At least offer a way to disable the LZ4 feature if it doesn't work!
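
In the meantime, my understanding is that the block codec can be switched away from lz4, or shuffle compression disabled entirely, via the documented spark.io.compression.codec and spark.shuffle.compress properties. A minimal sketch of that workaround (whether it actually avoids the corruption here is only my assumption, it is not a fix):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a possible workaround, not a fix: switch the block codec away from
// lz4 (the default in Spark 2.x), or disable shuffle compression entirely, and
// check whether the "Stream is corrupted" / NullPointerException failures remain.
val conf = new SparkConf()
  .setAppName("lz4-workaround-test")             // hypothetical app name
  .set("spark.io.compression.codec", "snappy")   // lz4 is the default codec
  // .set("spark.shuffle.compress", "false")     // heavier option: no shuffle compression
val sc = new SparkContext(conf)
{code}

The same properties can also be passed on spark-submit with --conf instead of being set in code.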



> LZ4 failed to decompress a stream of shuffled data
> --------------------------------------------------
>
>                 Key: SPARK-18105
>                 URL: https://issues.apache.org/jira/browse/SPARK-18105
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>            Priority: Major
>
> When LZ4 is used to compress the shuffle files, decompression may fail with "Stream is corrupted":
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in 
> stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted
>       at 
> org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220)
>       at 
> org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
>       at java.io.DataInputStream.read(DataInputStream.java:149)
>       at com.google.common.io.ByteStreams.read(ByteStreams.java:828)
>       at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695)
>       at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
>       at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
>       at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>       at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>       at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>       at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>       at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>       at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>       at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>       at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>       at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>       at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
>       at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>       at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>       at org.apache.spark.scheduler.Task.run(Task.scala:86)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> https://github.com/jpountz/lz4-java/issues/89


