[ https://issues.apache.org/jira/browse/SPARK-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170395#comment-14170395 ]
Saisai Shao commented on SPARK-3426:
------------------------------------

Sorry about that, I just saw the PR (https://github.com/apache/spark/pull/2247) discussion about this.

> Sort-based shuffle compression behavior is inconsistent
> -------------------------------------------------------
>
>                 Key: SPARK-3426
>                 URL: https://issues.apache.org/jira/browse/SPARK-3426
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Blocker
>
> We have the following configs:
> {code}
> spark.shuffle.compress
> spark.shuffle.spill.compress
> {code}
> When these two diverge, sort-based shuffle fails with a compression exception
> under certain workloads. This is because in sort-based shuffle we serve the
> index file (written using spark.shuffle.spill.compress) as a normal shuffle
> file (read using spark.shuffle.compress). In retrospect it was unfortunate
> that these two configs were exposed, so we can't easily remove them.
>
> Here is how this can be reproduced. Set the following in your
> spark-defaults.conf:
> {code}
> spark.master local-cluster[1,1,512]
> spark.shuffle.spill.compress false
> spark.shuffle.compress true
> spark.shuffle.manager sort
> spark.shuffle.memoryFraction 0.001
> {code}
> Then run the following in spark-shell:
> {code}
> sc.parallelize(0 until 100000).map(i => (i/4, i)).groupByKey().collect()
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
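The failure mode can be illustrated in miniature. This is a hedged sketch, not Spark code: it only mimics what happens when the writer of a file skips compression (as with spark.shuffle.spill.compress=false) while the reader assumes a compressed stream (as with spark.shuffle.compress=true); gzip stands in for whatever codec Spark is configured to use.

```python
import gzip

# Illustrative only -- not Spark code. The writer side emits raw bytes,
# as if spark.shuffle.spill.compress=false.
written = b"shuffle index data"  # written uncompressed

# The reader side expects a compressed stream, as if
# spark.shuffle.compress=true, and fails on the missing codec framing.
try:
    gzip.decompress(written)
except OSError as e:
    print("decompression failed:", e)
```

The same asymmetry in either direction (compressed write, plain read) produces a similar error, which is why the two configs must agree for any file that one code path writes and another reads.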