Dr. Christian Betz created SPARK-5715:
-----------------------------------------

             Summary: Shuffle size increase, performance loss from Spark 1.1.0 
to Spark 1.2.0 (and 1.2.1)
                 Key: SPARK-5715
                 URL: https://issues.apache.org/jira/browse/SPARK-5715
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.2.0, 1.2.1
         Environment: Running with local[*] as master (experienced it during 
pre-integration test, not in production cluster), 100GByte memory assigned, 16 
core machine.
            Reporter: Dr. Christian Betz


I see a *factor four performance loss* in my Spark jobs when migrating from 
Spark 1.1.0 to Spark 1.2.0 or 1.2.1.

Also, I see an *increase in the size of shuffle writes* (which is also reported 
by Kevin Jung on the mailing list: 
http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tt20894.html
 

Together with this I experience a *huge number of disk spills*.



I'm experiencing these with my job under the following circumstances: 

* Spark 1.2.0 with Sort-based Shuffle 
* Spark 1.2.0 with Hash-based Shuffle 
* Spark 1.2.1 with Sort-based Shuffle 

All three combinations show the same behavior, which contrasts from Spark 
1.1.0. 

In Spark 1.1.0, my job runs for about an hour, in Spark 1.2.x it runs for 
almost four hours. Configuration is identical otherwise - I only added 
org.apache.spark.scheduler.CompressedMapStatus to the Kryo registrator for 
Spark 1.2.0 to cope with https://issues.apache.org/jira/browse/SPARK-5102. 


As a consequence (I think, but causality might be different) I see lots and 
lots of disk spills. 

I cannot provide a small test case, but maybe the log entries for a single 
worker thread can help someone investigate on this. (See below.) 


I will also open up an issue, if nobody stops me by providing an answer ;) 

Any help will be greatly appreciated, because otherwise I'm stuck with Spark 
1.1.0, as quadrupling runtime is not an option. 

Sincerely, 

Chris 



2015-02-09T14:06:06.328+01:00 INFO org.apache.spark.executor.Executor Running 
task 9.0 in stage 18.0 (TID 300) Executor task launch worker-18 
2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.CacheManager Partition 
rdd_35_9 not found, computing it Executor task launch worker-18 
2015-02-09T14:06:06.351+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty 
blocks out of 10 blocks Executor task launch worker-18 
2015-02-09T14:06:06.351+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches 
in 0 ms Executor task launch worker-18 
2015-02-09T14:06:07.396+01:00 INFO org.apache.spark.storage.MemoryStore 
ensureFreeSpace(2582904) called with curMem=300174944, maxMe... Executor task 
launch worker-18 
2015-02-09T14:06:07.397+01:00 INFO org.apache.spark.storage.MemoryStore Block 
rdd_35_9 stored as bytes in memory (estimated size 2.5... Executor task launch 
worker-18 
2015-02-09T14:06:07.398+01:00 INFO org.apache.spark.storage.BlockManagerMaster 
Updated info of block rdd_35_9 Executor task launch worker-18 
2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.CacheManager Partition 
rdd_38_9 not found, computing it Executor task launch worker-18 
2015-02-09T14:06:07.399+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty 
blocks out of 10 blocks Executor task launch worker-18 
2015-02-09T14:06:07.400+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches 
in 0 ms Executor task launch worker-18 
2015-02-09T14:06:07.567+01:00 INFO org.apache.spark.storage.MemoryStore 
ensureFreeSpace(944848) called with curMem=302757848, maxMem... Executor task 
launch worker-18 
2015-02-09T14:06:07.568+01:00 INFO org.apache.spark.storage.MemoryStore Block 
rdd_38_9 stored as values in memory (estimated size 92... Executor task launch 
worker-18 
2015-02-09T14:06:07.569+01:00 INFO org.apache.spark.storage.BlockManagerMaster 
Updated info of block rdd_38_9 Executor task launch worker-18 
2015-02-09T14:06:07.573+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty 
blocks out of 50 blocks Executor task launch worker-18 
2015-02-09T14:06:07.573+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches 
in 1 ms Executor task launch worker-18 
2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.CacheManager Partition 
rdd_41_9 not found, computing it Executor task launch worker-18 
2015-02-09T14:06:38.931+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 3 non-empty blocks 
out of 10 blocks Executor task launch worker-18 
2015-02-09T14:06:38.931+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches 
in 0 ms Executor task launch worker-18 
2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore 
ensureFreeSpace(0) called with curMem=307529127, maxMem=9261... Executor task 
launch worker-18 
2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore Block 
rdd_41_9 stored as bytes in memory (estimated size 0.0... Executor task launch 
worker-18 
2015-02-09T14:06:38.946+01:00 INFO org.apache.spark.storage.BlockManagerMaster 
Updated info of block rdd_41_9 Executor task launch worker-18 
2015-02-09T14:06:38.946+01:00 WARN org.apache.spark.storage.BlockManager Block 
rdd_41_9 replicated to only 0 peer(s) instead of 1 pee... Executor task launch 
worker-18 
2015-02-09T14:06:39.088+01:00 INFO org.apache.spark.storage.BlockManager Found 
block rdd_3_9 locally Executor task launch worker-18 
2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.CacheManager Partition 
rdd_7_9 not found, computing it Executor task launch worker-18 
2015-02-09T14:06:41.389+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 1 non-empty blocks 
out of 1 blocks Executor task launch worker-18 
2015-02-09T14:06:41.389+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches 
in 0 ms Executor task launch worker-18 
2015-02-09T14:06:41.402+01:00 INFO org.apache.spark.storage.MemoryStore 
ensureFreeSpace(38144) called with curMem=307529151, maxMem=... Executor task 
launch worker-18 
2015-02-09T14:06:41.402+01:00 INFO org.apache.spark.storage.MemoryStore Block 
rdd_7_9 stored as values in memory (estimated size 37.... Executor task launch 
worker-18 
2015-02-09T14:06:41.404+01:00 INFO org.apache.spark.storage.BlockManagerMaster 
Updated info of block rdd_7_9 Executor task launch worker-18 
2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.CacheManager Partition 
rdd_73_9 not found, computing it Executor task launch worker-18 
2015-02-09T14:07:00.019+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 1 non-empty blocks 
out of 1 blocks Executor task launch worker-18 
2015-02-09T14:07:00.019+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches 
in 0 ms Executor task launch worker-18 
2015-02-09T14:07:00.185+01:00 INFO org.apache.spark.storage.MemoryStore 
ensureFreeSpace(826250) called with curMem=307567295, maxMem... Executor task 
launch worker-18 
2015-02-09T14:07:00.185+01:00 INFO org.apache.spark.storage.MemoryStore Block 
rdd_73_9 stored as values in memory (estimated size 80... Executor task launch 
worker-18 
2015-02-09T14:07:00.186+01:00 INFO org.apache.spark.storage.BlockManagerMaster 
Updated info of block rdd_73_9 Executor task launch worker-18 
2015-02-09T14:07:00.190+01:00 INFO org.apache.spark.storage.BlockManager Found 
block rdd_35_9 locally Executor task launch worker-18 
2015-02-09T14:07:00.190+01:00 INFO org.apache.spark.storage.BlockManager Found 
block rdd_38_9 locally Executor task launch worker-18 
2015-02-09T14:07:00.194+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty 
blocks out of 50 blocks Executor task launch worker-18 
2015-02-09T14:07:00.194+01:00 INFO 
org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches 
in 1 ms Executor task launch worker-18 
2015-02-09T14:07:17.967+01:00 INFO 
org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling 
in-memory map of 670.2 MB to disk (1 tim... Executor task launch worker-18 
2015-02-09T14:07:46.716+01:00 INFO org.apache.spark.storage.BlockManager Found 
block rdd_41_9 locally Executor task launch worker-18 
2015-02-09T14:07:47.603+01:00 INFO org.apache.spark.storage.BlockManager Found 
block rdd_3_9 locally Executor task launch worker-18 
2015-02-09T14:07:47.897+01:00 INFO 
org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling 
in-memory map of 5.0 MB to disk (1 time ... Executor task launch worker-18 
2015-02-09T14:07:48.270+01:00 INFO 
org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling 
in-memory map of 5.0 MB to disk (2 times... Executor task launch worker-18 
2015-02-09T14:07:48.727+01:00 INFO 
org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling 
in-memory map of 5.0 MB to disk (3 times... Executor task launch worker-18 
2015-02-09T14:07:49.021+01:00 INFO 
org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling 
in-memory map of 5.0 MB to disk (4 times... Executor task launch worker-18 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to