Hi,
I am seeing what looks like an incorrect computation of the available
storage memory, which is leading to executor failures. I have allocated 8g
of executor memory with these parameters:
spark.memory.fraction=0.7
spark.memory.storageFraction=0.4
As expected, the UI shows 5.2 GB of storage memory.
However, according to the MemoryStore logs, the reported free memory is
increasing as the RDDs get cached; ideally it should be decreasing from
5.2 GB towards 0.
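
For reference, a rough sketch of where a figure like 5.2 GB can come from.
It assumes the Spark 1.6-style unified memory formula (JVM max heap minus a
fixed 300 MB reservation, times spark.memory.fraction); the function names
and the nominal 8 GB heap value are illustrative, not taken from the job.
The JVM usually reports a maximum heap somewhat below the configured 8g, so
a UI value of 5.2 GB would be consistent with this formula. Under the same
assumption, the UI figure would be the whole unified region, and
spark.memory.storageFraction would only mark the part of it that is
protected from eviction by execution.

```python
# Sketch of the unified memory sizing, assuming the Spark 1.6-style formula:
#   unified        = (JVM max heap - reserved) * spark.memory.fraction
#   storage region = unified * spark.memory.storageFraction
RESERVED_BYTES = 300 * 1024 * 1024  # assumed fixed reservation (300 MB)


def unified_memory(heap_bytes, memory_fraction, storage_fraction,
                   reserved_bytes=RESERVED_BYTES):
    """Return (unified_bytes, storage_region_bytes) for the given settings."""
    usable = heap_bytes - reserved_bytes
    unified = int(usable * memory_fraction)
    storage_region = int(unified * storage_fraction)
    return unified, storage_region


def to_gb(n_bytes):
    """Convert bytes to GB using 1024-based units, as the Spark logs do."""
    return n_bytes / 1024.0 ** 3


# Nominal 8g heap; the value the JVM actually reports is usually a bit lower.
heap = 8 * 1024 ** 3
unified, storage = unified_memory(heap, 0.7, 0.4)
print("unified region: %.2f GB" % to_gb(unified))
print("storage region: %.2f GB" % to_gb(storage))
```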

Eventually the executor runs out of memory, even though the MemoryStore is
reporting that it has about 5 GB available.


Executor logs:

2016-10-28 12:36:51,996 INFO  [Executor task launch worker-10]
executor.Executor (Logging.scala:logInfo(58)) - Running task 324.0 in stage
1.0 (TID 760)
2016-10-28 12:36:52,019 INFO  [Executor task launch worker-10]
spark.CacheManager (Logging.scala:logInfo(58)) - Partition rdd_10_324 not
found, computing it
2016-10-28 12:36:52,031 INFO  [Executor task launch worker-8]
storage.MemoryStore (Logging.scala:logInfo(58)) - *Block rdd_10_145 stored
as values in memory (estimated size 65.0 MB, free 260.7 MB)*
2016-10-28 12:36:52,062 INFO  [Executor task launch worker-10]
storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Getting
426 non-empty blocks out of 426 blocks
2016-10-28 12:36:52,109 INFO  [Executor task launch worker-10]
storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Started
37 remote fetches in 47 ms
2016-10-28 12:36:52,130 INFO  [Executor task launch worker-8]
executor.Executor (Logging.scala:logInfo(58)) - Finished task 145.0 in
stage 1.0 (TID 581). 268313 bytes result sent to driver
2016-10-28 12:36:52,150 INFO  [dispatcher-event-loop-7]
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(58)) - Got
assigned task 761
2016-10-28 12:36:52,150 INFO  [Executor task launch worker-8]
executor.Executor (Logging.scala:logInfo(58)) - Running task 325.0 in stage
1.0 (TID 761)
2016-10-28 12:36:52,164 INFO  [Executor task launch worker-8]
spark.CacheManager (Logging.scala:logInfo(58)) - Partition rdd_10_325 not
found, computing it
2016-10-28 12:36:52,247 INFO  [Executor task launch worker-8]
storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Getting
426 non-empty blocks out of 426 blocks
2016-10-28 12:36:52,264 INFO  [Executor task launch worker-8]
storage.ShuffleBlockFetcherIterator (Logging.scala:logInfo(58)) - Started
37 remote fetches in 18 ms
2016-10-28 12:36:52,591 INFO  [Executor task launch worker-6]
storage.MemoryStore (Logging.scala:logInfo(58)) - *Block rdd_10_45 stored
as values in memory (estimated size 65.0 MB, free 325.7 MB)*
2016-10-28 12:36:52,646 INFO  [Executor task launch worker-6]
executor.Executor (Logging.scala:logInfo(58)) - Finished task 45.0 in stage
1.0 (TID 481). 266368 bytes result sent to driver
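
The trend in the two highlighted MemoryStore lines can be checked with a
small script. This is only a sketch: the regular expression is written
against the message format seen in this excerpt (emphasis markers dropped),
and the helper name stored_blocks is made up for illustration. For these
two lines, the "free" value rises by the same 65.0 MB that the newly stored
block occupies.

```python
import re

# The two MemoryStore lines from the excerpt above, wrapped as in the email.
LOG = """\
2016-10-28 12:36:52,031 INFO  [Executor task launch worker-8]
storage.MemoryStore (Logging.scala:logInfo(58)) - Block rdd_10_145 stored
as values in memory (estimated size 65.0 MB, free 260.7 MB)
2016-10-28 12:36:52,591 INFO  [Executor task launch worker-6]
storage.MemoryStore (Logging.scala:logInfo(58)) - Block rdd_10_45 stored
as values in memory (estimated size 65.0 MB, free 325.7 MB)
"""

# Conversion factors to MB for the units that can appear in the message.
UNIT_TO_MB = {"B": 1.0 / (1024 * 1024), "KB": 1.0 / 1024,
              "MB": 1.0, "GB": 1024.0}

STORED = re.compile(
    r"Block (\S+) stored as \w+ in memory "
    r"\(estimated size ([\d.]+) (B|KB|MB|GB), free ([\d.]+) (B|KB|MB|GB)\)"
)


def stored_blocks(log_text):
    """Return (block_id, size_mb, free_mb) for each 'stored' message."""
    # Undo the email line wrapping so each message sits on one logical line.
    flat = " ".join(log_text.split())
    rows = []
    for m in STORED.finditer(flat):
        block_id, size, size_unit, free, free_unit = m.groups()
        rows.append((block_id,
                     float(size) * UNIT_TO_MB[size_unit],
                     float(free) * UNIT_TO_MB[free_unit]))
    return rows


rows = stored_blocks(LOG)
# Compare each message with the previous one to see how "free" moved.
for (_, _, prev_free), (block_id, size_mb, free_mb) in zip(rows, rows[1:]):
    print("%s: size %.1f MB, free changed by %+.1f MB"
          % (block_id, size_mb, free_mb - prev_free))
```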

Eventual failure logs:

2016-10-28 12:53:06,718 WARN  [Executor task launch worker-13]
storage.MemoryStore (Logging.scala:logWarning(70)) - Not enough space to
cache rdd_10_656 in memory! (computed 45.2 MB so far)
2016-10-28 12:53:06,718 INFO  [Executor task launch worker-13]
storage.MemoryStore (Logging.scala:logInfo(58)) - Memory use = 5.0 GB
(blocks) + 211.4 MB (scratch space shared across 103 tasks(s)) = 5.2 GB.
Storage limit = 5.2 GB.
2016-10-28 12:53:06,718 INFO  [Executor task launch worker-13]
storage.BlockManager (Logging.scala:logInfo(58)) - Found block rdd_10_656
locally
2016-10-28 12:53:06,719 INFO  [Executor task launch worker-12]
storage.MemoryStore (Logging.scala:logInfo(58)) - 1 blocks selected for
dropping
2016-10-28 12:53:06,720 INFO  [Executor task launch worker-12]
storage.BlockManager (Logging.scala:logInfo(58)) - Dropping block
rdd_10_719 from memory
2016-10-28 12:53:06,720 INFO  [Executor task launch worker-12]
storage.BlockManager (Logging.scala:logInfo(58)) - Writing block rdd_10_719
to disk
2016-10-28 12:53:06,736 ERROR [Executor task launch worker-15]
executor.Executor (Logging.scala:logError(95)) - Exception in task 657.0 in
stage 4.0 (TID 4565)
java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got
85027
at
org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
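
A quick arithmetic check of the failure log, using only the numbers printed
above. The log rounds its values to one decimal place, so this is
approximate, and the variable names are illustrative. It suggests that
storage usage (blocks plus scratch space) had reached the 5.2 GB limit when
the 262144-byte execution request was only partly granted.

```python
# Figures copied from the failure log above (rounded by the log itself).
blocks_gb = 5.0          # "Memory use = 5.0 GB (blocks)"
scratch_mb = 211.4       # "211.4 MB (scratch space ...)"
storage_limit_gb = 5.2   # "Storage limit = 5.2 GB"

# Total storage-side usage in GB, using 1024 MB per GB.
total_gb = blocks_gb + scratch_mb / 1024.0
headroom_gb = storage_limit_gb - total_gb

# Execution-side request that failed with OutOfMemoryError.
requested_bytes = 262144   # 256 KB
granted_bytes = 85027
shortfall_bytes = requested_bytes - granted_bytes

print("storage in use: %.2f GB of %.1f GB limit" % (total_gb, storage_limit_gb))
print("approximate headroom: %.3f GB" % headroom_gb)
print("execution shortfall: %d bytes" % shortfall_bytes)
```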


Is this a bug, or am I setting something wrong?

Regards,

Sushrut Ikhar