zhangwei72 created SPARK-20237:
----------------------------------
Summary: Memory management issue in Spark 1.6 and later versions
(UnifiedMemoryManager)
Key: SPARK-20237
URL: https://issues.apache.org/jira/browse/SPARK-20237
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.1.0, 2.0.2, 2.0.1, 2.0.0, 1.6.3, 1.6.2, 1.6.1, 1.6.0
Environment: java 1.7.0 scala-2.10.5 maven-3.3.9 hadoop-2.2.0
spark-1.6.2
Reporter: zhangwei72
Priority: Critical
Fix For: 1.6.2
In Spark 1.6 and later versions, there is a problem with the memory management
in UnifiedMemoryManager.
The spark.memory.storageFraction configuration is supposed to guarantee a
minimum amount of storage memory.
In UnifiedMemoryManager, the amount of memory that execution can borrow from
storage is calculated as

val memoryReclaimableFromStorage = math.max(storageMemoryPool.memoryFree, storageMemoryPool.poolSize - storageRegionSize)

When storageMemoryPool.memoryFree > storageMemoryPool.poolSize - storageRegionSize,
the first argument (storageMemoryPool.memoryFree) is chosen, that is, the storage
pool can be shrunk by storageMemoryPool.memoryFree.
Because storageMemoryPool.memoryFree > storageMemoryPool.poolSize - storageRegionSize,
it follows that storageMemoryPool.poolSize - storageMemoryPool.memoryFree < storageRegionSize,
so after the shrink the new storageMemoryPool.poolSize is below storageRegionSize.
Since storageRegionSize is the minimum storage proportion defined by the framework,
this is a problem.
To solve this problem, the calculation should be changed to

val memoryReclaimableFromStorage = storageMemoryPool.poolSize - storageRegionSize
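For illustration only, here is a minimal, self-contained sketch of the two
formulas (this is not the Spark source; the pool sizes are made-up example
values, roughly matching the run described below):

object ReclaimableMemorySketch {
  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    // Hypothetical example values.
    val storageRegionSize = 1132L * mb   // storage region implied by spark.memory.storageFraction
    val storagePoolSize   = 1132L * mb   // current storage pool size
    val storageMemoryUsed = 0L           // nothing cached yet
    val storageMemoryFree = storagePoolSize - storageMemoryUsed

    // Current calculation in UnifiedMemoryManager:
    val reclaimableCurrent = math.max(storageMemoryFree, storagePoolSize - storageRegionSize)
    // Proposed calculation from this report:
    val reclaimableProposed = storagePoolSize - storageRegionSize

    // With no blocks cached, the current formula lets execution reclaim the entire
    // storage pool, shrinking it below storageRegionSize; the proposed formula
    // reclaims nothing and keeps the storage pool at storageRegionSize.
    println(s"current formula:  ${reclaimableCurrent / mb} MB reclaimable")
    println(s"proposed formula: ${reclaimableProposed / mb} MB reclaimable")
  }
}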
Experimental proof:
I added some log information to the UnifiedMemoryManager file as follows:
logInfo("storageMemoryPool.memoryFree
%f".format(storageMemoryPool.memoryFree/1024.0/1024.0))
logInfo("onHeapExecutionMemoryPool.memoryFree
%f".format(onHeapExecutionMemoryPool.memoryFree/1024.0/1024.0))
logInfo("storageMemoryPool.memoryUsed %f".format(
storageMemoryPool.memoryUsed/1024.0/1024.0))
logInfo("onHeapExecutionMemoryPool.memoryUsed
%f".format(onHeapExecutionMemoryPool.memoryUsed/1024.0/1024.0))
logInfo("storageMemoryPool.poolSize %f".format(
storageMemoryPool.poolSize/1024.0/1024.0))
logInfo("onHeapExecutionMemoryPool.poolSize
%f".format(onHeapExecutionMemoryPool.poolSize/1024.0/1024.0))
I ran the PageRank example with a 676 MB input file generated by BigDataBench
(a benchmark suite from the Chinese Academy of Sciences used to evaluate big
data analysis systems). The job was submitted as follows:
./bin/spark-submit --class org.apache.spark.examples.SparkPageRank \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 4g \
--executor-memory 7g \
--executor-cores 6 \
--queue thequeue \
./examples/target/scala-2.10/spark-examples-1.6.2-hadoop2.2.0.jar \
/test/Google_genGraph_23.txt 6
The configuration is as follows:
spark.memory.useLegacyMode=false
spark.memory.fraction=0.75
spark.memory.storageFraction=0.2
Log information is as follows:
17/02/28 11:07:34 INFO memory.UnifiedMemoryManager:
storageMemoryPool.memoryFree 0.000000
17/02/28 11:07:34 INFO memory.UnifiedMemoryManager:
onHeapExecutionMemoryPool.memoryFree 5663.325877
17/02/28 11:07:34 INFO memory.UnifiedMemoryManager:
storageMemoryPool.memoryUsed 0.299123 M
17/02/28 11:07:34 INFO memory.UnifiedMemoryManager:
onHeapExecutionMemoryPool.memoryUsed 0.000000
17/02/28 11:07:34 INFO memory.UnifiedMemoryManager: storageMemoryPool.poolSize
0.299123
17/02/28 11:07:34 INFO memory.UnifiedMemoryManager:
onHeapExecutionMemoryPool.poolSize 5663.325877
According to the configuration, storageMemoryPool.poolSize should be at least
about 1 GB, but the log shows only 0.299123 MB, so there is an error.
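As a back-of-the-envelope check (my own arithmetic based on the logged pool
sizes, not output from Spark): the total unified pool in the log is about
0.299123 + 5663.325877 = 5663.6 MB, and with spark.memory.storageFraction=0.2
the storage region should therefore be roughly 1132.7 MB:

object StorageRegionCheck {
  def main(args: Array[String]): Unit = {
    // Pool sizes taken from the log above (MB).
    val storagePoolSizeMb   = 0.299123
    val executionPoolSizeMb = 5663.325877
    val totalUnifiedMb      = storagePoolSizeMb + executionPoolSizeMb   // ~5663.6 MB

    // spark.memory.storageFraction from the configuration above.
    val storageFraction = 0.2
    val expectedStorageRegionMb = totalUnifiedMb * storageFraction      // ~1132.7 MB

    println(f"expected storageRegionSize: $expectedStorageRegionMb%.1f MB")
  }
}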