[ https://issues.apache.org/jira/browse/SPARK-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15327397#comment-15327397 ]

Alessio edited comment on SPARK-15904 at 6/13/16 1:49 PM:
----------------------------------------------------------

Dear [~srowen],
at first I noticed that the "Cleaning RDD" phase (as in the original post) took
a long time (10-15 minutes).
Out of curiosity I opened the Activity Monitor on Mac OS X, and that's when I
noticed the Memory Pressure indicator going crazy. Swap memory grows up to 10GB
(when K=9120), and after this Cleaning RDD stage everything goes back to
normal: swap drops to 1-2GB, there is no more memory pressure, and the machine
is ready for the next K.
Moreover, Spark does not stop the execution: I do not receive any
"Out of memory" errors from Java, Python or Spark.

Have a look at this screenshot (http://postimg.org/image/l4pc0vlzr/): K-means
has just finished another run, for K=6000. Looking at the memory stats, all of
the peaks under the "Last 24 Hours" section come from Spark, one after every
K-means run.
A couple of minutes later, here is another screenshot
(http://postimg.org/image/qc7re8clt/): the memory pressure indicator is going
down, but swap is still at 10GB. After a few more minutes everything is back
to normal.
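
To make the setup concrete, the sweep described above is roughly a loop of this
shape (a minimal sketch, assuming PySpark 1.6 MLlib; the file path, the K
values and maxIterations are illustrative, not the exact script):

{code}
from pyspark import SparkContext, StorageLevel
from pyspark.mllib.clustering import KMeans

sc = SparkContext("local[*]", "kmeans-k-sweep")

# ~400MB dataset, 12 partitions, persisted on memory and disk (see description)
data = (sc.textFile("/path/to/dataset.csv", 12)
          .map(lambda line: [float(x) for x in line.split(",")])
          .persist(StorageLevel.MEMORY_AND_DISK))

for k in [6000, 9120]:                     # illustrative K values
    model = KMeans.train(data, k, maxIterations=100)
    print(k, model.computeCost(data))
    # The "Removing RDD ... from persistence list" stage, and the memory
    # pressure described above, show up here, after each run finishes and
    # before the next K starts.
{code}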


> High Memory Pressure using MLlib K-means
> ----------------------------------------
>
>                 Key: SPARK-15904
>                 URL: https://issues.apache.org/jira/browse/SPARK-15904
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.6.1
>         Environment: Mac OS X 10.11.6beta on Macbook Pro 13" mid-2012. 16GB 
> of RAM.
>            Reporter: Alessio
>            Priority: Minor
>
> Running MLlib K-Means on a ~400MB dataset (12 partitions), persisted on
> Memory and Disk.
> Everything is fine, although at the end of K-Means, after the number of
> iterations, the cost function value and the running time are printed, there
> is a nice "Removing RDD <idx> from persistence list" stage. However, during
> this stage there is high memory pressure, which is odd, since the RDDs are
> about to be removed. Full log of this stage:
> 16/06/12 20:37:33 INFO clustering.KMeans: Run 0 finished in 14 iterations
> 16/06/12 20:37:33 INFO clustering.KMeans: Iterations took 694.544 seconds.
> 16/06/12 20:37:33 INFO clustering.KMeans: KMeans converged in 14 iterations.
> 16/06/12 20:37:33 INFO clustering.KMeans: The cost for the best run is 
> 49784.87126751288.
> 16/06/12 20:37:33 INFO rdd.MapPartitionsRDD: Removing RDD 781 from 
> persistence list
> 16/06/12 20:37:33 INFO storage.BlockManager: Removing RDD 781
> 16/06/12 20:37:33 INFO rdd.MapPartitionsRDD: Removing RDD 780 from 
> persistence list
> 16/06/12 20:37:33 INFO storage.BlockManager: Removing RDD 780
> I'm running this K-Means on a 16GB machine, with the SparkContext master set
> to local[*]. My machine has a hyper-threaded dual-core i5, so [*] means 4.
> I'm launching the application through spark-submit with --driver-memory 9G.
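
For what it's worth, the "Removing RDD ... from persistence list" messages in
the log above appear to be MLlib unpersisting the intermediate RDDs it cached
internally for the run. The same messages can be reproduced by unpersisting any
cached RDD explicitly (a minimal sketch, not the reporter's code):

{code}
from pyspark import SparkContext, StorageLevel

sc = SparkContext("local[*]", "unpersist-demo")

rdd = sc.parallelize(range(10**6), 12).persist(StorageLevel.MEMORY_AND_DISK)
rdd.count()      # materialize the cached blocks

# Logs "Removing RDD <idx> from persistence list" (from the RDD) and
# "Removing RDD <idx>" (from the BlockManager), the same stage as above.
rdd.unpersist()
{code}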


