[https://issues.apache.org/jira/browse/SPARK-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086973#comment-16086973]
Kannu Gupta commented on SPARK-15423:
-------------------------------------
[~srowen] I am facing the same issue with spark-2.1.0. Has the issue been fixed in
spark-2.1.0?
> why it is very slow to clean resources in Spark-2.0.0-preview
> -------------------------------------------------------------
>
> Key: SPARK-15423
> URL: https://issues.apache.org/jira/browse/SPARK-15423
> Project: Spark
> Issue Type: Question
> Components: Block Manager, MLlib
> Affects Versions: 2.0.0
> Environment: RedHat 6.5 (64 bit), JDK 1.8, Standalone mode
> Reporter: zszhong
> Labels: newbie, starter
>
> Hi, everyone! I'm new to Spark. I originally posted this question on
> [http://stackoverflow.com/questions/37331226/why-it-is-very-slow-to-clean-resources-in-spark],
> but someone considered it off-topic, so I am posting it here to ask for your
> help. If this post does not belong here, please feel free to delete it. I have
> copied the content below; since I don't know how to format the code here to
> make it more readable, please refer to the Stack Overflow link.
> I've submitted a very simple task to a standalone Spark environment
> (`spark-2.0.0-preview`, `jdk 1.8`, `48 CPU cores`, `250 GB memory`) with the
> following command:
> bin/spark-submit --master spark://hostname.domain:7077 --conf "spark.executor.memory=8G" ../SimpleApp.py ../data/train/ ../data/val/
> where the `SimpleApp.py` is:
> from __future__ import print_function
> import sys
> import os
> import numpy as np
> from pyspark import SparkContext
> from pyspark.mllib.tree import RandomForest, RandomForestModel
> from pyspark.mllib.util import MLUtils
>
> trainDataPath = sys.argv[1]
> valDataPath = sys.argv[2]
>
> sc = SparkContext(appName="Classification using Spark Random Forest")
>
> trainData = MLUtils.loadLibSVMFile(sc, trainDataPath)
> valData = MLUtils.loadLibSVMFile(sc, valDataPath)
>
> model = RandomForest.trainClassifier(trainData, numClasses=6,
>                                      categoricalFeaturesInfo={}, numTrees=3,
>                                      featureSubsetStrategy="auto",
>                                      impurity='gini', maxDepth=4, maxBins=32)
>
> predictions = model.predict(valData.map(lambda x: x.features))
> labelsAndPredictions = valData.map(lambda lp: lp.label).zip(predictions)
> # Index into the (label, prediction) pair; `lambda (v, p): ...` is Python 2 only.
> testErr = (labelsAndPredictions.filter(lambda vp: vp[0] != vp[1]).count()
>            / float(valData.count()))
> print('Test Error = ' + str(testErr))
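The last two statements compute the misclassification rate: the fraction of (label, prediction) pairs that disagree. A minimal plain-Python sketch of the same arithmetic, assuming in-memory lists instead of RDDs; the `misclassification_rate` helper and the toy data are hypothetical, not taken from the report:

```python
def misclassification_rate(labels, predictions):
    """Return the fraction of examples whose prediction differs from the label.

    Plain-Python mirror of the RDD pipeline: pair labels with predictions,
    keep the mismatched pairs, and divide by the total number of pairs.
    """
    pairs = list(zip(labels, predictions))
    if not pairs:
        raise ValueError("need at least one (label, prediction) pair")
    # Index into the pair instead of unpacking it in the lambda signature:
    # `lambda (v, p): v != p` is valid only in Python 2.
    mismatches = list(filter(lambda vp: vp[0] != vp[1], pairs))
    return len(mismatches) / float(len(pairs))


# Hypothetical toy data, not taken from the report.
labels = [0.0, 1.0, 2.0, 2.0, 5.0]
predictions = [0.0, 1.0, 1.0, 2.0, 3.0]
print('Test Error = ' + str(misclassification_rate(labels, predictions)))
# Test Error = 0.4
```

In the Spark job, `zip` additionally requires both RDDs to have the same number of partitions and elements per partition, which holds here because `predictions` is derived from `valData` by a `map`.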
> The task runs fine and prints the `Test Error` as follows:
> Test Error = 0.380580779161
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_21_piece0 on 127.0.0.1:59714 in memory (size: 12.1 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_21_piece0 on 127.0.0.1:37978 in memory (size: 12.1 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_19_piece0 on 127.0.0.1:37978 in memory (size: 10.9 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_19_piece0 on 127.0.0.1:59714 in memory (size: 10.9 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_20_piece0 on 127.0.0.1:59714 in memory (size: 4.6 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_20_piece0 on 127.0.0.1:37978 in memory (size: 4.6 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_17_piece0 on 127.0.0.1:59714 in memory (size: 4.0 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_17_piece0 on 127.0.0.1:37978 in memory (size: 4.0 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_18_piece0 on 127.0.0.1:59714 in memory (size: 455.0 B, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_18_piece0 on 127.0.0.1:37978 in memory (size: 455.0 B, free: 4.5 GB)
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 4
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_16_piece0 on 127.0.0.1:59714 in memory (size: 9.2 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_16_piece0 on 127.0.0.1:37978 in memory (size: 9.2 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_14_piece0 on 127.0.0.1:59714 in memory (size: 3.6 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_14_piece0 on 127.0.0.1:37978 in memory (size: 3.6 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_15_piece0 on 127.0.0.1:59714 in memory (size: 389.0 B, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_15_piece0 on 127.0.0.1:37978 in memory (size: 389.0 B, free: 4.5 GB)
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 3
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_12_piece0 on 127.0.0.1:59714 in memory (size: 345.0 B, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_12_piece0 on 127.0.0.1:37978 in memory (size: 345.0 B, free: 4.5 GB)
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 2
> 16/05/20 01:04:52 INFO BlockManager: Removing RDD 19
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned RDD 19
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 127.0.0.1:59714 in memory (size: 4.5 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 127.0.0.1:37978 in memory (size: 4.5 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManager: Removing RDD 10
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned RDD 10
> 16/05/20 01:20:01 INFO BlockManager: Removing RDD 2
> 16/05/20 01:20:01 INFO ContextCleaner: Cleaned RDD 2
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 127.0.0.1:59714 in memory (size: 14.3 KB, free: 511.1 MB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 127.0.0.1:37978 on disk (size: 14.3 KB)
> 16/05/20 01:20:01 INFO ContextCleaner: Cleaned accumulator 0
> 16/05/20 01:20:01 INFO BlockManager: Removing RDD 6
> 16/05/20 01:20:01 INFO ContextCleaner: Cleaned RDD 6
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 127.0.0.1:59714 in memory (size: 14.3 KB, free: 511.1 MB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 127.0.0.1:37978 on disk (size: 14.3 KB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 127.0.0.1:59714 in memory (size: 4.1 KB, free: 511.1 MB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 127.0.0.1:37978 on disk (size: 4.1 KB)
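The log above contains two distinct bursts of cleanup activity: one at 01:04:52, right after `Test Error` is printed, and one at 01:20:01. A small sketch that tallies cleanup-related entries per timestamp, assuming the `yy/MM/dd HH:mm:ss LEVEL Logger: message` layout visible in these lines; the `cleanup_bursts` helper and the sample subset are illustrative only:

```python
import re
from collections import Counter

# Prefix of a log line as it appears in the report, e.g.
# "16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 4"
LOG_LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+) (\w+): ")
CLEANUP_LOGGERS = {"ContextCleaner", "BlockManager", "BlockManagerInfo"}


def cleanup_bursts(lines):
    """Count cleanup-related log entries per timestamp.

    Lines without a timestamp prefix (program output, wrapped continuation
    lines) and entries from other loggers are skipped.
    """
    bursts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group(3) in CLEANUP_LOGGERS:
            bursts[match.group(1)] += 1
    return dict(bursts)


# A subset of the lines quoted above.
sample = [
    "Test Error = 0.380580779161",
    "16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 4",
    "16/05/20 01:04:52 INFO BlockManager: Removing RDD 10",
    "16/05/20 01:04:52 INFO ContextCleaner: Cleaned RDD 10",
    "16/05/20 01:20:01 INFO BlockManager: Removing RDD 2",
    "16/05/20 01:20:01 INFO ContextCleaner: Cleaned RDD 2",
    "16/05/20 01:20:01 INFO ContextCleaner: Cleaned accumulator 0",
]
print(cleanup_bursts(sample))
# {'16/05/20 01:04:52': 3, '16/05/20 01:20:01': 3}
```

Run over the full log, the result has only those two timestamps as keys, which makes it easy to see exactly when the cleaner was active.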
> But after that, the task keeps running and shows no sign of exiting. In the
> picture (in the Stack Overflow post), the task outputs the `Test Error` at
> `01:04:52`, yet more than an hour later (I submitted the task at `00:50:00`)
> the job is still running. I would expect the job to exit within a reasonable
> time.
> The job was still running when I submitted this post, without any failure
> messages. The Spark Master UI shows that the job has been running for 6.8
> hours since I submitted it (from 00:50:00 until now).
> Why is the cleanup procedure so slow? Is there any related configuration
> that I missed?
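The timestamps quoted above can be lined up with a few lines of standard-library Python (times only; all entries share the same date). One hedged observation: the second cleanup burst lands about 30 minutes after submission, which matches what we understand to be the default of `spark.cleaner.periodicGC.interval` (30min) in Spark 2.x. If that is right, the 01:20:01 burst may be a periodic cleaner pass rather than a slow shutdown step; this is an assumption, not something confirmed in this thread.

```python
from datetime import datetime, timedelta

# Times copied from the report; the shared date is ignored.
FMT = "%H:%M:%S"
submitted = datetime.strptime("00:50:00", FMT)           # job submitted
test_error_printed = datetime.strptime("01:04:52", FMT)  # `Test Error` + first cleanup burst
second_burst = datetime.strptime("01:20:01", FMT)        # second cleanup burst

print(test_error_printed - submitted)     # 0:14:52
print(second_burst - test_error_printed)  # 0:15:09
print(second_burst - submitted)           # 0:30:01

# Assumption: compare against a 30-minute periodic cleaner interval.
print(second_burst - submitted - timedelta(minutes=30))  # 0:00:01
```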
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)