[ 
https://issues.apache.org/jira/browse/SPARK-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491201#comment-14491201
 ] 

Sean Owen commented on SPARK-6864:
----------------------------------

This is missing some key information: what runs out of memory, the driver or 
an executor? As you'd expect, it's highly unlikely to be the executor. How are 
you running this? Are you sure the daemons have the memory you think they do?
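
One way to answer the memory question from a spark-shell session is sketched 
below. It is an illustration, not part of the original report. It assumes the 
shell's predefined SparkContext `sc`; the name `driverMaxMb` is made up for 
this example. Note that getExecutorMemoryStatus reports the memory available 
for caching per block manager, not the full executor heap, so the figures will 
be lower than the configured executor memory.

```scala
// Paste into spark-shell, where `sc` is the SparkContext the shell provides.

// Driver side: the maximum heap the driver JVM can actually grow to.
val driverMaxMb = Runtime.getRuntime.maxMemory / (1024L * 1024L)
println(s"driver max heap: $driverMaxMb MB")

// The executor memory setting as Spark sees it, if it was set explicitly.
val configured = sc.getConf.getOption("spark.executor.memory").getOrElse("(not set)")
println(s"spark.executor.memory = $configured")

// Executor side: per block manager, (max memory for caching, remaining memory), in bytes.
sc.getExecutorMemoryStatus.foreach { case (hostPort, (maxBytes, freeBytes)) =>
  println(s"$hostPort max=${maxBytes >> 20} MB free=${freeBytes >> 20} MB")
}
```

If the driver figure is close to the JVM default rather than the instance 
size, the driver memory setting most likely did not take effect.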

> Spark's Multilabel Classifier runs out of memory on small datasets
> ------------------------------------------------------------------
>
>                 Key: SPARK-6864
>                 URL: https://issues.apache.org/jira/browse/SPARK-6864
>             Project: Spark
>          Issue Type: Test
>          Components: MLlib
>    Affects Versions: 1.2.1
>         Environment: EC2 with 8-96 instances up to r3.4xlarge
> The test fails on every configuration
>            Reporter: John Canny
>             Fix For: 1.2.1
>
>
> When trying to run Spark's MultiLabel classifier 
> (LogisticRegressionWithLBFGS) on the RCV1 V2 dataset (about 0.5GB, 100 
> labels), the classifier runs out of memory. The number of tasks per executor 
> doesn't seem to matter. It happens even with a single task per 120 GB 
> executor. The dataset is the concatenation of the test files from the "rcv1v2 
> (topics; full sets)" group here:
> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html
> Here's the code:
> import org.apache.spark.SparkContext
> import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
> import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
> import org.apache.spark.mllib.optimization.L1Updater
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.util.MLUtils
> import scala.compat.Platform._ 
> val nnodes = 8
> val t0=currentTime
> // Load training data in LIBSVM format.
> val train = MLUtils.loadLibSVMFile(sc, "s3n://bidmach/RCV1train.libsvm", 
> true, 276544, nnodes)
> val test = MLUtils.loadLibSVMFile(sc, "s3n://bidmach/RCV1test.libsvm", true, 
> 276544, nnodes)
> val t1=currentTime;
> val lrAlg = new LogisticRegressionWithLBFGS()
> lrAlg.setNumClasses(100).optimizer.
>   setNumIterations(10).
>   setRegParam(1e-10).
>   setUpdater(new L1Updater)
> // Run training algorithm to build the model
> val model = lrAlg.run(train)
> val t2=currentTime



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)