[ https://issues.apache.org/jira/browse/SPARK-21643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-21643.
-------------------------------
    Resolution: Invalid

This isn't narrowed down nearly enough to be a JIRA. It's not even clear 
there's a problem as you just get a different number of iterations.

> LR dataset worked in Spark 1.6.3, 2.0.2 stopped working in 2.1.0 onward
> -----------------------------------------------------------------------
>
>                 Key: SPARK-21643
>                 URL: https://issues.apache.org/jira/browse/SPARK-21643
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0, 2.1.1, 2.2.0
>         Environment: CentOS 7, 256G memory, and 52 CPUs VM
>            Reporter: Thomas Kwan
>
> This dataset works on 1.6.x and 2.0.x, but it does not converge with 2.1+.
> a) Download the data set 
> (https://s3.amazonaws.com/manage-partners/pipeline/di873-train.json.gz) and 
> uncompress it; I placed it at /tmp/di873-train.json
> b) Download the Spark package to /usr/lib/spark/spark-*
> c) cd sbin
> d) start-master.sh
> e) start-slave.sh <master-url>
> f) cd ../bin
> g) Start spark-shell --master <master-url>
> h) I pasted in the following Scala code:
> import org.apache.spark.sql.types._
> val VT = org.apache.spark.ml.linalg.SQLDataTypes.VectorType
> val schema = StructType(Array(StructField("features", VT, true), StructField("label", DoubleType, true)))
> val df = spark.read.schema(schema).json("file:///tmp/di873-train.json")
> val trainer = new org.apache.spark.ml.classification.LogisticRegression().setMaxIter(500).setElasticNetParam(1.0).setRegParam(0.00001).setTol(0.00001).setFitIntercept(true)
> val model = trainer.fit(df)
> i) Then I monitored the progress in the Spark UI under the Jobs tab (see 
> the sketch below for checking the iteration count directly).
> With Spark 1.6.1 and Spark 2.0.2, the training (treeAggregate) finished in 
> around 25-30 jobs. But with 2.1+, the training did not converge and only 
> finished because it hit the maximum number of iterations (i.e. 500).
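>
> Below is a minimal sketch (reusing the model fitted in step h, and 
> assuming a Spark 2.x spark-shell) that reads the convergence information 
> from the training summary directly instead of inferring it from the Jobs 
> tab:
> // Reuses the `model` fitted in step h above.
> val summary = model.summary
> // Number of optimizer iterations actually run; 500 means maxIter was hit
> // before the tolerance was reached.
> println(s"total iterations: ${summary.totalIterations}")
> // Loss value per iteration; the tail shows whether the optimizer is still
> // improving the objective or merely taking more steps.
> println(s"last objective values: ${summary.objectiveHistory.takeRight(5).mkString(", ")}")
> Comparing totalIterations and the final objective values between 2.0.x and 
> 2.1+ would show whether model quality actually regressed or only the 
> iteration count changed.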



