[ https://issues.apache.org/jira/browse/SPARK-21643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen closed SPARK-21643.
-----------------------------

> LR dataset worked in Spark 1.6.3, 2.0.2, stopped working in 2.1.0 onward
> ------------------------------------------------------------------------
>
>                 Key: SPARK-21643
>                 URL: https://issues.apache.org/jira/browse/SPARK-21643
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0, 2.1.1, 2.2.0
>         Environment: CentOS 7, 256G memory, and 52 CPUs VM
>            Reporter: Thomas Kwan
>
> This dataset works on 1.6.x and 2.0.x, but it does not converge on 2.1+.
>
> a) Download the data set
>    (https://s3.amazonaws.com/manage-partners/pipeline/di873-train.json.gz) and
>    uncompress it; I placed it at /tmp/di873-train.json
> b) Download the Spark package to /usr/lib/spark/spark-*
> c) cd sbin
> d) start-master.sh
> e) start-slave.sh <master-url>
> f) cd ../bin
> g) Start spark-shell <master-url>
> h) Paste in the following Scala code:
>
>    import org.apache.spark.sql.types._
>    val VT = org.apache.spark.ml.linalg.SQLDataTypes.VectorType
>    val schema = StructType(Array(StructField("features", VT, true), StructField("label", DoubleType, true)))
>    val df = spark.read.schema(schema).json("file:///tmp/di873-train.json")
>    val trainer = new org.apache.spark.ml.classification.LogisticRegression().setMaxIter(500).setElasticNetParam(1.0).setRegParam(0.00001).setTol(0.00001).setFitIntercept(true)
>    val model = trainer.fit(df)
>
> i) Then monitor progress in the Spark UI under the Jobs tab.
>
> With Spark 1.6.1 and Spark 2.0.2, the training (treeAggregate) finished
> after around 25-30 jobs. But with 2.1+, training did not converge and
> finished only because it hit the maximum number of iterations (i.e. 500).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
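[Editor's note: the report infers non-convergence from the Spark UI job count. A more direct check, not part of the original report, is to inspect the optimizer's per-iteration loss via the training summary. This is a hedged sketch for spark-shell, assuming the `model` and `trainer` values from step h above; `LogisticRegressionModel.summary.objectiveHistory` is the standard Spark ML API for this in 2.x.]

    // Inspect the per-iteration objective to see whether the solver converged
    // or simply ran until maxIter. A converged run flattens out early; a run
    // that hits maxIter shows the loss still changing at the last iteration.
    val history: Array[Double] = model.summary.objectiveHistory

    history.zipWithIndex.foreach { case (loss, i) =>
      println(f"iter $i%3d  objective $loss%.8f")
    }

    // Number of iterations actually performed vs. the configured cap.
    println(s"iterations run: ${history.length - 1} (maxIter = ${trainer.getMaxIter})")

[If the affected 2.1+ runs print close to 500 iterations with the objective still moving, that confirms the tol = 0.00001 stopping criterion was never met.]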