[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

Nick Pentreath (JIRA) Tue, 03 May 2016 08:10:33 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Nick Pentreath updated SPARK-13448:
-----------------------------------
    Description: 
This JIRA keeps a list of MLlib behavior changes in Spark 2.0 so that we can 
remember to add them to the migration guide / release notes.

* SPARK-13429: change convergenceTol in LogisticRegressionWithLBFGS from 1e-4 
to 1e-6.
* SPARK-7780: The intercept will not be regularized if users train a binary 
classification model with an L1/L2 Updater using LogisticRegressionWithLBFGS, 
because it calls the ML LogisticRegression implementation. In addition, if 
users train without regularization, training with or without feature scaling 
will return the same solution at the same convergence rate (because both run 
the same code path). This behavior is different from the old API.
* SPARK-12363: Bug fix for PowerIterationClustering which will likely change 
results
* SPARK-13048: LDA using the EM optimizer will keep the last checkpoint by 
default, if checkpointing is being used.
* SPARK-12153: Word2Vec now respects sentence boundaries.  Previously, it did 
not handle them correctly.
* SPARK-10574: HashingTF uses MurmurHash3 by default in both spark.ml and 
spark.mllib
* SPARK-14768: Remove expectedType arg for PySpark Param
* SPARK-14931: Mismatched default Param values between pipelines in Spark and 
PySpark
* SPARK-13600: Use approxQuantile from DataFrame stats in QuantileDiscretizer
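
As a rough illustration of why the SPARK-13429 change (convergenceTol 1e-4 to 
1e-6) can alter both runtime and fitted coefficients, here is a minimal 
sketch. It does not use Spark or L-BFGS: it runs plain gradient descent on a 
made-up one-dimensional logistic regression problem and stops when the 
relative change in the loss drops below the tolerance. The data, the step 
size, the function names and the exact stopping rule are all assumptions for 
illustration, not the MLlib implementation.

```python
import math

# Toy, non-separable 1-D data, made up purely for illustration.
XS = [-2.0, -1.0, 1.0, 2.0, 0.5, -0.5]
YS = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w):
    """Mean logistic loss of the model p = sigmoid(w * x) (no intercept)."""
    total = 0.0
    for x, y in zip(XS, YS):
        p = sigmoid(w * x)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(XS)


def gradient(w):
    """Derivative of log_loss with respect to w."""
    return sum((sigmoid(w * x) - y) * x for x, y in zip(XS, YS)) / len(XS)


def fit(convergence_tol, step=0.5, max_iter=10_000):
    """Gradient descent that stops once the relative loss change < tolerance.

    This stopping rule is a simplified stand-in, not Spark's actual check.
    Returns (weight, final loss, iterations used).
    """
    w = 0.0
    loss = log_loss(w)
    for iteration in range(1, max_iter + 1):
        w -= step * gradient(w)
        new_loss = log_loss(w)
        if abs(loss - new_loss) / loss < convergence_tol:
            return w, new_loss, iteration
        loss = new_loss
    return w, loss, max_iter


loose_w, loose_loss, loose_iters = fit(1e-4)  # old default tolerance
tight_w, tight_loss, tight_iters = fit(1e-6)  # new default tolerance

print("tol=1e-4:", loose_iters, "iterations, w =", loose_w)
print("tol=1e-6:", tight_iters, "iterations, w =", tight_w)
```

In this toy setting the tighter tolerance runs for more iterations and ends 
at a slightly different weight, which is the kind of difference users who 
relied on the old default may notice after upgrading.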

  was:
This JIRA keeps a list of MLlib behavior changes in Spark 2.0 so that we can 
remember to add them to the migration guide / release notes.

* SPARK-13429: change convergenceTol in LogisticRegressionWithLBFGS from 1e-4 
to 1e-6.
* SPARK-7780: The intercept will not be regularized if users train a binary 
classification model with an L1/L2 Updater using LogisticRegressionWithLBFGS, 
because it calls the ML LogisticRegression implementation. In addition, if 
users train without regularization, training with or without feature scaling 
will return the same solution at the same convergence rate (because both run 
the same code path). This behavior is different from the old API.
* SPARK-12363: Bug fix for PowerIterationClustering which will likely change 
results
* SPARK-13048: LDA using the EM optimizer will keep the last checkpoint by 
default, if checkpointing is being used.
* SPARK-12153: Word2Vec now respects sentence boundaries.  Previously, it did 
not handle them correctly.
* SPARK-10574: HashingTF uses MurmurHash3 by default in both spark.ml and 
spark.mllib
* SPARK-14768: Remove expectedType arg for PySpark Param
* SPARK-14931: Mismatched default Param values between pipelines in Spark and 
PySpark


> Document MLlib behavior changes in Spark 2.0
> --------------------------------------------
>
>                 Key: SPARK-13448
>                 URL: https://issues.apache.org/jira/browse/SPARK-13448
>             Project: Spark
>          Issue Type: Documentation
>          Components: ML, MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>



