spark git commit: [SPARK-23112][DOC] Update ML migration guide with breaking and behavior changes.

mlnick Wed, 31 Jan 2018 00:38:08 -0800

Repository: spark
Updated Branches:
  refs/heads/master 695f7146b -> 161a3f2ae



[SPARK-23112][DOC] Update ML migration guide with breaking and behavior changes.

Add breaking changes to the `2.3` ML migration guide and update its list of 
behavior changes.

## How was this patch tested?

Doc only

Author: Nick Pentreath <[email protected]>

Closes #20421 from MLnick/SPARK-23112-ml-guide.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/161a3f2a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/161a3f2a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/161a3f2a

Branch: refs/heads/master
Commit: 161a3f2ae324271a601500e3d2900db9359ee2ef
Parents: 695f714
Author: Nick Pentreath <[email protected]>
Authored: Wed Jan 31 10:37:37 2018 +0200
Committer: Nick Pentreath <[email protected]>
Committed: Wed Jan 31 10:37:37 2018 +0200

----------------------------------------------------------------------
 docs/ml-guide.md | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/161a3f2a/docs/ml-guide.md
----------------------------------------------------------------------
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index b957445..702bcf7 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -108,7 +108,13 @@ and the migration guide below will explain all changes between releases.
 
 ### Breaking changes
 
-There are no breaking changes.
+* The class and trait hierarchy for logistic regression model summaries was changed to be cleaner
+and better accommodate the addition of the multi-class summary. This is a breaking change for user
+code that casts a `LogisticRegressionTrainingSummary` to a
+`BinaryLogisticRegressionTrainingSummary`. Users should instead use the `model.binarySummary`
+method. See [SPARK-17139](https://issues.apache.org/jira/browse/SPARK-17139) for more detail
+(_note_ this is an `Experimental` API). This _does not_ affect the Python `summary` method, which
+will still work correctly for both multinomial and binary cases.
 
 ### Deprecations and changes of behavior
 
@@ -123,8 +129,19 @@ new [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator)
 **Changes of behavior**
 
 * [SPARK-21027](https://issues.apache.org/jira/browse/SPARK-21027):
- We are now setting the default parallelism used in `OneVsRest` to be 1 (i.e. serial). In 2.2 and
+ The default parallelism used in `OneVsRest` is now set to 1 (i.e. serial). In `2.2` and
  earlier versions, the level of parallelism was set to the default threadpool size in Scala.
+* [SPARK-22156](https://issues.apache.org/jira/browse/SPARK-22156):
+ The learning rate update for `Word2Vec` was incorrect when `numIterations` was set greater than
+ `1`. This will cause training results to be different between `2.3` and earlier versions.
+* [SPARK-21681](https://issues.apache.org/jira/browse/SPARK-21681):
+ Fixed an edge case bug in multinomial logistic regression that resulted in incorrect coefficients
+ when some features had zero variance.
+* [SPARK-16957](https://issues.apache.org/jira/browse/SPARK-16957):
+ Tree algorithms now use mid-points for split values. This may change results from model training.
+* [SPARK-14657](https://issues.apache.org/jira/browse/SPARK-14657):
+ Fixed an issue where the features generated by `RFormula` without an intercept were inconsistent
+ with the output in R. This may change results from model training in this scenario.
   
 ## Previous Spark versions
 

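As a rough illustration of why the SPARK-16957 entry in the diff above can change training results, the plain-Python sketch below compares two ways of picking candidate split thresholds for one continuous feature. It is not Spark code: the function `candidate_thresholds`, its `use_midpoints` flag, and the sample values are hypothetical, and the "older" rule (splitting at the lower of two adjacent observed values) is an assumption made for the comparison.

```python
from typing import List


def candidate_thresholds(values: List[float], use_midpoints: bool = True) -> List[float]:
    """Return candidate split thresholds for one continuous feature.

    Hypothetical helper for illustration only; a real tree learner also
    bins and samples the feature values before choosing splits.
    """
    distinct = sorted(set(values))
    thresholds = []
    for lower, upper in zip(distinct, distinct[1:]):
        if use_midpoints:
            # Mid-point rule: split halfway between two adjacent observed values.
            thresholds.append((lower + upper) / 2.0)
        else:
            # Assumed older rule: split at the lower observed value itself.
            thresholds.append(lower)
    return thresholds


values = [1.0, 2.0, 4.0, 4.0, 7.0]
midpoint_splits = candidate_thresholds(values, use_midpoints=True)
value_splits = candidate_thresholds(values, use_midpoints=False)
print(midpoint_splits)  # [1.5, 3.0, 5.5]
print(value_splits)     # [1.0, 2.0, 4.0]
```

Under these assumptions, an unseen value such as `1.2` falls on the "less than or equal" side of the mid-point threshold `1.5` but on the "greater than" side of the value-based threshold `1.0`, which is the kind of difference that can shift predictions between versions.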
