[ 
https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332698#comment-16332698
 ] 

Bryan Cutler edited comment on SPARK-23109 at 1/29/18 5:26 PM:
---------------------------------------------------------------

I did the following: generated the HTML docs and checked them for consistency 
with Scala, did not see any API-breaking changes, checked for missing items 
(see list below), and checked that default param values match.  No blocking or 
major issues found.

Items requiring follow-up; I will create (related) JIRAs to fix them:

classification:
     GBTClassifier - missing featureSubsetStrategy, should be moved to TreeEnsembleParams
     GBTClassificationModel - missing numClasses, should inherit from JavaClassificationModel
     For both of the above: SPARK-23161

clustering:
     GaussianMixtureModel - missing gaussians, need to serialize Array[MultivariateGaussian]?
     LDAModel - missing topicsMatrix, can Matrix be sent through Py4J?

evaluation:
     ClusteringEvaluator - DOC describe silhouette like scaladoc

feature:
     Bucketizer - multiple input/output cols, splitsArray - SPARK-22797
     ChiSqSelector - DOC selectorType desc missing new types
     QuantileDiscretizer - multiple input/output cols - SPARK-22796

fpm:
     DOC associationRules should say it returns a "DataFrame"

image:
     missing columnSchema, get*; Scala missing toNDArray - SPARK-23256

regression:
     LinearRegressionSummary - missing r2adj - SPARK-23162

stat:
     missing Summarizer class - SPARK-21741

tuning:
     missing subModels, hasSubModels - SPARK-22005

For the above DOC issues: SPARK-23163
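
As a rough illustration of the "missing items" check, here is a minimal sketch of how such a comparison could be scripted. It is not the procedure used for this audit. It assumes the Scala-side member names are collected by hand from the scaladoc, and the names missing_members, FakeGBTClassificationModel, and scala_members are hypothetical, chosen only for this example.

```python
def missing_members(expected_names, py_class):
    """Return the expected names that py_class does not expose, sorted."""
    available = set(dir(py_class))
    return sorted(name for name in expected_names if name not in available)


# Stand-in class for illustration only (hypothetical). In a real audit this
# would be the PySpark wrapper class being checked.
class FakeGBTClassificationModel:
    featureImportances = None

    def trees(self):
        return []


# Hypothetical, hand-written list of Scala-side member names.
scala_members = ["featureImportances", "numClasses", "trees"]

print(missing_members(scala_members, FakeGBTClassificationModel))
```

With the stand-in class above, only numClasses is reported as missing. Against a real PySpark class the result depends on the installed version, so the output would still need to be reviewed by hand.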


> ML 2.3 QA: API: Python API coverage
> -----------------------------------
>
>                 Key: SPARK-23109
>                 URL: https://issues.apache.org/jira/browse/SPARK-23109
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, ML, PySpark
>    Affects Versions: 2.3.0
>            Reporter: Joseph K. Bradley
>            Assignee: Bryan Cutler
>            Priority: Blocker
>
> For new public APIs added to MLlib ({{spark.ml}} only), we need to check the 
> generated HTML doc and compare the Scala & Python versions.
> * *GOAL*: Audit and create JIRAs to fix in the next release.
> * *NON-GOAL*: This JIRA is _not_ for fixing the API parity issues.
> We need to track:
> * Inconsistency: Do class/method/parameter names match?
> * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
> be as complete as the Scala doc.
> * API breaking changes: These should be very rare but are occasionally either 
> necessary (intentional) or accidental.  These must be recorded and added in 
> the Migration Guide for this release.
> ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
> component, please note that as well.
> * Missing classes/methods/parameters: We should create to-do JIRAs for 
> functionality missing from Python, to be added in the next release cycle.  
> *Please use a _separate_ JIRA (linked below as "requires") for this list of 
> to-do items.*


