[spark] branch branch-3.2 updated: [SPARK-36578][ML] UnivariateFeatureSelector API doc improvement

dongjoon Thu, 26 Aug 2021 21:18:14 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 786d773  [SPARK-36578][ML] UnivariateFeatureSelector API doc improvement
786d773 is described below

commit 786d773585a6c89bff5ec9c7c73940d0997474bc
Author: Huaxin Gao <[email protected]>
AuthorDate: Thu Aug 26 21:16:49 2021 -0700

    [SPARK-36578][ML] UnivariateFeatureSelector API doc improvement
    
    ### What changes were proposed in this pull request?
    Change API doc for `UnivariateFeatureSelector`
    
    ### Why are the changes needed?
    make the doc look better
    
    ### Does this PR introduce _any_ user-facing change?
    yes, API doc change
    
    ### How was this patch tested?
    Manually checked
    
    Closes #33855 from huaxingao/ml_doc.
    
    Authored-by: Huaxin Gao <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 15e42b44423942be75a68993b3e34696ef2b21f6)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../org/apache/spark/ml/feature/UnivariateFeatureSelector.scala  | 9 ++++++---
 python/pyspark/ml/feature.py                                     | 8 +++++---
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala
index 7fff159..7412c42 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala
@@ -97,12 +97,15 @@ private[feature] trait UnivariateFeatureSelectorParams extends Params
 }
 
 /**
- * The user can set `featureType` and labelType`, and Spark will pick the score function based on
- * the specified `featureType` and labelType`.
+ * Feature selector based on univariate statistical tests against labels. Currently, Spark
+ * supports three Univariate Feature Selectors: chi-squared, ANOVA F-test and F-value.
+ * User can choose Univariate Feature Selector by setting `featureType` and `labelType`,
+ * and Spark will pick the score function based on the specified `featureType` and `labelType`.
+ *
  * The following combination of `featureType` and `labelType` are supported:
  *  - `featureType` `categorical` and `labelType` `categorical`: Spark uses chi-squared,
  *    i.e. chi2 in sklearn.
- *  - `featureType` `continuous` and `labelType` `categorical`: Spark uses ANOVATest,
+ *  - `featureType` `continuous` and `labelType` `categorical`: Spark uses ANOVA F-test,
  *    i.e. f_classif in sklearn.
  *  - `featureType` `continuous` and `labelType` `continuous`: Spark uses F-value,
  *    i.e. f_regression in sklearn.
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index e066788..cf6b91c 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -5816,14 +5816,16 @@ class UnivariateFeatureSelector(JavaEstimator, _UnivariateFeatureSelectorParams,
                                 JavaMLWritable):
     """
     UnivariateFeatureSelector
-    The user can set `featureType` and `labelType`, and Spark will pick the score function based on
-    the specified `featureType` and `labelType`.
+    Feature selector based on univariate statistical tests against labels. Currently, Spark
+    supports three Univariate Feature Selectors: chi-squared, ANOVA F-test and F-value.
+    User can choose Univariate Feature Selector by setting `featureType` and `labelType`,
+    and Spark will pick the score function based on the specified `featureType` and `labelType`.
 
     The following combination of `featureType` and `labelType` are supported:
 
     - `featureType` `categorical` and `labelType` `categorical`, Spark uses chi-squared,
       i.e. chi2 in sklearn.
-    - `featureType` `continuous` and `labelType` `categorical`, Spark uses ANOVATest,
+    - `featureType` `continuous` and `labelType` `categorical`, Spark uses ANOVA F-test,
       i.e. f_classif in sklearn.
     - `featureType` `continuous` and `labelType` `continuous`, Spark uses F-value,
       i.e. f_regression in sklearn.
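
For readers skimming the diff, the selection rule described in the updated docstring can be sketched as a plain lookup table. This is only an illustration of the documented mapping, not Spark's implementation: the names `SCORE_FUNCTIONS` and `pick_score_function` are hypothetical and do not exist in Spark or PySpark, and the sketch assumes that any combination not listed in the docstring is unsupported.

```python
# Hypothetical illustration of the mapping described in the docstring above.
# Neither SCORE_FUNCTIONS nor pick_score_function is part of Spark.

# (featureType, labelType) -> (test named in the Spark doc, sklearn analogue)
SCORE_FUNCTIONS = {
    ("categorical", "categorical"): ("chi-squared", "chi2"),
    ("continuous", "categorical"): ("ANOVA F-test", "f_classif"),
    ("continuous", "continuous"): ("F-value", "f_regression"),
}


def pick_score_function(feature_type, label_type):
    """Return the documented score function for a type combination.

    Assumption: combinations absent from the docstring are treated as
    unsupported and raise ValueError.
    """
    key = (feature_type, label_type)
    if key not in SCORE_FUNCTIONS:
        raise ValueError(
            "Unsupported combination: featureType=%r, labelType=%r"
            % (feature_type, label_type)
        )
    return SCORE_FUNCTIONS[key]


if __name__ == "__main__":
    for feature_type, label_type in SCORE_FUNCTIONS:
        spark_name, sklearn_name = pick_score_function(feature_type, label_type)
        print(feature_type, label_type, "->", spark_name, "/", sklearn_name)
```

The table form makes it easy to see that the docstring lists only three combinations; categorical features with a continuous label are not among them.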

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
