[ https://issues.apache.org/jira/browse/SPARK-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Cutler updated SPARK-11219:
---------------------------------
Description:
There are several different formats for describing params in PySpark.MLlib,
which makes it unclear which style is preferred (e.g. vertical alignment vs.
single line).
This issue is to agree on one format and apply it consistently across
PySpark.MLlib.
Following the discussion in SPARK-10560, using 2 lines with an indentation is
both readable and avoids changing many lines when adding/removing parameters.
If the parameter has a default value, put it in parentheses on a new line
under the description.
Example:
{noformat}
:param stepSize:
  Step size for each iteration of gradient descent.
  (default: 0.1)
:param numIterations:
  Number of iterations run for each batch of data.
  (default: 50)
{noformat}
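For illustration, the sketch below applies the agreed format inside a Python docstring. The function train_sgd and its signature are hypothetical, chosen only to reuse the stepSize and numIterations params from the example above; it is not a PySpark API. reStructuredText allows a field body to begin on the line after the field name as long as it is indented, which is what lets each description sit on its own lines.

```python
import inspect


def train_sgd(data, stepSize=0.1, numIterations=50):
    """
    Train a model with stochastic gradient descent (hypothetical stub).

    :param data:
      The training data.
    :param stepSize:
      Step size for each iteration of gradient descent.
      (default: 0.1)
    :param numIterations:
      Number of iterations run for each batch of data.
      (default: 50)
    """
    # Stub body: the sketch only demonstrates the docstring layout,
    # so it simply echoes the hyperparameters it received.
    return {"stepSize": stepSize, "numIterations": numIterations}


if __name__ == "__main__":
    # inspect.getdoc strips the common indentation, showing the layout
    # roughly as documentation tools see it.
    print(inspect.getdoc(train_sgd))
```

Because nothing is aligned to the longest param name, adding or removing a param touches only that param's own lines.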
h2. Current State of Parameter Description Formatting
h4. Classification
* LogisticRegressionModel - single line descriptions, fix indentations
* LogisticRegressionWithSGD - vertical alignment, sporadic default values
* LogisticRegressionWithLBFGS - vertical alignment, sporadic default values
* SVMModel - single line
* SVMWithSGD - vertical alignment, sporadic default values
* NaiveBayesModel - single line
* NaiveBayes - single line
h4. Clustering
* KMeansModel - missing param description
* KMeans - missing param description and defaults
* GaussianMixture - vertical alignment, incorrect default formatting
* PowerIterationClustering - single line with wrapped indentation, missing defaults
* StreamingKMeansModel - single line wrapped
* StreamingKMeans - single line wrapped, missing defaults
* LDAModel - single line
* LDA - vertical alignment, missing some defaults
h4. FPM
* FPGrowth - single line
* PrefixSpan - single line, default values in backticks
h4. Recommendation
* ALS - does not have param descriptions
h4. Regression
* LabeledPoint - single line
* LinearModel - single line
* LinearRegressionWithSGD - vertical alignment
* RidgeRegressionWithSGD - vertical alignment
* IsotonicRegressionModel - single line
* IsotonicRegression - single line, missing default
h4. Tree
* DecisionTree - single line with vertical indentation, missing defaults
* RandomForest - single line with wrapped indent, missing some defaults
* GradientBoostedTrees - single line with wrapped indent
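To make the labels in this list concrete, the sketch below writes a short param section in each style and uses a small check to flag params whose description starts on the :param line itself, as it does in both the single-line and vertical-alignment styles. The sample docstrings are illustrative, not copied from PySpark, and the helper params_not_in_agreed_format is hypothetical; its regex only handles the plain ":param name:" form, not typed fields such as ":param int name:".

```python
import re

# Hypothetical check: matches ":param name:" and captures the name
# plus any text that follows the closing colon on the same line.
PARAM_RE = re.compile(r"^\s*:param\s+(\w+):(.*)$")

# Single line: the whole description follows the field name.
SINGLE_LINE = """
:param data: The training data.
:param stepSize: Step size for each iteration of gradient descent.
"""

# Vertical alignment: descriptions line up in a column set by the
# longest param name, so continuation lines are padded to match.
VERTICAL = """
:param data:      The training data.
:param stepSize:  Step size for each iteration of gradient
                  descent. (default: 0.1)
"""

# Agreed format: description and default on their own indented lines.
AGREED = """
:param data:
  The training data.
:param stepSize:
  Step size for each iteration of gradient descent.
  (default: 0.1)
"""


def params_not_in_agreed_format(docstring):
    """Return the names of params whose description starts on the :param line."""
    offenders = []
    for line in docstring.splitlines():
        match = PARAM_RE.match(line)
        # Any non-blank text after "name:" means the agreed layout is not used.
        if match and match.group(2).strip():
            offenders.append(match.group(1))
    return offenders


if __name__ == "__main__":
    for label, doc in [("single line", SINGLE_LINE),
                       ("vertical alignment", VERTICAL),
                       ("agreed", AGREED)]:
        print(label, params_not_in_agreed_format(doc))
```

The vertical example also shows the maintenance cost the description mentions: adding a param with a longer name would move the column and force every other description line to be re-indented.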
NOTE
This issue will focus only on model/algorithm descriptions, which are the
largest source of inconsistent formatting.
evaluation.py, feature.py, random.py, utils.py - these supporting modules
describe params on a single line, but do so consistently, so they don't need
to be changed.
> Make Parameter Description Format Consistent in PySpark.MLlib
> -------------------------------------------------------------
>
> Key: SPARK-11219
> URL: https://issues.apache.org/jira/browse/SPARK-11219
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, MLlib, PySpark
> Reporter: Bryan Cutler
> Priority: Trivial
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)