[
https://issues.apache.org/jira/browse/SPARK-30144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991050#comment-16991050
]
zhengruifeng commented on SPARK-30144:
--------------------------------------
[~huaxingao] It seems that MultilayerPerceptronClassificationModel should
extend MultilayerPerceptronParams to expose the training params.
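
To illustrate the suggestion, here is a minimal plain-Python sketch of the pattern. It is not Spark's code: `SharedMlpParams`, `ToyMlpClassifier`, and `ToyMlpModel` are hypothetical names, and the default values are for illustration only. The point is that once the estimator and the fitted model inherit the same params class, the training params can be copied onto the model and read back from it.

```python
# Toy illustration only: these classes are hypothetical stand-ins, not Spark's API.


class SharedMlpParams:
    """Params declared once and inherited by both the estimator and the model."""

    # Illustrative defaults, not taken from the issue.
    _defaults = {"layers": None, "maxIter": 100, "stepSize": 0.03, "seed": 0}

    def __init__(self):
        self._param_map = dict(self._defaults)

    def set_params(self, **kwargs):
        for name, value in kwargs.items():
            if name not in self._defaults:
                raise KeyError(f"unknown param: {name}")
            self._param_map[name] = value
        return self

    def extract_param_map(self):
        # Return a copy so callers cannot mutate the internal state.
        return dict(self._param_map)


class ToyMlpModel(SharedMlpParams):
    """Fitted model; it exposes training params because it shares the params class."""


class ToyMlpClassifier(SharedMlpParams):
    def fit(self, data):
        model = ToyMlpModel()
        # Copy the estimator's params onto the fitted model (no real training here).
        model.set_params(**self.extract_param_map())
        return model


clf = ToyMlpClassifier().set_params(layers=[4, 5, 2], maxIter=50)
model = clf.fit(data=None)
print(model.extract_param_map())
```

If the model did not share the params class, it would have nowhere to hold `layers` or `maxIter`, which matches the behaviour described in the report below.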
> MLP param map missing
> ---------------------
>
> Key: SPARK-30144
> URL: https://issues.apache.org/jira/browse/SPARK-30144
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 2.4.4
> Reporter: Glen-Erik Cortes
> Priority: Minor
> Attachments: MLP_params_missing.ipynb,
> data_banknote_authentication.csv
>
>
> Param maps for fitted classifiers are available with all classifiers except
> for the MultilayerPerceptronClassifier.
>
> There is no way to track or know which parameters were best during
> cross-validation, or which parameters were used for submodels.
>
> {code:java}
> {
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1',
> name='featuresCol', doc='features column name'): 'features',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='labelCol',
> doc='label column name'): 'fake_banknote',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1',
> name='predictionCol', doc='prediction column name'): 'prediction',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1',
> name='probabilityCol', doc='Column name for predicted class conditional
> probabilities. Note: Not all models output well-calibrated probability
> estimates! These probabilities should be treated as confidences, not precise
> probabilities'): 'probability',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1',
> name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column
> name'): 'rawPrediction'}{code}
>
> GBTClassifier for example shows all parameters:
>
> {code:java}
> {
> Param(parent='GBTClassifier_a0e77b3430aa', name='cacheNodeIds', doc='If
> false, the algorithm will pass trees to executors to match instances with
> nodes. If true, the algorithm will cache node IDs for each instance. Caching
> can speed up training of deeper trees.'): False,
> Param(parent='GBTClassifier_a0e77b3430aa', name='checkpointInterval',
> doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means
> that the cache will get checkpointed every 10 iterations. Note: this setting
> will be ignored if the checkpoint directory is not set in the SparkContext'):
> 10,
> Param(parent='GBTClassifier_a0e77b3430aa', name='featureSubsetStrategy',
> doc='The number of features to consider for splits at each tree node.
> Supported options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].'):
> 'all',
> Param(parent='GBTClassifier_a0e77b3430aa', name='featuresCol', doc='features
> column name'): 'features',
> Param(parent='GBTClassifier_a0e77b3430aa', name='labelCol', doc='label column
> name'): 'fake_banknote',
> Param(parent='GBTClassifier_a0e77b3430aa', name='lossType', doc='Loss function
> which GBT tries to minimize (case-insensitive). Supported options: logistic'):
> 'logistic',
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxBins', doc='Max number of
> bins for discretizing continuous features. Must be >=2 and >= number of
> categories for any categorical feature.'): 8,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxDepth', doc='Maximum
> depth of the tree. (>= 0) E.g., depth 0 means 1 leaf node; depth 1 means 1
> internal node + 2 leaf nodes.'): 5,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxIter', doc='maximum
> number of iterations (>= 0)'): 20,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxMemoryInMB', doc='Maximum
> memory in MB allocated to histogram aggregation.'): 256,
> Param(parent='GBTClassifier_a0e77b3430aa', name='minInfoGain', doc='Minimum
> information gain for a split to be considered at a tree node.'): 0.0,
> Param(parent='GBTClassifier_a0e77b3430aa', name='minInstancesPerNode',
> doc='Minimum number of instances each child must have after split. If a split
> causes the left or right child to have fewer than minInstancesPerNode, the
> split will be discarded as invalid. Should be >= 1.'): 1,
> Param(parent='GBTClassifier_a0e77b3430aa', name='predictionCol',
> doc='prediction column name'): 'prediction',
> Param(parent='GBTClassifier_a0e77b3430aa', name='seed', doc='random seed'):
> 1234,
> Param(parent='GBTClassifier_a0e77b3430aa', name='stepSize', doc='Step size
> (a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of
> each estimator.'): 0.1,
> Param(parent='GBTClassifier_a0e77b3430aa', name='subsamplingRate',
> doc='Fraction of the training data used for learning each decision tree, in
> range (0, 1].'): 1.0}{code}
>
> See attached ipynb or example notebook here:
> [https://colab.research.google.com/drive/1lwSHioZKlLh96FhGkdYFe6FUuRfTcSxH]
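
Until the model exposes its training params, one possible workaround for the cross-validation case is to pair each candidate param map with its average metric and pick the best pair. The helper below is a hedged sketch: `best_param_map` is a hypothetical name, and plain dicts stand in for Spark param maps so the snippet runs without a cluster. It assumes the metrics list is in the same order as the param maps.

```python
def best_param_map(param_maps, avg_metrics, larger_is_better=True):
    """Return (param_map, metric) for the best-scoring candidate.

    Assumes avg_metrics[i] is the score obtained with param_maps[i].
    """
    if not param_maps:
        raise ValueError("param_maps is empty")
    if len(param_maps) != len(avg_metrics):
        raise ValueError("param_maps and avg_metrics must have the same length")
    pick = max if larger_is_better else min
    best_index = pick(range(len(avg_metrics)), key=avg_metrics.__getitem__)
    return param_maps[best_index], avg_metrics[best_index]


# Made-up grid and scores, for illustration only.
grid = [
    {"maxIter": 50, "stepSize": 0.03},
    {"maxIter": 100, "stepSize": 0.03},
    {"maxIter": 100, "stepSize": 0.1},
]
metrics = [0.91, 0.97, 0.94]

best, score = best_param_map(grid, metrics)
print(best, score)
```

With PySpark, the inputs would presumably come from `cv.getEstimatorParamMaps()`, `cv_model.avgMetrics`, and `evaluator.isLargerBetter()`, where `cv`, `cv_model`, and `evaluator` are the user's own CrossValidator, fitted CrossValidatorModel, and evaluator.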