Glen-Erik Cortes created SPARK-30144:
----------------------------------------
Summary: MLP param map missing
Key: SPARK-30144
URL: https://issues.apache.org/jira/browse/SPARK-30144
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 2.4.4
Reporter: Glen-Erik Cortes
Fitted models expose their full param map for all classifiers except
MultilayerPerceptronClassifier.
As a result, there is no way to tell which parameters performed best during
cross-validation, or which parameters were used for submodels. The param map of
a fitted MultilayerPerceptronClassifier model contains only the column-name
parameters:
{code:java}
{Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1',
name='featuresCol', doc='features column name'): 'features',
Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='labelCol',
doc='label column name'): 'fake_banknote',
Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1',
name='predictionCol', doc='prediction column name'): 'prediction',
Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1',
name='probabilityCol', doc='Column name for predicted class conditional
probabilities. Note: Not all models output well-calibrated probability
estimates! These probabilities should be treated as confidences, not precise
probabilities'): 'probability',
Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1',
name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name'):
'rawPrediction'}{code}
GBTClassifier, for example, shows all of its parameters:
{code:java}
{Param(parent='GBTClassifier_a0e77b3430aa', name='cacheNodeIds', doc='If
false, the algorithm will pass trees to executors to match instances with
nodes. If true, the algorithm will cache node IDs for each instance. Caching
can speed up training of deeper trees.'): False,
Param(parent='GBTClassifier_a0e77b3430aa', name='checkpointInterval', doc='set
checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the
cache will get checkpointed every 10 iterations. Note: this setting will be
ignored if the checkpoint directory is not set in the SparkContext'): 10,
Param(parent='GBTClassifier_a0e77b3430aa', name='featureSubsetStrategy',
doc='The number of features to consider for splits at each tree node. Supported
options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].'): 'all',
Param(parent='GBTClassifier_a0e77b3430aa', name='featuresCol', doc='features
column name'): 'features', Param(parent='GBTClassifier_a0e77b3430aa',
name='labelCol', doc='label column name'): 'fake_banknote',
Param(parent='GBTClassifier_a0e77b3430aa', name='lossType', doc='Loss function
which GBT tries to minimize (case-insensitive). Supported options: logistic'):
'logistic', Param(parent='GBTClassifier_a0e77b3430aa', name='maxBins', doc='Max
number of bins for discretizing continuous features. Must be >=2 and >= number
of categories for any categorical feature.'): 8,
Param(parent='GBTClassifier_a0e77b3430aa', name='maxDepth', doc='Maximum depth
of the tree. (>= 0) E.g., depth 0 means 1 leaf node; depth 1 means 1 internal
node + 2 leaf nodes.'): 5, Param(parent='GBTClassifier_a0e77b3430aa',
name='maxIter', doc='maximum number of iterations (>= 0)'): 20,
Param(parent='GBTClassifier_a0e77b3430aa', name='maxMemoryInMB', doc='Maximum
memory in MB allocated to histogram aggregation.'): 256,
Param(parent='GBTClassifier_a0e77b3430aa', name='minInfoGain', doc='Minimum
information gain for a split to be considered at a tree node.'): 0.0,
Param(parent='GBTClassifier_a0e77b3430aa', name='minInstancesPerNode',
doc='Minimum number of instances each child must have after split. If a split
causes the left or right child to have fewer than minInstancesPerNode, the
split will be discarded as invalid. Should be >= 1.'): 1,
Param(parent='GBTClassifier_a0e77b3430aa', name='predictionCol',
doc='prediction column name'): 'prediction',
Param(parent='GBTClassifier_a0e77b3430aa', name='seed', doc='random seed'):
1234, Param(parent='GBTClassifier_a0e77b3430aa', name='stepSize', doc='Step
size (a.k.a. learning rate) in interval (0, 1] for shrinking the contribution
of each estimator.'): 0.1, Param(parent='GBTClassifier_a0e77b3430aa',
name='subsamplingRate', doc='Fraction of the training data used for learning
each decision tree, in range (0, 1].'): 1.0}{code}
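The gap between two such maps can be listed programmatically. The following is
a minimal sketch, assuming each map is a dict-like object keyed by Param objects
that carry a `name` attribute, as in the output above. `missing_param_names` and
`FakeParam` are hypothetical names used for illustration only, not part of the
Spark API, and the sample values are made up.

```python
from collections import namedtuple


def missing_param_names(estimator_map, model_map):
    """Return the sorted names of params that appear in estimator_map
    but not in model_map. Params are compared by name only."""
    model_names = {param.name for param in model_map}
    return sorted(
        param.name for param in estimator_map if param.name not in model_names
    )


# Stand-in for pyspark.ml.param.Param so the sketch runs without Spark.
FakeParam = namedtuple("FakeParam", ["parent", "name"])

# Illustrative values only; not taken from the report.
estimator_map = {
    FakeParam("mlp", "featuresCol"): "features",
    FakeParam("mlp", "maxIter"): 100,
    FakeParam("mlp", "layers"): [4, 5, 2],
}
model_map = {
    FakeParam("mlp", "featuresCol"): "features",
}

print(missing_param_names(estimator_map, model_map))  # ['layers', 'maxIter']
```

With Spark available, the two arguments would presumably be the results of
`extractParamMap()` on the estimator and on the fitted model.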
The full example notebook is here:
https://colab.research.google.com/drive/1lwSHioZKlLh96FhGkdYFe6FUuRfTcSxH
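A possible workaround for the cross-validation case is to pair the grid of
param maps with the averaged metrics rather than reading the params from the
best model. The sketch below assumes a fitted CrossValidatorModel (called
`cv_model` here, a hypothetical name) whose `getEstimatorParamMaps()` and
`avgMetrics` are in the same order. `best_param_map` is an illustrative helper,
not a Spark function, and the sample grid and metrics are made up.

```python
def best_param_map(param_maps, avg_metrics, larger_is_better=True):
    """Return (param_map, metric) for the best-scoring entry.

    param_maps and avg_metrics must be parallel sequences: avg_metrics[i]
    is the averaged cross-validation metric for param_maps[i].
    """
    if len(param_maps) != len(avg_metrics):
        raise ValueError("param_maps and avg_metrics must have the same length")
    if not param_maps:
        raise ValueError("param_maps must not be empty")
    pick = max if larger_is_better else min
    # On ties, max/min return the first matching index.
    best_index = pick(range(len(avg_metrics)), key=avg_metrics.__getitem__)
    return param_maps[best_index], avg_metrics[best_index]


# Assumed usage with PySpark (not executed here):
#   best_map, best_metric = best_param_map(
#       cv_model.getEstimatorParamMaps(),
#       cv_model.avgMetrics,
#       evaluator.isLargerBetter(),
#   )

# Plain-Python stand-ins so the sketch runs without Spark.
grid = [{"maxIter": 50}, {"maxIter": 100}, {"maxIter": 200}]
metrics = [0.91, 0.97, 0.95]
print(best_param_map(grid, metrics))  # ({'maxIter': 100}, 0.97)
```

This only recovers the winning grid entry. It does not restore the missing
parameters on the fitted model itself.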