Waleed Esmail created SPARK-23414:
-------------------------------------
Summary: Plotting using matplotlib in MLlib pyspark
Key: SPARK-23414
URL: https://issues.apache.org/jira/browse/SPARK-23414
Project: Spark
Issue Type: Question
Components: MLlib
Affects Versions: 2.2.1
Reporter: Waleed Esmail
Dear MLlib experts,
I just want to plot a fancy confusion matrix (true values vs. predicted values)
like the one produced by the seaborn module in Python, so I did the following:
{code:python}
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer, VectorIndexer
from sklearn.metrics import confusion_matrix

# `output` is the input DataFrame (not shown) with "label" and "features" columns.
# Index the labels.
labelIndexer = StringIndexer(inputCol="label",
                             outputCol="indexedLabel").fit(output)
# Automatically identify categorical features, and index them.
# maxCategories is not set, so the default of 20 applies: features with
# > 20 distinct values are treated as continuous.
featureIndexer = VectorIndexer(inputCol="features",
                               outputCol="indexedFeatures").fit(output)
# Split the data into training and test sets (30% held out for testing)
(trainingData, testData) = output.randomSplit([0.7, 0.3])
dt = DecisionTreeClassifier(labelCol="indexedLabel",
                            featuresCol="indexedFeatures", maxDepth=15)
# Chain indexers and tree in a Pipeline
pipeline = Pipeline(stages=[labelIndexer, featureIndexer, dt])
# Train model. This also runs the indexers.
model = pipeline.fit(trainingData)
# Make predictions.
predictions = model.transform(testData)
predictionAndLabels = predictions.select("prediction", "indexedLabel")
# Collect both columns in one action so predictions and labels stay aligned.
rows = predictionAndLabels.collect()
y_predicted = np.array([row["prediction"] for row in rows])
y_test = np.array([row["indexedLabel"] for row in rows])

figcm, ax = plt.subplots()
cm = confusion_matrix(y_test, y_predicted)
# Normalize each row so it sums to 1 (fractions per true class)
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
sns.heatmap(cm, square=True, annot=True, cbar=False, ax=ax)
ax.set_xlabel('prediction')
ax.set_ylabel('true value')
plt.show()
{code}
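One possible variation, sketched under assumptions rather than taken from the code above: instead of collecting every prediction row to the driver, the (label, prediction) pairs could be counted in Spark first, for example with predictions.groupBy("indexedLabel", "prediction").count(), so that only the counts are collected. The helper below, matrix_from_counts, is a hypothetical name (not part of Spark or scikit-learn), and the example counts are made up for illustration. It turns such counts into a row-normalized matrix in plain Python.

```python
def matrix_from_counts(counts, num_classes):
    """Build a row-normalized confusion matrix from (label, prediction, count) triples.

    Rows are true labels, columns are predicted labels. Rows with no
    samples are left as zeros instead of dividing by zero.
    """
    matrix = [[0.0] * num_classes for _ in range(num_classes)]
    for label, prediction, count in counts:
        # Indexed labels arrive as floats (0.0, 1.0, ...), so cast to int.
        matrix[int(label)][int(prediction)] += count
    for row in matrix:
        total = sum(row)
        if total > 0:
            for j in range(num_classes):
                row[j] /= total
    return matrix


# Hypothetical counts, in the shape that
# predictions.groupBy("indexedLabel", "prediction").count().collect()
# would give after unpacking each Row into a tuple.
example_counts = [(0.0, 0.0, 8), (0.0, 1.0, 2), (1.0, 0.0, 1), (1.0, 1.0, 9)]
cm = matrix_from_counts(example_counts, num_classes=2)
print(cm)
```

The resulting list of lists could then be passed to sns.heatmap in the same way as a normalized NumPy array.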
Is this the right way to do it? Please note that I am new to Spark and MLlib.
Thank you in advance,