[jira] [Created] (SPARK-23414) Plotting using matplotlib in MLlib pyspark

Waleed Esmail (JIRA) Tue, 13 Feb 2018 08:57:05 -0800

Waleed Esmail created SPARK-23414:
-------------------------------------

             Summary: Plotting using matplotlib in MLlib pyspark 
                 Key: SPARK-23414
                 URL: https://issues.apache.org/jira/browse/SPARK-23414
             Project: Spark
          Issue Type: Question
          Components: MLlib
    Affects Versions: 2.2.1
            Reporter: Waleed Esmail



Dear MLlib experts,

I want to plot a confusion matrix (true values vs. predicted values) like 
the ones produced with the seaborn module in Python, so I did the following:
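{code:python}
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer, VectorIndexer

# Index the string labels; fitting on the whole dataset includes every label in the index.
labelIndexer = StringIndexer(inputCol="label",
                             outputCol="indexedLabel").fit(output)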
# Automatically identify categorical features, and index them.
# We specify maxCategories so features with > 4 distinct values are treated as continuous.
featureIndexer = VectorIndexer(inputCol="features",
                               outputCol="indexedFeatures",
                               maxCategories=4).fit(output)

# Split the data into training and test sets (30% held out for testing)
(trainingData, testData) = output.randomSplit([0.7, 0.3])


dt = DecisionTreeClassifier(labelCol="indexedLabel",
                            featuresCol="indexedFeatures", maxDepth=15)

# Chain indexers and tree in a Pipeline
pipeline = Pipeline(stages=[labelIndexer, featureIndexer, dt])
# Train model.  This also runs the indexers.
model = pipeline.fit(trainingData)

# Make predictions.
predictions = model.transform(testData)
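predictionAndLabels = predictions.select("prediction", "indexedLabel")

# Collect both columns to the driver in a single job so the rows stay paired.
# The result is an (n, 2) array: column 0 is the prediction, column 1 the label.
collected = np.array(predictionAndLabels.collect())
y_predicted = collected[:, 0]
y_test = collected[:, 1]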



from sklearn.metrics import confusion_matrix
import matplotlib.ticker as ticker

figcm, ax = plt.subplots()
cm = confusion_matrix(y_test, y_predicted)
# for normalization
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
sns.heatmap(cm, square=True, annot=True, cbar=False)
plt.xlabel('prediction')
plt.ylabel('true value')
{code}
Is this the right way to do it? Please note that I am new to Spark and MLlib.
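
To make the normalization step concrete, here is a minimal standalone sketch in plain NumPy, without Spark. The label and prediction values, and the names `counts` and `n_classes`, are made up for illustration and are not taken from my data. It builds the count matrix (rows are true values, columns are predictions) and divides each row by its sum, so each row shows how one true class was distributed over the predicted classes.

```python
import numpy as np

# Hypothetical indexed labels and predictions (made-up values for illustration).
y_test = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])
y_predicted = np.array([0, 0, 1, 1, 1, 2, 2, 0, 2, 2])

n_classes = 3  # assumed number of distinct labels

# Count matrix: row index is the true label, column index is the predicted label.
counts = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(counts, (y_test, y_predicted), 1)

# Row-normalize: divide each row by the number of samples with that true label.
# A class with no test samples would give a zero row sum and NaN entries.
row_sums = counts.sum(axis=1, keepdims=True)
cm = counts / row_sums
```

The resulting `cm` has the same layout as the array passed to `sns.heatmap` above, so the x axis corresponds to predictions and the y axis to true values.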

 

Thank you in advance,



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
