[ 
https://issues.apache.org/jira/browse/SPARK-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547056#comment-14547056
 ] 

Vincenzo Selvaggio commented on SPARK-7540:
-------------------------------------------

All models supporting PMML export have been manually tested using the 
following approach: 
- a Scala file generates a model, produces the PMML XML file and predicts some 
values
- Java code reads the PMML XML file and runs predictions on the same values, 
giving the same results
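
The Java side of this check can be sketched roughly as follows. This is only an illustration of the idea, not the code in the validator repository, which relies on a PMML evaluator library. It assumes a regression model, uses the JDK's DOM parser in place of a real evaluator, and scores a hand-written, trimmed PMML fragment that is not schema-complete. The class name, the field names, the coefficients and the "Spark prediction" value are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PmmlRegressionCheck {

    // Hypothetical, hand-written fragment for illustration only. A real export
    // also carries Header, DataDictionary and MiningSchema elements.
    static final String SAMPLE_PMML =
        "<PMML version=\"4.2\">"
      + "<RegressionModel functionName=\"regression\">"
      + "<RegressionTable intercept=\"0.5\">"
      + "<NumericPredictor name=\"field_0\" coefficient=\"2.0\"/>"
      + "<NumericPredictor name=\"field_1\" coefficient=\"-1.0\"/>"
      + "</RegressionTable>"
      + "</RegressionModel>"
      + "</PMML>";

    // Scores one row as: intercept + sum(coefficient * feature value).
    // Assumes every NumericPredictor has the default exponent of 1 and that
    // the row contains a value for every predictor name.
    static double predict(String pmmlXml, Map<String, Double> row) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    pmmlXml.getBytes(StandardCharsets.UTF_8)));
            Element table =
                (Element) doc.getElementsByTagName("RegressionTable").item(0);
            double result = Double.parseDouble(table.getAttribute("intercept"));
            NodeList predictors = table.getElementsByTagName("NumericPredictor");
            for (int i = 0; i < predictors.getLength(); i++) {
                Element p = (Element) predictors.item(i);
                double coefficient =
                    Double.parseDouble(p.getAttribute("coefficient"));
                result += coefficient * row.get(p.getAttribute("name"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not score PMML document", e);
        }
    }

    // Two predictions "match" if they agree within an absolute tolerance.
    static boolean matches(double expected, double actual, double tolerance) {
        return Math.abs(expected - actual) <= tolerance;
    }

    public static void main(String[] args) {
        Map<String, Double> row = new HashMap<>();
        row.put("field_0", 1.0);
        row.put("field_1", 2.0);

        // Placeholder for the value the Scala script printed for this row.
        double sparkPrediction = 0.5;
        double pmmlPrediction = predict(SAMPLE_PMML, row);

        System.out.println(matches(sparkPrediction, pmmlPrediction, 1e-9)
            ? "predictions match" : "predictions differ");
    }
}
```

In practice the XML would be read from the exported file and the row values taken from the same data set the Scala script scored; a small tolerance is used rather than exact equality because the two sides may format or round floating-point values differently.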

The best way to run the tests is to follow the documentation at 
https://github.com/selvinsource/spark-pmml-exporter-validator. For quick 
reference, direct links to the Scala code and generated XML are listed below.

K-Means Clustering
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/kmeans_iris.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/kmeans.xml

Linear Regression
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/linearregression_winequalityred.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/linearregression.xml

Ridge Regression
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/ridgeregression_winequalityred.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/ridgeregression.xml

Lasso Regression
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/lassoregression_winequalityred.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/lassoregression.xml

Linear SVM
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/linearsvm_breastcancerwisconsin.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/linearsvm.xml

Logistic Regression
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/logisticregression_breastcancerwisconsin.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/logisticregression.xml


> PMML correctness check
> ----------------------
>
>                 Key: SPARK-7540
>                 URL: https://issues.apache.org/jira/browse/SPARK-7540
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Shuo Xiang
>
> Check correctness of PMML export for MLlib models by using PMML evaluator to 
> load and run the models.  This unfortunately needs to be done externally (not 
> in spark-perf) because of licensing.  A record of tests run and the results 
> can be posted in this JIRA, as well as a link to the repo hosting the testing 
> code.


