[ https://issues.apache.org/jira/browse/SPARK-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547056#comment-14547056 ]

Vincenzo Selvaggio commented on SPARK-7540:
-------------------------------------------

All models supporting PMML export have been manually tested using the following approach:
- a Scala file generates a model, produces the PMML XML file, and predicts some values
- Java code reads the PMML XML file and runs predictions on the same values, giving the same results

The best way to run the tests is to follow the documentation at https://github.com/selvinsource/spark-pmml-exporter-validator. Direct links to the Scala code and the generated XML are listed below for quick reference.

K-Means Clustering
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/kmeans_iris.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/kmeans.xml

Linear Regression
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/linearregression_winequalityred.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/linearregression.xml

Ridge Regression
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/ridgeregression_winequalityred.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/ridgeregression.xml

Lasso Regression
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/lassoregression_winequalityred.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/lassoregression.xml

Linear SVM
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/linearsvm_breastcancerwisconsin.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/linearsvm.xml

Logistic Regression
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/logisticregression_breastcancerwisconsin.scala
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/exported_pmml_models/logisticregression.xml

> PMML correctness check
> ----------------------
>
>                 Key: SPARK-7540
>                 URL: https://issues.apache.org/jira/browse/SPARK-7540
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Shuo Xiang
>
> Check correctness of PMML export for MLlib models by using PMML evaluator to
> load and run the models. This unfortunately needs to be done externally (not
> in spark-perf) because of licensing. A record of tests run and the results
> can be posted in this JIRA, as well as a link to the repo hosting the testing
> code.
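For readers who want a feel for the Java half of the approach described in the comment (read the exported PMML, predict on the same values, compare with Spark's output), here is a minimal sketch. It is not the code from the validator repository and does not use a PMML evaluator library; it hand-evaluates a single PMML RegressionTable with the JDK's XML parser. The class name PmmlRegressionCheck, the helper sameResult, the "field_<index>" predictor naming, the inline sample XML, and the "Spark-side" value are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of spark-pmml-exporter-validator.
final class PmmlRegressionCheck {

    // Evaluates the first RegressionTable in a PMML document as
    // intercept + sum(coefficient_i * features[i]).
    // Assumption: predictors are named "field_<index>".
    static double predict(String pmmlXml, double[] features) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(pmmlXml.getBytes(StandardCharsets.UTF_8)));
        Element table = (Element) doc.getElementsByTagName("RegressionTable").item(0);
        double result = Double.parseDouble(table.getAttribute("intercept"));
        NodeList predictors = table.getElementsByTagName("NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element p = (Element) predictors.item(i);
            int index = Integer.parseInt(p.getAttribute("name").substring("field_".length()));
            result += Double.parseDouble(p.getAttribute("coefficient")) * features[index];
        }
        return result;
    }

    // True when two predictions agree within an absolute tolerance.
    static boolean sameResult(double expected, double actual, double tolerance) {
        return Math.abs(expected - actual) <= tolerance;
    }

    public static void main(String[] args) throws Exception {
        // Made-up model for illustration; a real run would read an exported file instead.
        String pmml =
              "<PMML version='4.2' xmlns='http://www.dmg.org/PMML-4_2'>"
            + "<RegressionModel functionName='regression'>"
            + "<RegressionTable intercept='0.5'>"
            + "<NumericPredictor name='field_0' coefficient='2.0'/>"
            + "<NumericPredictor name='field_1' coefficient='-1.0'/>"
            + "</RegressionTable></RegressionModel></PMML>";
        double[] features = {2.0, 4.0};
        double sparkPrediction = 0.5; // stand-in for the value printed by the Scala script
        double pmmlPrediction = predict(pmml, features);
        System.out.println("match: " + sameResult(sparkPrediction, pmmlPrediction, 1e-9));
    }
}
```

Comparing within a small tolerance rather than with exact equality is a deliberate choice here, since the two sides may format or round doubles differently. Clustering and classification models would need their own evaluation logic, which this sketch does not cover.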
