[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194315#comment-14194315 ]
Sean Owen commented on SPARK-1406:
----------------------------------

I put some comments on the PR. Thanks for starting on this. I think PMML interoperability is indeed helpful.

So, one big issue here is that MLlib does not at the moment have any notion of a schema. PMML does, and this is vital to actually using the model elsewhere. You have to document what the variables are so they can be matched up with the same variables in another tool. So it's not possible now to do anything but make a model with "field_1", "field_2", ... This calls into question whether PMML can be meaningfully exported at this point from MLlib. Maybe it will have to wait until other PRs go in that start to add schema.

I also thought it would be a little better to separate the representation of a model from utility methods to write the model to things like files. The latter can be at least separated out of the type hierarchy. I'm also wondering how much value it adds to design for non-PMML export at this stage.

(Finally, I have some code lying around here that will translate the MLlib logistic regression model to PMML. I can put that in the pot at a suitable time.)

> PMML model evaluation support via MLib
> --------------------------------------
>
>                 Key: SPARK-1406
>                 URL: https://issues.apache.org/jira/browse/SPARK-1406
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Thomas Darimont
>         Attachments: SPARK-1406.pdf, kmeans.xml
>
>
> It would be useful if Spark provided support for the evaluation of PMML
> models (http://www.dmg.org/v4-2/GeneralStructure.html).
> This would allow analytical models that were created with a
> statistical modeling tool like R, SAS, SPSS, etc. to be used with Spark (MLlib), which
> would perform the actual model evaluation for a given input tuple. The PMML
> model would then just contain the "parameterization" of an analytical model.
> Other projects like JPMML-Evaluator do a similar thing.
> https://github.com/jpmml/jpmml/tree/master/pmml-evaluator

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
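
To make the schema point in the comment concrete, here is a minimal sketch of what an export looks like when the model carries no field names. It is not MLlib code and not the code from the PR: `LinearModel`, `to_pmml` and `write_pmml` are hypothetical names, Python is used only for brevity, and the PMML produced is a bare-bones RegressionModel that has not been validated against the schema. The sketch also keeps file writing in a separate function rather than on the model type, along the lines suggested in the comment.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinearModel:
    """Hypothetical stand-in for a trained model: weights and intercept, no schema."""
    weights: List[float]
    intercept: float


def to_pmml(model: LinearModel, field_names: Optional[List[str]] = None) -> str:
    """Return a minimal PMML RegressionModel document as a string."""
    if field_names is None:
        # With no schema, placeholder names are the only option.
        field_names = [f"field_{i + 1}" for i in range(len(model.weights))]
    if len(field_names) != len(model.weights):
        raise ValueError("need exactly one field name per weight")

    pmml = ET.Element("PMML", version="4.2", xmlns="http://www.dmg.org/PMML-4_2")
    ET.SubElement(pmml, "Header", description="illustrative export")

    # The DataDictionary is where PMML documents what each variable is.
    dictionary = ET.SubElement(pmml, "DataDictionary")
    for name in field_names + ["target"]:
        ET.SubElement(dictionary, "DataField", name=name,
                      optype="continuous", dataType="double")

    regression = ET.SubElement(pmml, "RegressionModel", functionName="regression")
    schema = ET.SubElement(regression, "MiningSchema")
    for name in field_names:
        ET.SubElement(schema, "MiningField", name=name)
    ET.SubElement(schema, "MiningField", name="target", usageType="target")

    table = ET.SubElement(regression, "RegressionTable",
                          intercept=str(model.intercept))
    for name, weight in zip(field_names, model.weights):
        ET.SubElement(table, "NumericPredictor", name=name, coefficient=str(weight))
    return ET.tostring(pmml, encoding="unicode")


def write_pmml(model: LinearModel, path: str,
               field_names: Optional[List[str]] = None) -> None:
    """File output is a utility function, kept out of the model type."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_pmml(model, field_names))
```

Calling `to_pmml(LinearModel([0.5, -1.25], 2.0))` yields predictors named "field_1" and "field_2", which another tool cannot match to its own variables without outside knowledge. Passing real names, for example `["age", "income"]`, is the part that would need schema support.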