[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194315#comment-14194315 ]

Sean Owen commented on SPARK-1406:
----------------------------------

I put some comments on the PR. Thanks for starting on this. I think PMML 
interoperability is indeed helpful. 

So, one big issue here is that MLlib does not at the moment have any notion of 
a schema. PMML does, and this is vital to actually using the model elsewhere. 
You have to document what the variables are so they can be matched up with the 
same variables in another tool. So for now the only option is a model whose 
fields are named "field_1", "field_2", and so on. This calls into question 
whether PMML can be meaningfully exported from MLlib at this point. Maybe it 
will have to wait until other PRs that start adding schema support go in.
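
A minimal sketch of what this means for an exporter, in Python with only the 
standard library (this is not MLlib code; `data_dictionary` is a hypothetical 
helper). It builds the PMML `DataDictionary` element that declares a model's 
input fields. With no schema to draw on, the only names it can emit are 
positional placeholders. Declaring every field as a continuous double is a 
simplifying assumption; a real schema would also supply types and categorical 
values.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def data_dictionary(num_features: int,
                    names: Optional[Sequence[str]] = None) -> ET.Element:
    """Build a PMML DataDictionary for a model with num_features inputs.

    Hypothetical helper. Without a schema (names is None), the only
    available names are positional placeholders: field_1, field_2, ...
    """
    if names is None:
        names = [f"field_{i + 1}" for i in range(num_features)]
    if len(names) != num_features:
        raise ValueError("need exactly one name per feature")
    dd = ET.Element("DataDictionary", numberOfFields=str(num_features))
    for name in names:
        # Assumption: every input is a continuous double. A schema would
        # also say which fields are categorical, their allowed values, etc.
        ET.SubElement(dd, "DataField", name=name,
                      optype="continuous", dataType="double")
    return dd


print([f.get("name") for f in data_dictionary(3)])
# ['field_1', 'field_2', 'field_3']
```

Passing real `names` is only possible once something upstream knows what each 
column means, which is exactly the schema support MLlib does not have yet.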

I also thought it would be a little better to separate the representation of a 
model from the utility methods that write it to things like files. The latter 
can at least be separated out of the type hierarchy. I'm also wondering how 
much value it adds to design for non-PMML export at this stage.
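
The separation suggested here can be sketched in Python (standard library 
only). Everything in it is hypothetical: `LogisticRegressionModel` is a 
stand-in dataclass, not MLlib's class of the same name, and `to_pmml` and 
`write_xml` are illustrative helpers. The model type holds only parameters; 
translating it to PMML and writing it to a file are free functions outside its 
type hierarchy. The output is a bare `RegressionModel` fragment, not a complete 
or validated PMML document.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LogisticRegressionModel:
    """Hypothetical model representation: parameters only, no I/O methods."""
    weights: Sequence[float]
    intercept: float


def to_pmml(model: LogisticRegressionModel,
            field_names: Sequence[str]) -> ET.Element:
    """Translate the model into a PMML RegressionModel element (a fragment)."""
    if len(field_names) != len(model.weights):
        raise ValueError("need exactly one field name per weight")
    reg = ET.Element("RegressionModel", functionName="classification",
                     normalizationMethod="logit")
    schema = ET.SubElement(reg, "MiningSchema")
    for name in field_names:
        ET.SubElement(schema, "MiningField", name=name)
    table = ET.SubElement(reg, "RegressionTable",
                          intercept=str(float(model.intercept)),
                          targetCategory="1")
    # One NumericPredictor per weight, matched to inputs by field name.
    for name, weight in zip(field_names, model.weights):
        ET.SubElement(table, "NumericPredictor", name=name,
                      coefficient=str(float(weight)))
    return reg


def write_xml(element: ET.Element, path: str) -> None:
    """Generic I/O utility; it knows nothing about specific model types."""
    ET.ElementTree(element).write(path, encoding="utf-8",
                                  xml_declaration=True)


model = LogisticRegressionModel(weights=[0.5, -1.25], intercept=0.1)
print(ET.tostring(to_pmml(model, ["age", "income"]), encoding="unicode"))
```

With this split, supporting another destination or format means adding a 
function rather than a method on every model class, which keeps the question of 
non-PMML export separate from the model types themselves.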

(Finally, I have some code lying around here that translates the MLlib 
logistic regression model to PMML. I can put that in the pot at a suitable 
time.)

> PMML model evaluation support via MLib
> --------------------------------------
>
>                 Key: SPARK-1406
>                 URL: https://issues.apache.org/jira/browse/SPARK-1406
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Thomas Darimont
>         Attachments: SPARK-1406.pdf, kmeans.xml
>
>
> It would be useful if Spark provided support for evaluating PMML 
> models (http://www.dmg.org/v4-2/GeneralStructure.html).
> This would allow analytical models created with a statistical modeling 
> tool like R, SAS, SPSS, etc. to be used with Spark (MLlib), which would 
> perform the actual model evaluation for a given input tuple. The PMML 
> model would then just contain the "parameterization" of an analytical model.
> Other projects like JPMML-Evaluator do a similar thing.
> https://github.com/jpmml/jpmml/tree/master/pmml-evaluator
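
To illustrate what evaluating a PMML model "for a given input tuple" involves, 
here is a minimal Python sketch (standard library only; not JPMML-Evaluator's 
API and not Spark code). `evaluate` and the XML fragment with its coefficients 
are made up for illustration. It handles only the simplest case: a 
`RegressionModel` with a single `RegressionTable` of `NumericPredictor`s and 
optional `logit` normalization. Exponents, categorical predictors, multiple 
tables, and transformations are all ignored.

```python
import math
import xml.etree.ElementTree as ET
from typing import Mapping

# Hypothetical PMML fragment with made-up coefficients.
PMML_FRAGMENT = """
<RegressionModel functionName="classification" normalizationMethod="logit">
  <RegressionTable intercept="0.1" targetCategory="1">
    <NumericPredictor name="age" coefficient="0.5"/>
    <NumericPredictor name="income" coefficient="-1.25"/>
  </RegressionTable>
</RegressionModel>
"""


def evaluate(model: ET.Element, row: Mapping[str, float]) -> float:
    """Score one input tuple against a single-table RegressionModel."""
    table = model.find("RegressionTable")
    if table is None:
        raise ValueError("no RegressionTable found")
    score = float(table.get("intercept", "0"))
    for pred in table.findall("NumericPredictor"):
        # Inputs are looked up by field name, so the row must use the
        # same names the model was exported with.
        score += float(pred.get("coefficient")) * row[pred.get("name")]
    if model.get("normalizationMethod") == "logit":
        # logit normalization maps the linear score into (0, 1).
        score = 1.0 / (1.0 + math.exp(-score))
    return score


# Parse the model once, then evaluate as many tuples as needed.
model = ET.fromstring(PMML_FRAGMENT.strip())
print(evaluate(model, {"age": 2.0, "income": 1.0}))
```

Parsing is kept out of `evaluate` so the model is loaded once and applied per 
record; in Spark the same per-tuple function could, in principle, be mapped 
over a distributed collection of records.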



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

