[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962133#comment-13962133
 ] 

Sean Owen commented on SPARK-1406:
----------------------------------

PMML is the de facto standard for model serialization, so it's certainly the 
one to consider leveraging. It's just a serialization format, though, so by 
itself it's not going to help with feature transformation.

Given data and PMML, it's fairly easy to use things like JPMML to do 
evaluation. You could write some thin wrapper code in MLlib to facilitate that, 
but it may not give a lot of marginal benefit.
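
To give a sense of how small the evaluation step is, here is a minimal Python 
sketch that scores one input row against a PMML regression model using only the 
standard library. It is not JPMML and does not mirror its API; the sample 
document, the field names x0/x1/y and the evaluate_regression helper are all 
made up for illustration, and it handles only a single RegressionModel with 
numeric predictors.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.2 documents.
NS = {"p": "http://www.dmg.org/PMML-4_2"}

# Hypothetical sample model: y = 0.5 + 2.0 * x0 - 1.5 * x1
SAMPLE_PMML = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="x0" optype="continuous" dataType="double"/>
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x0"/>
      <MiningField name="x1"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="x0" coefficient="2.0"/>
      <NumericPredictor name="x1" coefficient="-1.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def evaluate_regression(pmml_text, row):
    """Score one input row (a dict of field name -> float) against the model."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    if table is None:
        raise ValueError("this sketch only handles a single RegressionModel")
    result = float(table.get("intercept", "0"))
    for pred in table.findall("p:NumericPredictor", NS):
        # PMML lets a predictor carry an exponent; it defaults to 1.
        exponent = int(pred.get("exponent", "1"))
        result += float(pred.get("coefficient")) * row[pred.get("name")] ** exponent
    return result
```

Calling evaluate_regression(SAMPLE_PMML, {"x0": 1.0, "x1": 2.0}) walks the 
regression table and sums the intercept and the weighted inputs. A real 
evaluator also has to deal with categorical predictors, missing values and the 
other model types, which is the part a library like JPMML already covers.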

Import/export is a bit different. Again, JPMML already handles the mechanics of 
serializing an object model, so that need not be written.

I think export is more important than import, mostly because I think of MLlib 
as a model builder, and therefore a producer rather than consumer of models. 
Export is also easier since you just need to write the glue code to translate 
some MLlib object into a JPMML representation, and only need to worry about 
dealing with the subset of PMML that covers whatever the MLlib output describes.
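
As a rough illustration of that glue code, the sketch below turns the 
parameters of a linear model (a weight vector plus an intercept) into a PMML 
RegressionModel document. It is written in plain Python with hand-built XML 
rather than against JPMML or any MLlib class; linear_model_to_pmml and the 
generated field names x0, x1, ... are hypothetical, and the output has not been 
validated against the PMML schema.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_2"


def linear_model_to_pmml(weights, intercept, target="y"):
    """Hypothetical helper: render linear-model parameters as PMML text."""
    fields = ["x%d" % i for i in range(len(weights))]

    root = ET.Element("PMML", {"version": "4.2", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "linear regression"})

    # Declare every input field plus the target in the data dictionary.
    dictionary = ET.SubElement(
        root, "DataDictionary", {"numberOfFields": str(len(fields) + 1)})
    for name in fields + [target]:
        ET.SubElement(dictionary, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})

    model = ET.SubElement(root, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    for name in fields:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "target"})

    # The regression table carries the model parameters themselves.
    table = ET.SubElement(model, "RegressionTable",
                          {"intercept": repr(float(intercept))})
    for name, weight in zip(fields, weights):
        ET.SubElement(table, "NumericPredictor",
                      {"name": name, "coefficient": repr(float(weight))})

    return ET.tostring(root, encoding="unicode")
```

For example, linear_model_to_pmml([2.0, -1.5], 0.5) yields a document with two 
NumericPredictor entries. Only the one PMML element family that this model type 
needs is touched, which is what keeps export comparatively easy.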

Import is harder for the same reason -- you're not going to want to or be able 
to support everything PMML can describe, so it's already a question of trying 
to map the vocab as best you can to whatever MLlib supports. It's also less 
important, IMHO, since MLlib's value is more in making the model than doing 
something with it right now.

I would suggest the import/export code be kept close to, but separate from, the 
other MLlib code. Not a different module, just cleanly separated from the 
abstract representation.

I think there's a whole project's worth of stuff one could do around consuming, 
managing, serving models!

So to summarize: I'd suggest scoping this to start as "wire up all *Model files 
to JPMML equivalents, as an 'export' package" or something.

> PMML model evaluation support via MLib
> --------------------------------------
>
>                 Key: SPARK-1406
>                 URL: https://issues.apache.org/jira/browse/SPARK-1406
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Thomas Darimont
>
> It would be useful if Spark provided support for the evaluation of PMML 
> models (http://www.dmg.org/v4-2/GeneralStructure.html).
> This would make it possible to use analytical models that were created with a 
> statistical modeling tool like R, SAS, SPSS, etc. with Spark (MLlib), which 
> would perform the actual model evaluation for a given input tuple. The PMML 
> model would then just contain the "parameterization" of an analytical model.
> Other projects like JPMML-Evaluator do a similar thing.
> https://github.com/jpmml/jpmml/tree/master/pmml-evaluator



--
This message was sent by Atlassian JIRA
(v6.2#6252)
