[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962133#comment-13962133 ]
Sean Owen commented on SPARK-1406:
----------------------------------
PMML is the de facto standard for model serialization, so it is certainly the
one to consider leveraging. It is only a serialization format, though, so it is
not by itself going to help with feature transformation.
Given data and PMML, it's fairly easy to use things like JPMML to do
evaluation. You could write some thin wrapper code in MLlib to facilitate that,
but it may not give a lot of marginal benefit.
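To make "given data and PMML, evaluation is fairly easy" concrete, here is a
minimal sketch in Python. It does not use JPMML or its API; it only shows, with
the standard library, what evaluating a PMML RegressionModel against one input
tuple amounts to. The document, the field names (x1, x2, y) and the function
name evaluate_regression are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.2 document describing y = 1.5 + 2.0*x1 - 0.5*x2.
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_2"}


def evaluate_regression(pmml_text, row):
    """Score one input tuple (a dict of field name -> value).

    Simplification: only the intercept and first-order NumericPredictor
    terms are handled; exponents, categorical predictors and
    transformations are ignored.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * row[predictor.get("name")]
    return result


prediction = evaluate_regression(PMML_DOC, {"x1": 3.0, "x2": 4.0})
print(prediction)
```

A real evaluator such as JPMML covers far more of the spec than this; the
point is only that the model file carries the parameters and the evaluator
supplies the arithmetic.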
Import/export is a bit different. Again, JPMML handles all the mechanics of
serializing an object model, so that need not be written.
I think export is more important than import, mostly because I think of MLlib
as a model builder, and therefore a producer rather than consumer of models.
Export is also easier since you just need to write the glue code to translate
some MLlib object into a JPMML representation, and only need to worry about
dealing with the subset of PMML that covers whatever the MLlib output describes.
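As a rough illustration of what that export glue code has to do, here is a
sketch in Python that turns the parameters of a linear regression model
(weights plus intercept) into a PMML 4.2 RegressionModel document. It is not
MLlib or JPMML code: the function name linear_model_to_pmml, its arguments and
the default target name are assumptions, and the XML is built by hand with the
standard library rather than through JPMML's object model.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_2"


def linear_model_to_pmml(weights, intercept, feature_names, target="target"):
    """Translate linear-model parameters into a PMML RegressionModel string.

    Only the small part of PMML that a plain linear regression needs is
    emitted: a data dictionary, a mining schema and one regression table.
    """
    if len(weights) != len(feature_names):
        raise ValueError("need exactly one feature name per weight")

    root = ET.Element("PMML", {"version": "4.2", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "linear regression export"})

    # Declare every field the model reads or produces.
    fields = list(feature_names) + [target]
    dictionary = ET.SubElement(
        root, "DataDictionary", {"numberOfFields": str(len(fields))})
    for name in fields:
        ET.SubElement(dictionary, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})

    model = ET.SubElement(root, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    for name in feature_names:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "target"})

    # One NumericPredictor per weight carries the actual parameters.
    table = ET.SubElement(model, "RegressionTable",
                          {"intercept": repr(float(intercept))})
    for name, weight in zip(feature_names, weights):
        ET.SubElement(table, "NumericPredictor",
                      {"name": name, "coefficient": repr(float(weight))})

    return ET.tostring(root, encoding="unicode")


pmml_text = linear_model_to_pmml([2.0, -0.5], 1.5, ["x1", "x2"], target="y")
print(pmml_text)
```

The exporter only ever touches the handful of elements this one model type
needs, which is why export stays small compared with import.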
Import is harder for the same reason -- you're not going to want to or be able
to support everything PMML can describe, so it's already a question of trying
to map the vocab as best you can to whatever MLlib supports. It's also less
important, IMHO, since MLlib's value is more in making the model than doing
something with it right now.
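The asymmetry shows up as soon as an importer has to decide what it accepts.
Here is a small Python sketch of that first step: find the model element in a
PMML document and reject anything outside a supported subset. The subset
chosen here (RegressionModel and ClusteringModel) and the names SUPPORTED and
classify_pmml are purely hypothetical; nothing here reflects what MLlib
actually supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of PMML model elements an importer might handle.
SUPPORTED = {"RegressionModel", "ClusteringModel"}

# Top-level children of <PMML> that are not models themselves.
NON_MODEL = {"Header", "MiningBuildTask", "DataDictionary",
             "TransformationDictionary", "Extension"}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def classify_pmml(pmml_text):
    """Return the model element name, or raise if it cannot be imported."""
    root = ET.fromstring(pmml_text)
    models = [local_name(child.tag) for child in root
              if local_name(child.tag) not in NON_MODEL]
    if len(models) != 1:
        raise ValueError("expected exactly one model, found %d" % len(models))
    kind = models[0]
    if kind not in SUPPORTED:
        raise NotImplementedError("no mapping for PMML element " + kind)
    return kind


regression_doc = ('<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">'
                  '<DataDictionary/><RegressionModel functionName="regression"/>'
                  '</PMML>')
print(classify_pmml(regression_doc))
```

Even after this check passes, each supported element still has options the
importer may have to refuse, which is the "map the vocab as best you can"
problem described above.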
I would suggest the import/export code be kept close to, but separate from, the
other MLlib code. Not a different module, just cleanly separated from the
abstract representation.
I think there's a whole project's worth of stuff one could do around consuming,
managing, serving models!
So to summarize: I'd suggest scoping this to start as "wire up all *Model files
to JPMML equivalents, as an 'export' package" or something.
> PMML model evaluation support via MLlib
> ---------------------------------------
>
> Key: SPARK-1406
> URL: https://issues.apache.org/jira/browse/SPARK-1406
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Thomas Darimont
>
> It would be useful if Spark provided support for the evaluation of PMML
> models (http://www.dmg.org/v4-2/GeneralStructure.html).
> This would allow analytical models that were created with a statistical
> modeling tool like R, SAS, SPSS, etc. to be used with Spark (MLlib), which
> would perform the actual model evaluation for a given input tuple. The PMML
> model would then just contain the "parameterization" of an analytical model.
> Other projects like JPMML-Evaluator do a similar thing.
> https://github.com/jpmml/jpmml/tree/master/pmml-evaluator