[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964430#comment-13964430 ]
Xiangrui Meng commented on SPARK-1406:
--------------------------------------
Thanks for sharing your thoughts! Feature transformation is part of the PMML
standard, which provides primitives for describing feature transformations.
Describing feature transformations is very hard in practice; in most cases the
result is ad-hoc, non-exchangeable code that is hard to reuse. I'm not a fan of
XML, but as you mentioned, PMML is the de facto serialization format.
I feel that supporting PMML feature transformations is as important as, if not
more important than, supporting model export to PMML. In particular, the
former provides an entry point into MLlib while the latter provides an exit. (I
admit that I'm a little selfish on this point.) By the way, the Google
Prediction API supports only PMML's feature transformations:
https://developers.google.com/prediction/docs/pmml-schema
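To illustrate what "primitives to describe feature transformations" look like, here is a minimal sketch that reads a PMML `NormContinuous` derived field and applies it to one input record. Assumptions: the XML is a hand-written fragment (a `TransformationDictionary` only, not a complete PMML document), only `NormContinuous` is handled, and the helper names `norm_continuous` and `apply_transforms` are hypothetical, not part of MLlib or any PMML library.

```python
import xml.etree.ElementTree as ET

# PMML 4.2 namespace. The fragment below is a hypothetical example, not a
# complete PMML document (no Header or DataDictionary).
NS = {"p": "http://www.dmg.org/PMML-4_2"}

TRANSFORMS = """
<TransformationDictionary xmlns="http://www.dmg.org/PMML-4_2">
  <DerivedField name="age_norm" optype="continuous" dataType="double">
    <NormContinuous field="age">
      <LinearNorm orig="0" norm="0"/>
      <LinearNorm orig="50" norm="0.8"/>
      <LinearNorm orig="100" norm="1"/>
    </NormContinuous>
  </DerivedField>
</TransformationDictionary>
"""


def norm_continuous(points, x):
    """Piecewise-linear map through (orig, norm) points sorted by orig.

    Values outside the range are extrapolated from the first or last
    segment, mirroring PMML's default outliers="asIs" treatment.
    """
    i = 0
    # Advance to the segment containing x, stopping at the last segment.
    while i < len(points) - 2 and x > points[i + 1][0]:
        i += 1
    (a1, b1), (a2, b2) = points[i], points[i + 1]
    return b1 + (x - a1) / (a2 - a1) * (b2 - b1)


def apply_transforms(xml_text, record):
    """Compute every NormContinuous DerivedField for one input record."""
    root = ET.fromstring(xml_text.strip())
    derived = {}
    for field in root.findall("p:DerivedField", NS):
        norm = field.find("p:NormContinuous", NS)
        if norm is None:
            continue  # other primitives (Discretize, MapValues, ...) are not handled
        points = [(float(ln.get("orig")), float(ln.get("norm")))
                  for ln in norm.findall("p:LinearNorm", NS)]
        derived[field.get("name")] = norm_continuous(points, record[norm.get("field")])
    return derived


print(apply_transforms(TRANSFORMS, {"age": 42.0}))
```

The point of the exercise is that the transformation lives in data (the XML), so the same description can be exchanged between tools instead of being re-implemented as ad-hoc code in each one.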
> PMML model evaluation support via MLib
> --------------------------------------
>
> Key: SPARK-1406
> URL: https://issues.apache.org/jira/browse/SPARK-1406
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Thomas Darimont
>
> It would be useful if Spark provided support for evaluating PMML
> models (http://www.dmg.org/v4-2/GeneralStructure.html).
> This would allow analytical models created with a statistical modeling
> tool such as R, SAS, or SPSS to be used with Spark (MLlib), which
> would perform the actual model evaluation for a given input tuple. The PMML
> model would then just contain the "parameterization" of an analytical model.
> Other projects, such as JPMML-Evaluator, do a similar thing.
> https://github.com/jpmml/jpmml/tree/master/pmml-evaluator
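The reporter's idea, that the PMML file carries only the "parameterization" and the engine performs the evaluation per input tuple, can be sketched for a PMML `RegressionModel`: predicted value = intercept + sum of coefficient * x^exponent over the `NumericPredictor` terms. This is a minimal illustration under stated assumptions, not how JPMML-Evaluator or MLlib do it: the XML is a stripped-down hypothetical fragment (no `MiningSchema`, no categorical predictors), and `load_regression` and `predict` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_2"}

# Hypothetical, stripped-down RegressionModel (a real PMML document would also
# contain Header, DataDictionary and MiningSchema).
MODEL = """
<RegressionModel xmlns="http://www.dmg.org/PMML-4_2" functionName="regression">
  <RegressionTable intercept="1.5">
    <NumericPredictor name="x1" coefficient="2.0"/>
    <NumericPredictor name="x2" exponent="2" coefficient="-0.5"/>
  </RegressionTable>
</RegressionModel>
"""


def load_regression(xml_text):
    """Parse the model once into (intercept, [(field, coefficient, exponent)])."""
    table = ET.fromstring(xml_text.strip()).find("p:RegressionTable", NS)
    terms = [(p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
             for p in table.findall("p:NumericPredictor", NS)]
    return float(table.get("intercept")), terms


def predict(model, row):
    """Evaluate the regression for one input tuple given as a dict."""
    intercept, terms = model
    return intercept + sum(c * row[name] ** e for name, c, e in terms)


model = load_regression(MODEL)
print(predict(model, {"x1": 3.0, "x2": 2.0}))  # 5.5
```

Parsing once and reusing the small `(intercept, terms)` tuple keeps the per-record work to a few multiplications, which is the shape you would want when mapping over an RDD, e.g. `rdd.map(lambda row: predict(model, row))` with `model` captured in the closure.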