Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181578375

Hi @chiwanpark,

> What is main purpose to support PMML? Is this feature for only model portability in FlinkML?

I've used PMML extensively in a previous project and saw many use cases beyond my own. PMML export is necessary for external portability: you may need to create a model in Flink and apply it to local data with a data mining tool, for example, or deploy it in a production pipeline built on a completely different technology stack.

PMML import is optional, though: you can use JPMML (the reference implementation of PMML) to read a PMML file and evaluate the model locally on the node. Import from PMML into FlinkML's native implementation may be a plus in terms of usability and probably performance, but its absence is not really a blocking issue for a developer.

> If not, we have to support other systems such as R or Spark MLlib.

Support for R may be interesting in itself, but I don't understand what you mean. MLlib does support PMML export (even if it is somewhat buggy for a few models such as Naive Bayes), so it is already possible to move models from MLlib to Flink.

> What about FlinkML only format? I think that support for distributed system in PMML is poor. XML-based format is hard to parallelize.

A FlinkML-only format could be interesting as a way to guarantee the consistency of the models and to tune the format to our needs. The complexity of PMML comes from its need for generality and consistency, but it is often overkill for describing simple models. It also has only partial support for many models that we may want to implement, for example any of the online learning algorithms implemented in SAMOA or other online learning frameworks. I know we are still missing a few pieces before reaching that point, but still...