Github user chobeat commented on the pull request:

    https://github.com/apache/flink/pull/1186#issuecomment-181578375
  
    Hi @chiwanpark,
    
    > What is main purpose to support PMML? Is this feature for only model 
portability in FlinkML?
    
    I've used PMML extensively in a previous project and have seen many use 
cases beyond my own. PMML export is necessary for external portability: you may 
need to train a model in Flink and then apply it to local data with a data 
mining tool, for example, or deploy it in a production pipeline built on a 
completely different technology stack. 
    PMML import is optional, though: you can use JPMML (the reference 
implementation of PMML) to read a PMML file and evaluate the model locally on 
each node. Importing from PMML into FlinkML's native implementation may be a 
plus in terms of usability and probably performance, but it's not really a 
blocking issue for a developer.
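    
    As a rough illustration of the export/evaluate round trip described above, 
here is a minimal sketch in Python using only the standard library. It assumes 
a plain linear model; `export_linear_model` and `evaluate_pmml` are hypothetical 
helpers (not part of FlinkML or JPMML), and only a tiny subset of the PMML 
RegressionModel schema is handled.

```python
import xml.etree.ElementTree as ET

# Namespace of the PMML 4.2 schema.
PMML_NS = "http://www.dmg.org/PMML-4_2"


def q(tag):
    """Return the namespace-qualified form of a PMML tag name."""
    return f"{{{PMML_NS}}}{tag}"


def export_linear_model(intercept, coefficients):
    """Hypothetical exporter: serialize a linear model as a PMML string."""
    ET.register_namespace("", PMML_NS)
    root = ET.Element(q("PMML"), {"version": "4.2"})
    ET.SubElement(root, q("Header"))

    # Declare every input field plus the target field "y".
    dictionary = ET.SubElement(root, q("DataDictionary"))
    for name in list(coefficients) + ["y"]:
        ET.SubElement(dictionary, q("DataField"),
                      {"name": name, "optype": "continuous", "dataType": "double"})

    model = ET.SubElement(root, q("RegressionModel"), {"functionName": "regression"})
    schema = ET.SubElement(model, q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, q("MiningField"), {"name": name})
    ET.SubElement(schema, q("MiningField"), {"name": "y", "usageType": "target"})

    # One NumericPredictor per coefficient of the linear model.
    table = ET.SubElement(model, q("RegressionTable"),
                          {"intercept": repr(float(intercept))})
    for name, coef in coefficients.items():
        ET.SubElement(table, q("NumericPredictor"),
                      {"name": name, "coefficient": repr(float(coef))})
    return ET.tostring(root, encoding="unicode")


def evaluate_pmml(pmml_text, record):
    """Hypothetical evaluator: score one record against the exported model."""
    root = ET.fromstring(pmml_text)
    table = root.find(f".//{q('RegressionTable')}")
    result = float(table.get("intercept"))
    for predictor in table.findall(q("NumericPredictor")):
        result += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return result


# The "training" side exports the model; any other stack can then score with it.
pmml = export_linear_model(0.5, {"x1": 2.0, "x2": -1.0})
prediction = evaluate_pmml(pmml, {"x1": 3.0, "x2": 4.0})
print(prediction)
```

    The consumer only needs the PMML text, not the system that produced it, 
which is the portability argument in a nutshell. A real deployment would rely 
on a full PMML evaluator instead of this hand-rolled subset.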
    
    > If not, we have to support other systems such as R or Spark MLlib.
    
    Support for R may be interesting in itself, but I don't understand what 
you mean. MLlib does support PMML export (even if it is somewhat buggy for a 
few models such as Naive Bayes), so it is already possible to move models from 
MLlib to Flink.
    
    > What about FlinkML only format? I think that support for distributed 
system in PMML is poor. XML-based format is hard to parallelize.
    
    This could be interesting as a way to guarantee the consistency of the 
models and to tune the format to our needs. PMML's complexity comes from its 
need for generality and consistency, but it is often overkill for describing 
simple models. It also has only partial support for many models that we may 
want to implement, e.g. any of the online learning algorithms implemented in 
SAMOA or other online learning frameworks. I know we are still missing a few 
pieces before reaching that point, but still...
    



