Hi Fazlan,

Are you planning to build a PMML model from scratch (i.e., going through the entire flow to build an ML model), or is this to be used for exporting PMML out of an already built model?
If it's the former, +1 to what CD mentioned about not asking the user to go through the entire ML workflow for PMML. My preference is also for saving/exporting a model in PMML to be an option for the user, both right after a model is built and for models that have already been built.

@Fazlan - Can we find out at runtime whether PMML export is possible, through a method or through the inheritance hierarchy? If so, we could make the export option visible on the UI only for supported models.

Thanks and Regards,
Vidura

On 12 October 2015 at 11:33, CD Athuraliya <[email protected]> wrote:

> Hi,
>
> I feel that asking the user to go through the complete ML workflow for PMML is
> too demanding. Computationally this conversion should be less expensive
> compared to model training in real-world use cases (since it's a mapping of
> model parameters from Java objects to XML AFAIK). And model training should
> be independent from the model format. Instead can't we support this
> conversion on demand? Or save in both formats for now? Once Spark starts
> supporting PMML for all algorithms we can go for Method 1 if it looks
> consistent throughout our ML life cycle.
>
> Thanks
>
> On Mon, Oct 12, 2015 at 11:09 AM, Fazlan Nazeem <[email protected]> wrote:
>
>> Hi,
>>
>> I am working on redmine[1] regarding PMML support for Machine Learner.
>> Please provide your opinion on this design.
>> [1] https://redmine.wso2.com/issues/4303
>>
>> *Overview*
>>
>> Spark 1.5.1 (the latest version) supports PMML model export for some of the
>> available models in Spark through MLlib.
>>
>> The table below outlines the MLlib models that can be exported to PMML
>> and their equivalent PMML model.
>>
>> MLlib model                     PMML model
>> KMeansModel                     ClusteringModel
>> LinearRegressionModel           RegressionModel (functionName="regression")
>> RidgeRegressionModel            RegressionModel (functionName="regression")
>> LassoModel                      RegressionModel (functionName="regression")
>> SVMModel                        RegressionModel (functionName="classification" normalizationMethod="none")
>> Binary LogisticRegressionModel  RegressionModel (functionName="classification" normalizationMethod="logit")
>>
>> Not all models available in MLlib can be exported to PMML as of now.
>>
>> Goal
>>
>> 1. We need to save models generated by WSO2 ML (PMML-supported models) in
>>    PMML format, so that they can be reused from tools that support PMML.
>>
>> How To
>>
>> If “clusters” is the trained model, we can do the following with the PMML
>> support.
>>
>> // Export the model to a String in PMML format
>> clusters.toPMML
>>
>> // Export the model to a local file in PMML format
>> clusters.toPMML("/tmp/kmeans.xml")
>>
>> // Export the model to a directory on a distributed file system in PMML
>> // format
>> clusters.toPMML(sc,"/tmp/kmeans")
>>
>> // Export the model to the OutputStream in PMML format
>> clusters.toPMML(System.out)
>>
>> For unsupported models, either you will not find a .toPMML method or an
>> IllegalArgumentException will be thrown.
>>
>> Design
>>
>> In the following diagram, models highlighted in green can be exported to
>> PMML, but not the models highlighted in red. The diagram illustrates
>> algorithms supported by WSO2 Machine Learner.
>>
>> [image: Inline image 2]
>>
>> Method 1
>>
>> By default, save the models in PMML if PMML export is supported, using one
>> of these supported options.
>>
>> 1. Export the model to a String in PMML format
>> 2. Export the model to a local file in PMML format
>> 3. Export the model to a directory on a distributed file system in PMML
>>    format
>> 4. Export the model to the OutputStream in PMML format
>>
>> Classes that need to be modified (apart from the UI)
>>
>> - SupervisedSparkModelBuilder
>> - UnsupervisedSparkModelBuilder
>>
>> e.g.
>>
>> [image: Inline image 1]
>>
>> As of now the serialized models are saved in the “models” folder. The PMML
>> models can also be saved in the same directory with a PMML suffix.
>>
>> Optional:
>>
>> After the model is generated, let the user export the PMML model to a
>> chosen location through the UI.
>>
>> Method 2
>>
>> Add a *new REST API* to build models with PMML
>>
>> public Response buildPMMLModel(@PathParam("modelId") long modelId)
>>
>> In the backend we could add an additional argument to the "buildXModel"
>> methods to decide whether to save the PMML model or not.
>>
>> UI modifications are also needed (an option for the user to choose whether
>> to build the PMML and to choose the path to save it).
>>
>> Identified classes that need to be modified (apart from the UI)
>>
>> - SupervisedSparkModelBuilder
>> - UnsupervisedSparkModelBuilder
>> - ModelApiV10
>>
>> *Conclusion*
>>
>> Currently we have decided to go with "Method 2" for the following
>> reasons.
>>
>> - Not all models have PMML support in Spark.
>> - If we are to use anything apart from Spark MLlib, such as H2O, we
>>   will be depending on PMML support from H2O.
>> - With Method 1 we might be generating PMML models when users do not
>>   need them (useless computation).
>>
>> Please let me know if there is a better way to improve the design.
>>
>> --
>> Thanks & Regards,
>>
>> Fazlan Nazeem
>>
>> *Software Engineer*
>>
>> *WSO2 Inc*
>> Mobile : +94772338839
>> [email protected]
>>
>
> --
> *CD Athuraliya*
> Software Engineer
> WSO2, Inc.
> lean . enterprise . middleware
> Mobile: +94 716288847
> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
> <https://twitter.com/cdathuraliya> | Blog
> <https://cdathuraliya.wordpress.com/>
>

--
Vidura Gamini Abhaya, Ph.D.
Director of Engineering
M:+94 77 034 7754
E: [email protected]

WSO2 Inc. (http://wso2.com)
lean.enterprise.middleware
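
P.S. On the runtime check asked about in this thread: as far as I know, the MLlib models that offer toPMML do so by mixing in the org.apache.spark.mllib.pmml.PMMLExportable trait, so an instanceof test should cover the "no .toPMML method" case, while the IllegalArgumentException case only surfaces when the export is actually attempted. Below is a minimal sketch of that two-step check. It is not WSO2 ML code: PmmlExportCheck, the local PmmlExportable interface (a stand-in for the Spark trait so the sketch compiles without Spark) and the three Stub* models are all hypothetical names introduced for illustration.

```java
import java.util.Optional;

final class PmmlExportCheck {

    /** Local stand-in for Spark's PMMLExportable trait (assumption: declared here so no Spark jar is needed). */
    interface PmmlExportable {
        String toPMML();
    }

    /** Hypothetical stub: a model whose PMML export succeeds. */
    static final class StubKMeansModel implements PmmlExportable {
        @Override
        public String toPMML() {
            return "<PMML><ClusteringModel/></PMML>";
        }
    }

    /** Hypothetical stub: exposes toPMML but rejects the export when it is called. */
    static final class StubRejectingModel implements PmmlExportable {
        @Override
        public String toPMML() {
            throw new IllegalArgumentException("PMML export not supported for this model");
        }
    }

    /** Hypothetical stub: a model type with no PMML export at all. */
    static final class StubUnsupportedModel { }

    private PmmlExportCheck() { }

    /** Step 1: inheritance check, i.e. does the model type expose a PMML export method? */
    static boolean hasExportMethod(Object model) {
        return model instanceof PmmlExportable;
    }

    /** Step 2: attempt the export and treat a rejected export as "not available". */
    static Optional<String> tryExport(Object model) {
        if (!hasExportMethod(model)) {
            return Optional.empty();
        }
        try {
            return Optional.of(((PmmlExportable) model).toPMML());
        } catch (IllegalArgumentException e) {
            // The model type has the method, but this particular model cannot be exported.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Object[] models = {
            new StubKMeansModel(), new StubRejectingModel(), new StubUnsupportedModel()
        };
        for (Object model : models) {
            System.out.println(model.getClass().getSimpleName()
                    + ": export available = " + tryExport(model).isPresent());
        }
    }
}
```

The UI could show the export option only when tryExport returns a value. That runs the conversion itself, which should be acceptable if it is as cheap as CD suggests; if not, the instanceof check alone gives a quicker but less exact answer.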
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
