Hi Fazlan,

Please see my comments inline in blue.

> No, I am not planning to build the model from scratch. Once the serialized
> Spark model is built, we can export it to PMML format. In other words, we
> are using the serialized model in order to build the PMML model.

That's great.

> If I am not mistaken, what you are suggesting is to let the user go through
> the normal workflow of model building and, once it is done, give the user
> an option to export it to PMML format (also for the models that have
> already been built)?

Yes, this is exactly what I meant.

> @Vidura I will check on the run-time support; if that is possible, that
> would be great.

If it's supported, it'll be great. If not, we can still do it based on the
model type, but I think it'll be a bit messy as the code wouldn't be as
generic.

Thanks and Regards,

Vidura

> On Mon, Oct 12, 2015 at 12:10 PM, Vidura Gamini Abhaya <[email protected]>
> wrote:
>
>> Hi Fazlan,
>>
>> Are you planning to build a PMML model from scratch (i.e. going
>> through the entire flow to build an ML model), or is this to be used for
>> exporting PMML out of an already built model?
>>
>> If it's the former, +1 to what CD mentioned on not asking the user to go
>> through the entire ML workflow for PMML. My preference is also for
>> saving/exporting a model in PMML to be an option for the user, once a model
>> is built and for models that have already been built.
>>
>> @Fazlan - Can we find out whether the PMML export is possible at runtime
>> through a method or through the inheritance hierarchy? If so, we could
>> make the export option visible on the UI only for supported models.
>>
>> Thanks and Regards,
>>
>> Vidura
>>
>> On 12 October 2015 at 11:33, CD Athuraliya <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I feel that asking the user to go through the complete ML workflow for PMML
>>> is too demanding.
>>> Computationally, this conversion should be less expensive
>>> than model training in real-world use cases (since it's a mapping of
>>> model parameters from Java objects to XML, AFAIK). And model training should
>>> be independent of the model format. Instead, can't we support this
>>> conversion on demand? Or save in both formats for now? Once Spark starts
>>> supporting PMML for all algorithms, we can go for Method 1 if it looks
>>> consistent throughout our ML life cycle.
>>>
>>> Thanks
>>>
>>> On Mon, Oct 12, 2015 at 11:09 AM, Fazlan Nazeem <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working on redmine [1] regarding PMML support for Machine Learner.
>>>> Please provide your opinion on this design.
>>>> [1] https://redmine.wso2.com/issues/4303
>>>>
>>>> *Overview*
>>>>
>>>> Spark 1.5.1 (latest version) supports PMML model export for some of the
>>>> available models in Spark through MLlib.
>>>>
>>>> The table below outlines the MLlib models that can be exported to PMML
>>>> and their equivalent PMML model.
>>>>
>>>> MLlib model                    | PMML model
>>>> -------------------------------+---------------------------------------------
>>>> KMeansModel                    | ClusteringModel
>>>> LinearRegressionModel          | RegressionModel (functionName="regression")
>>>> RidgeRegressionModel           | RegressionModel (functionName="regression")
>>>> LassoModel                     | RegressionModel (functionName="regression")
>>>> SVMModel                       | RegressionModel (functionName="classification"
>>>>                                | normalizationMethod="none")
>>>> Binary LogisticRegressionModel | RegressionModel (functionName="classification"
>>>>                                | normalizationMethod="logit")
>>>>
>>>> Not all models available in MLlib can be exported to PMML as of now.
>>>>
>>>> Goal
>>>>
>>>> 1. We need to save models generated by WSO2 ML (PMML-supported models)
>>>>    in PMML format, so that they can be reused from PMML-supported tools.
>>>>
>>>> How To
>>>>
>>>> If “clusters” is the trained model, we can do the following with the
>>>> PMML support.
>>>>
>>>> // Export the model to a String in PMML format
>>>> clusters.toPMML
>>>>
>>>> // Export the model to a local file in PMML format
>>>> clusters.toPMML("/tmp/kmeans.xml")
>>>>
>>>> // Export the model to a directory on a distributed file system in PMML
>>>> // format
>>>> clusters.toPMML(sc,"/tmp/kmeans")
>>>>
>>>> // Export the model to the OutputStream in PMML format
>>>> clusters.toPMML(System.out)
>>>>
>>>> For unsupported models, either you will not find a .toPMML method or an
>>>> IllegalArgumentException will be thrown.
>>>>
>>>> Design
>>>>
>>>> In the following diagram, models highlighted in green can be exported to
>>>> PMML, but not the models highlighted in red. The diagram illustrates the
>>>> algorithms supported by WSO2 Machine Learner.
>>>>
>>>> [image: Inline image 2]
>>>>
>>>> Method 1
>>>>
>>>> By default, save the models in PMML if PMML export is supported, using
>>>> one of these supported options.
>>>>
>>>> 1. Export the model to a String in PMML format
>>>> 2. Export the model to a local file in PMML format
>>>> 3. Export the model to a directory on a distributed file system in
>>>>    PMML format
>>>> 4. Export the model to the OutputStream in PMML format
>>>>
>>>> Classes that need to be modified (apart from UI)
>>>>
>>>> - SupervisedSparkModelBuilder
>>>> - UnsupervisedSparkModelBuilder
>>>>
>>>> e.g.
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> As of now, the serialized models are saved in the “models” folder. The
>>>> PMML models can also be saved in the same directory with a PMML suffix.
>>>>
>>>> optional:
>>>>
>>>> After the model is generated, let the user export the PMML model to a
>>>> chosen location through the UI.
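
On the runtime-support question, here is a minimal Java sketch of how the export option could be gated. It assumes the PMML-capable MLlib models share a common exportable type. Spark 1.5 appears to expose this as the PMMLExportable trait in org.apache.spark.mllib.pmml, but that should be verified against the Spark version in use. PmmlExportable, ExportableModel, NonExportableModel, supportsPmml and exportToPmml below are hypothetical stand-ins so that the example runs without Spark; they are not Spark or WSO2 ML classes.

```java
import java.util.Optional;

public class PmmlExportSketch {

    /** Hypothetical stand-in for a shared "can export PMML" type (assumption, not the Spark trait). */
    interface PmmlExportable {
        String toPMML();
    }

    /** Stand-in for a model with PMML support, e.g. a k-means model. */
    static class ExportableModel implements PmmlExportable {
        @Override
        public String toPMML() {
            // Placeholder document; a real model would emit its parameters here.
            return "<PMML><ClusteringModel/></PMML>";
        }
    }

    /** Stand-in for a model type that has no toPMML method at all. */
    static class NonExportableModel { }

    /** Type check only: could be used to decide whether the UI shows the export option. */
    static boolean supportsPmml(Object model) {
        return model instanceof PmmlExportable;
    }

    /** On-demand export of an already built model; empty when export is not possible. */
    static Optional<String> exportToPmml(Object model) {
        if (!(model instanceof PmmlExportable)) {
            return Optional.empty();
        }
        try {
            return Optional.of(((PmmlExportable) model).toPMML());
        } catch (IllegalArgumentException e) {
            // The type has toPMML, but this particular model is not supported.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(supportsPmml(new ExportableModel()));
        System.out.println(supportsPmml(new NonExportableModel()));
        System.out.println(exportToPmml(new NonExportableModel()).isPresent());
    }
}
```

The type check covers the "no .toPMML method" case and the catch block covers the IllegalArgumentException case mentioned above, so the calling code would not need a per-model-type switch.
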
>>>>
>>>> Method 2
>>>>
>>>> Add a *new REST API* to build models with PMML
>>>>
>>>> public Response buildPMMLModel(@PathParam("modelId") long modelId)
>>>>
>>>> In the backend, we could add an additional argument to the "buildXModel"
>>>> methods to decide whether to save the PMML model or not.
>>>>
>>>> UI modifications are also needed (an option for the user to choose whether
>>>> to build the PMML and to choose the path to save it).
>>>>
>>>> Identified classes that need to be modified (apart from UI)
>>>>
>>>> - SupervisedSparkModelBuilder
>>>> - UnsupervisedSparkModelBuilder
>>>> - ModelApiV10
>>>>
>>>> *Conclusion*
>>>>
>>>> Currently we have decided to go with "Method 2" for the
>>>> following reasons.
>>>>
>>>> - Not all models have PMML support in Spark.
>>>> - If we are to use anything apart from Spark MLlib, such as H2O, we
>>>>   will be depending on PMML support from H2O.
>>>> - With Method 1, we might be generating PMML models when users do
>>>>   not need them (useless computation).
>>>>
>>>> Please let me know if there is a better way to improve the design.
>>>>
>>>> --
>>>> Thanks & Regards,
>>>>
>>>> Fazlan Nazeem
>>>>
>>>> *Software Engineer*
>>>>
>>>> *WSO2 Inc*
>>>> Mobile : +94772338839
>>>> [email protected]
>>>>
>>>
>>> --
>>> *CD Athuraliya*
>>> Software Engineer
>>> WSO2, Inc.
>>> lean . enterprise . middleware
>>> Mobile: +94 716288847
>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>> <https://twitter.com/cdathuraliya> | Blog
>>> <https://cdathuraliya.wordpress.com/>
>>>
>>
>> --
>> Vidura Gamini Abhaya, Ph.D.
>> Director of Engineering
>> M:+94 77 034 7754
>> E: [email protected]
>>
>> WSO2 Inc.
>> (http://wso2.com)
>> lean.enterprise.middleware
>>
>
> --
> Thanks & Regards,
>
> Fazlan Nazeem
>
> *Software Engineer*
>
> *WSO2 Inc*
> Mobile : +94772338839
> [email protected]
>

--
Vidura Gamini Abhaya, Ph.D.
Director of Engineering
M:+94 77 034 7754
E: [email protected]

WSO2 Inc. (http://wso2.com)
lean.enterprise.middleware
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
