Sure Nirmal. Thanks CD for pointing it out!
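
A rough sketch of how the renamed export call could look, in plain Java. It assumes the runtime check is a simple instanceof test. The PmmlExportable interface below is only a stand-in for Spark's PMMLExportable trait, and PmmlExporter, ExportableModel and NonExportableModel are hypothetical names for illustration, not existing WSO2 ML classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Stand-in for Spark's PMMLExportable trait (assumption: simplified to one method). */
interface PmmlExportable {
    String toPMML();
}

/** Hypothetical model type that supports PMML export. */
final class ExportableModel implements PmmlExportable {
    private final String name;

    ExportableModel(String name) {
        this.name = name;
    }

    @Override
    public String toPMML() {
        // Placeholder payload; a real model would emit a full PMML document.
        return "<PMML><!-- " + name + " --></PMML>";
    }
}

/** Hypothetical model type without PMML support. */
final class NonExportableModel {
}

/** Hypothetical helper that exports already built models on demand. */
final class PmmlExporter {
    // Already built (deserialized) models, keyed by model id.
    private final Map<Long, Object> builtModels = new HashMap<>();

    void register(long modelId, Object model) {
        builtModels.put(modelId, model);
    }

    /** True only for stored models that can be exported; a UI could use this to show or hide the option. */
    boolean isPmmlSupported(long modelId) {
        return builtModels.get(modelId) instanceof PmmlExportable;
    }

    /** Exports an existing model without retraining; empty if the model is missing or unsupported. */
    Optional<String> exportAsPMMLModel(long modelId) {
        Object model = builtModels.get(modelId);
        if (model instanceof PmmlExportable) {
            return Optional.of(((PmmlExportable) model).toPMML());
        }
        return Optional.empty();
    }
}
```

With something like this, the REST layer would only look up the stored model and call exportAsPMMLModel(modelId), so the user never has to repeat the model building workflow just to get the PMML output.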

On Mon, Oct 12, 2015 at 2:14 PM, Nirmal Fernando <[email protected]> wrote:

> Excellent! Good catch! Fazlan, please fix. exportAsPMMLModel maybe?
>
> On Mon, Oct 12, 2015 at 2:10 PM, CD Athuraliya <[email protected]> wrote:
>
>> To me buildPMMLModel(long modelId) sounds more like we are building (or
>> training) the model. ExportPMML or something similar would sound more
>> like the actual action IMO.
>>
>> On Mon, Oct 12, 2015 at 2:02 PM, Nirmal Fernando <[email protected]> wrote:
>>
>>> Hi CD/Vidura,
>>>
>>> On Mon, Oct 12, 2015 at 1:56 PM, CD Athuraliya <[email protected]> wrote:
>>>
>>>> On Mon, Oct 12, 2015 at 12:36 PM, Vidura Gamini Abhaya <[email protected]> wrote:
>>>>
>>>>> Hi Fazlan,
>>>>>
>>>>> Please see my comments inline in blue.
>>>>>
>>>>>> No, I am not planning to build the model from scratch. Once the
>>>>>> serialized Spark model is built, we can export it to PMML format.
>>>>>> In other words, we are using the serialized model in order to
>>>>>> build the PMML model.
>>>>>
>>>>> That's great.
>>>>>
>>>>>> If I am not mistaken, what you are suggesting is to let the user go
>>>>>> through the normal workflow of model building and, once it is done,
>>>>>> give the user an option to export it to PMML format (also for the
>>>>>> models that have already been built)?
>>>>
>>>> Yes exactly! What we should not do IMO is ask the user to go through
>>>> the whole workflow if he needs to export an already created model in
>>>> PMML.
>>>
>>> Can you please explain where you got this idea? If this idea is there
>>> in Fazlan's content, we need to fix it.
>>>
>>>>> Yes, this is exactly what I meant.
>>>>>
>>>>>> @Vidura I will check on the run-time support; if that is possible,
>>>>>> that would be great.
>>>>>
>>>>> If it's supported, it'll be great. If not, we can still do it based
>>>>> on the model type, but I think it'll be a bit messy as the code
>>>>> wouldn't be as generic.
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Vidura
>>>>>
>>>>>> On Mon, Oct 12, 2015 at 12:10 PM, Vidura Gamini Abhaya <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Fazlan,
>>>>>>>
>>>>>>> Are you planning to build a PMML model from scratch (i.e. going
>>>>>>> through the entire flow to build an ML model), or is this to be
>>>>>>> used for exporting a PMML out of an already built model?
>>>>>>>
>>>>>>> If it's the former, +1 to what CD mentioned on not asking the user
>>>>>>> to go through the entire ML workflow for PMML. My preference is
>>>>>>> also for saving/exporting a model in PMML to be an option for the
>>>>>>> user, once a model is built and for models that have already been
>>>>>>> built.
>>>>>>>
>>>>>>> @Fazlan - Can we find out whether the PMML export is possible at
>>>>>>> runtime through a method or through the inheritance hierarchy? If
>>>>>>> so, we could make the export option visible in the UI only for
>>>>>>> supported models.
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>>
>>>>>>> Vidura
>>>>>>>
>>>>>>> On 12 October 2015 at 11:33, CD Athuraliya <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I feel that asking the user to go through the complete ML
>>>>>>>> workflow for PMML is too demanding. Computationally this
>>>>>>>> conversion should be less expensive compared to model training in
>>>>>>>> real-world use cases (since it's a mapping of model parameters
>>>>>>>> from Java objects to XML AFAIK). And model training should be
>>>>>>>> independent from the model format. Instead, can't we support this
>>>>>>>> conversion on demand? Or save in both formats for now? Once Spark
>>>>>>>> starts supporting PMML for all algorithms we can go for Method 1
>>>>>>>> if it looks consistent throughout our ML life cycle.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Mon, Oct 12, 2015 at 11:09 AM, Fazlan Nazeem <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am working on redmine[1] regarding PMML support for Machine
>>>>>>>>> Learner. Please provide your opinion on this design.
>>>>>>>>>
>>>>>>>>> [1] https://redmine.wso2.com/issues/4303
>>>>>>>>>
>>>>>>>>> *Overview*
>>>>>>>>>
>>>>>>>>> Spark 1.5.1 (latest version) supports PMML model export for some
>>>>>>>>> of the available models in Spark through MLlib.
>>>>>>>>>
>>>>>>>>> The table below outlines the MLlib models that can be exported to
>>>>>>>>> PMML and their equivalent PMML model.
>>>>>>>>>
>>>>>>>>> MLlib model                     PMML model
>>>>>>>>> ------------------------------  ------------------------------------------------
>>>>>>>>> KMeansModel                     ClusteringModel
>>>>>>>>> LinearRegressionModel           RegressionModel (functionName="regression")
>>>>>>>>> RidgeRegressionModel            RegressionModel (functionName="regression")
>>>>>>>>> LassoModel                      RegressionModel (functionName="regression")
>>>>>>>>> SVMModel                        RegressionModel (functionName="classification"
>>>>>>>>>                                 normalizationMethod="none")
>>>>>>>>> Binary LogisticRegressionModel  RegressionModel (functionName="classification"
>>>>>>>>>                                 normalizationMethod="logit")
>>>>>>>>>
>>>>>>>>> Not all models available in MLlib can be exported to PMML as of
>>>>>>>>> now.
>>>>>>>>>
>>>>>>>>> Goal
>>>>>>>>>
>>>>>>>>> 1. We need to save models generated by WSO2 ML (PMML-supported
>>>>>>>>>    models) in PMML format, so that they can be reused from
>>>>>>>>>    PMML-supported tools.
>>>>>>>>>
>>>>>>>>> How To
>>>>>>>>>
>>>>>>>>> If “clusters” is the trained model, we can do the following with
>>>>>>>>> the PMML support.
>>>>>>>>>
>>>>>>>>> // Export the model to a String in PMML format
>>>>>>>>> clusters.toPMML
>>>>>>>>>
>>>>>>>>> // Export the model to a local file in PMML format
>>>>>>>>> clusters.toPMML("/tmp/kmeans.xml")
>>>>>>>>>
>>>>>>>>> // Export the model to a directory on a distributed file system
>>>>>>>>> // in PMML format
>>>>>>>>> clusters.toPMML(sc, "/tmp/kmeans")
>>>>>>>>>
>>>>>>>>> // Export the model to the OutputStream in PMML format
>>>>>>>>> clusters.toPMML(System.out)
>>>>>>>>>
>>>>>>>>> For unsupported models, either you will not find a .toPMML method
>>>>>>>>> or an IllegalArgumentException will be thrown.
>>>>>>>>>
>>>>>>>>> Design
>>>>>>>>>
>>>>>>>>> In the following diagram, models highlighted in green can be
>>>>>>>>> exported to PMML, but not the models highlighted in red. The
>>>>>>>>> diagram illustrates algorithms supported by WSO2 Machine Learner.
>>>>>>>>>
>>>>>>>>> [image: Inline image 2]
>>>>>>>>>
>>>>>>>>> Method 1
>>>>>>>>>
>>>>>>>>> By default, save the models in PMML if PMML export is supported,
>>>>>>>>> using one of these supported options.
>>>>>>>>>
>>>>>>>>> 1. Export the model to a String in PMML format
>>>>>>>>> 2. Export the model to a local file in PMML format
>>>>>>>>> 3. Export the model to a directory on a distributed file system
>>>>>>>>>    in PMML format
>>>>>>>>> 4. Export the model to the OutputStream in PMML format
>>>>>>>>>
>>>>>>>>> Classes that need to be modified (apart from the UI)
>>>>>>>>>
>>>>>>>>> - SupervisedSparkModelBuilder
>>>>>>>>> - UnsupervisedSparkModelBuilder
>>>>>>>>>
>>>>>>>>> e.g.
>>>>>>>>>
>>>>>>>>> [image: Inline image 1]
>>>>>>>>>
>>>>>>>>> As of now the serialized models are saved in the “models” folder.
>>>>>>>>> The PMML models can also be saved in the same directory with a
>>>>>>>>> PMML suffix.
>>>>>>>>>
>>>>>>>>> Optional:
>>>>>>>>>
>>>>>>>>> After the model is generated, let the user export the PMML model
>>>>>>>>> to a chosen location through the UI.
>>>>>>>>>
>>>>>>>>> Method 2
>>>>>>>>>
>>>>>>>>> Add a *new REST API* to build models with PMML
>>>>>>>>>
>>>>>>>>> public Response buildPMMLModel(@PathParam("modelId") long modelId)
>>>>>>>>>
>>>>>>>>> In the backend, we could add an additional argument to the
>>>>>>>>> "buildXModel" methods to decide whether to save the PMML model or
>>>>>>>>> not.
>>>>>>>>>
>>>>>>>>> UI modifications are also needed (an option for the user to
>>>>>>>>> choose whether to build the PMML and to choose the path to save
>>>>>>>>> it).
>>>>>>>>>
>>>>>>>>> Identified classes that need to be modified (apart from the UI)
>>>>>>>>>
>>>>>>>>> - SupervisedSparkModelBuilder
>>>>>>>>> - UnsupervisedSparkModelBuilder
>>>>>>>>> - ModelApiV10
>>>>>>>>>
>>>>>>>>> *Conclusion*
>>>>>>>>>
>>>>>>>>> Currently we have decided to go with "Method 2" for the following
>>>>>>>>> reasons.
>>>>>>>>>
>>>>>>>>> - Not all models have PMML support in Spark.
>>>>>>>>> - If we are to use anything apart from Spark MLlib, such as
>>>>>>>>>   H2O, we will be depending on PMML support from H2O.
>>>>>>>>> - With Method 1 we might be generating PMML models when users
>>>>>>>>>   are not in need of them (useless computation).
>>>>>>>>>
>>>>>>>>> Please let me know if there is a better way to improve the design.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks & Regards,
>>>>>>>>>
>>>>>>>>> Fazlan Nazeem
>>>>>>>>>
>>>>>>>>> *Software Engineer*
>>>>>>>>>
>>>>>>>>> *WSO2 Inc*
>>>>>>>>> Mobile : +94772338839
>>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> --
>>>>>>>> *CD Athuraliya*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> lean . enterprise .
>>>>>>>> middleware
>>>>>>>> Mobile: +94 716288847
>>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>>>> <https://cdathuraliya.wordpress.com/>
>>>>>>>
>>>>>>> --
>>>>>>> Vidura Gamini Abhaya, Ph.D.
>>>>>>> Director of Engineering
>>>>>>> M:+94 77 034 7754
>>>>>>> E: [email protected]
>>>>>>>
>>>>>>> WSO2 Inc. (http://wso2.com)
>>>>>>> lean.enterprise.middleware
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Team Lead - WSO2 Machine Learner
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/

--
Thanks & Regards,

Fazlan Nazeem

*Software Engineer*

*WSO2 Inc*
Mobile : +94772338839
[email protected]
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
