On Mon, Oct 12, 2015 at 12:36 PM, Vidura Gamini Abhaya <[email protected]> wrote:
> Hi Fazlan,
>
> Please see my comments inline in blue.
>
>> No, I am not planning to build the model from scratch. Once the serialized
>> Spark model is built, we can export it to PMML format. In other words, we
>> are using the serialized model in order to build the PMML model.
>
> That's great.
>
>> If I am not mistaken, what you are suggesting is to let the user go through
>> the normal workflow of model building and, once it is done, give the user
>> an option to export it to PMML format (also for the models that have
>> already been built)?
>
Yes exactly! What we should not do, IMO, is ask the user to go through the
whole workflow if they only need to export an already created model to PMML.

>
> Yes, this is exactly what I meant.
>
>> @Vidura I will check on the run-time support; if that is possible, that
>> would be great.
>
> If it's supported, it'll be great. If not, we can still do it based on the
> model type, but I think it'll be a bit messy as the code wouldn't be as
> generic.
>
> Thanks and Regards,
>
> Vidura
>
>> On Mon, Oct 12, 2015 at 12:10 PM, Vidura Gamini Abhaya <[email protected]>
>> wrote:
>>
>>> Hi Fazlan,
>>>
>>> Are you planning to build a PMML model from scratch (i.e. going
>>> through the entire flow to build an ML model), or is this to be used for
>>> exporting PMML out of an already built model?
>>>
>>> If it's the former, +1 to what CD mentioned on not asking the user to go
>>> through the entire ML workflow for PMML. My preference is also for
>>> saving/exporting a model in PMML to be an option for the user, once a model
>>> is built and for models that have already been built.
>>>
>>> @Fazlan - Can we find out whether the PMML export is possible at runtime
>>> through a method or through the inheritance hierarchy? If so, we could
>>> make the export option visible in the UI only for supported models.
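A minimal sketch of the runtime check asked about above, offered as an illustration rather than the agreed design. As far as I know, the MLlib models that support export mix in the org.apache.spark.mllib.pmml.PMMLExportable trait, so an instanceof test against it may be enough to decide whether to show the export option. To keep the example free of a Spark dependency, `PmmlExportable`, `KMeansLikeModel`, `UnsupportedModel` and `PmmlSupportCheck` below are hypothetical stand-ins, not Spark or WSO2 ML classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Spark's PMMLExportable trait (assumption: the real
// check would test against org.apache.spark.mllib.pmml.PMMLExportable).
interface PmmlExportable {
    String toPMML();
}

// Hypothetical model that supports export, playing the role of KMeansModel.
class KMeansLikeModel implements PmmlExportable {
    @Override
    public String toPMML() {
        return "<PMML><ClusteringModel/></PMML>"; // placeholder, not real PMML
    }
}

// Hypothetical model type that has no PMML export support.
class UnsupportedModel {
}

class PmmlSupportCheck {
    // True when the deserialized model object can be exported to PMML.
    static boolean isPmmlExportable(Object model) {
        return model instanceof PmmlExportable;
    }

    public static void main(String[] args) {
        List<Object> models = Arrays.asList(new KMeansLikeModel(), new UnsupportedModel());
        for (Object model : models) {
            System.out.println(model.getClass().getSimpleName()
                    + " exportable: " + isPmmlExportable(model));
        }
    }
}
```

With a check like this, the UI would only need a boolean per model, and no per-algorithm switch would be required in the export code.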
>>>
>>> Thanks and Regards,
>>>
>>> Vidura
>>>
>>> On 12 October 2015 at 11:33, CD Athuraliya <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I feel that asking the user to go through the complete ML workflow for
>>>> PMML is too demanding. Computationally, this conversion should be less
>>>> expensive than model training in real-world use cases (since it's a
>>>> mapping of model parameters from Java objects to XML, AFAIK). And model
>>>> training should be independent of the model format. Instead, can't we
>>>> support this conversion on demand? Or save in both formats for now? Once
>>>> Spark starts supporting PMML for all algorithms, we can go for Method 1
>>>> if it looks consistent throughout our ML life cycle.
>>>>
>>>> Thanks
>>>>
>>>> On Mon, Oct 12, 2015 at 11:09 AM, Fazlan Nazeem <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am working on redmine[1] regarding PMML support for Machine Learner.
>>>>> Please provide your opinion on this design.
>>>>> [1] https://redmine.wso2.com/issues/4303
>>>>>
>>>>> *Overview*
>>>>>
>>>>> Spark 1.5.1 (latest version) supports PMML model export for some of
>>>>> the available models in Spark through MLlib.
>>>>>
>>>>> The table below outlines the MLlib models that can be exported to PMML
>>>>> and their equivalent PMML model.
>>>>>
>>>>> MLlib model                     PMML model
>>>>> KMeansModel                     ClusteringModel
>>>>> LinearRegressionModel           RegressionModel (functionName="regression")
>>>>> RidgeRegressionModel            RegressionModel (functionName="regression")
>>>>> LassoModel                      RegressionModel (functionName="regression")
>>>>> SVMModel                        RegressionModel (functionName="classification"
>>>>>                                 normalizationMethod="none")
>>>>> Binary LogisticRegressionModel  RegressionModel (functionName="classification"
>>>>>                                 normalizationMethod="logit")
>>>>>
>>>>> Not all models available in MLlib can be exported to PMML as of now.
>>>>>
>>>>> Goal
>>>>>
>>>>> 1. We need to save models generated by WSO2 ML (PMML-supported models)
>>>>>    in PMML format, so that they can be reused from tools that support
>>>>>    PMML.
>>>>>
>>>>> How To
>>>>>
>>>>> If “clusters” is the trained model, we can do the following with the
>>>>> PMML support.
>>>>>
>>>>> // Export the model to a String in PMML format
>>>>> clusters.toPMML
>>>>>
>>>>> // Export the model to a local file in PMML format
>>>>> clusters.toPMML("/tmp/kmeans.xml")
>>>>>
>>>>> // Export the model to a directory on a distributed file system in
>>>>> // PMML format
>>>>> clusters.toPMML(sc,"/tmp/kmeans")
>>>>>
>>>>> // Export the model to the OutputStream in PMML format
>>>>> clusters.toPMML(System.out)
>>>>>
>>>>> For unsupported models, either you will not find a .toPMML method or
>>>>> an IllegalArgumentException will be thrown.
>>>>>
>>>>> Design
>>>>>
>>>>> In the following diagram, models highlighted in green can be exported
>>>>> to PMML, but not the models highlighted in red. The diagram illustrates
>>>>> algorithms supported by WSO2 Machine Learner.
>>>>>
>>>>> [image: Inline image 2]
>>>>>
>>>>> Method 1
>>>>>
>>>>> By default, save the models in PMML if PMML export is supported, using
>>>>> one of these supported options.
>>>>>
>>>>> 1. Export the model to a String in PMML format
>>>>> 2. Export the model to a local file in PMML format
>>>>> 3. Export the model to a directory on a distributed file system in
>>>>>    PMML format
>>>>> 4. Export the model to the OutputStream in PMML format
>>>>>
>>>>> Classes that need to be modified (apart from UI)
>>>>>
>>>>> - SupervisedSparkModelBuilder
>>>>> - UnsupervisedSparkModelBuilder
>>>>>
>>>>> e.g.
>>>>>
>>>>> [image: Inline image 1]
>>>>>
>>>>> As of now, the serialized models are saved in the “models” folder. The
>>>>> PMML models can also be saved in the same directory with a PMML suffix.
>>>>>
>>>>> Optional:
>>>>>
>>>>> After the model is generated, let the user export the PMML model to a
>>>>> chosen location through the UI.
>>>>>
>>>>> Method 2
>>>>>
>>>>> Add a *new REST API* to build models with PMML
>>>>>
>>>>> public Response buildPMMLModel(@PathParam("modelId") long modelId)
>>>>>
>>>>> In the backend, we could add an additional argument to the
>>>>> "buildXModel" methods to decide whether to save the PMML model or not.
>>>>>
>>>>> UI modifications are also needed (an option for the user to choose
>>>>> whether to build the PMML and to choose the path to save it).
>>>>>
>>>>> Identified classes that need to be modified (apart from UI)
>>>>>
>>>>> - SupervisedSparkModelBuilder
>>>>> - UnsupervisedSparkModelBuilder
>>>>> - ModelApiV10
>>>>>
>>>>> *Conclusion*
>>>>>
>>>>> Currently we have decided to go with "Method 2" for the following
>>>>> reasons.
>>>>>
>>>>> - Not all models have PMML support in Spark.
>>>>> - If we are to use anything apart from Spark MLlib, such as H2O,
>>>>>   we will be depending on PMML support from H2O.
>>>>> - With Method 1, we might be generating PMML models when users do
>>>>>   not need them (useless computation).
>>>>>
>>>>> Please let me know if there is a better way to improve the design.
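A rough sketch of the backend side of Method 2, where an extra argument decides whether the PMML file is written at all. This only illustrates the idea and is not the actual WSO2 ML code: `ExportableModel`, `ModelStore`, `saveModel`, `Method2Sketch` and the `.pmml.xml` suffix are all hypothetical names chosen for the example, and the PMML content is a placeholder string.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a trained model that can render itself as PMML.
interface ExportableModel {
    String toPMML();
}

// Hypothetical helper that owns the "models" directory.
class ModelStore {
    private final Path modelsDir;

    ModelStore(Path modelsDir) {
        this.modelsDir = modelsDir;
    }

    // Writes the PMML document into the models directory only when asked to,
    // so no conversion work is done for users who do not need PMML.
    // Returns the written path, or null when savePmml is false.
    Path saveModel(String modelName, ExportableModel model, boolean savePmml) {
        if (!savePmml) {
            return null;
        }
        Path target = modelsDir.resolve(modelName + ".pmml.xml");
        try {
            Files.write(target, model.toPMML().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }
}

class Method2Sketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("models");
        ModelStore store = new ModelStore(dir);
        ExportableModel model = () -> "<PMML/>"; // placeholder content
        System.out.println("without flag: " + store.saveModel("kmeans", model, false));
        System.out.println("with flag: " + store.saveModel("kmeans", model, true));
    }
}
```

The same flag could presumably be driven by a REST call on an already built model, which would also cover the on-demand export discussed in the replies.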
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>>
>>>>> Fazlan Nazeem
>>>>>
>>>>> *Software Engineer*
>>>>>
>>>>> *WSO2 Inc*
>>>>> Mobile : +94772338839
>>>>> [email protected]
>>>>
>>>> --
>>>> *CD Athuraliya*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> lean . enterprise . middleware
>>>> Mobile: +94 716288847
>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>> <https://twitter.com/cdathuraliya> | Blog
>>>> <https://cdathuraliya.wordpress.com/>
>>>
>>> --
>>> Vidura Gamini Abhaya, Ph.D.
>>> Director of Engineering
>>> M:+94 77 034 7754
>>> E: [email protected]
>>>
>>> WSO2 Inc. (http://wso2.com)
>>> lean.enterprise.middleware
>>
>> --
>> Thanks & Regards,
>>
>> Fazlan Nazeem
>>
>> *Software Engineer*
>>
>> *WSO2 Inc*
>> Mobile : +94772338839
>> [email protected]
>
> --
> Vidura Gamini Abhaya, Ph.D.
> Director of Engineering
> M:+94 77 034 7754
> E: [email protected]
>
> WSO2 Inc. (http://wso2.com)
> lean.enterprise.middleware

--
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847
LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
<https://twitter.com/cdathuraliya> | Blog
<https://cdathuraliya.wordpress.com/>
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
