Hi CD/Vidura,

On Mon, Oct 12, 2015 at 1:56 PM, CD Athuraliya <[email protected]> wrote:
> On Mon, Oct 12, 2015 at 12:36 PM, Vidura Gamini Abhaya <[email protected]> wrote:
>
>> Hi Fazlan,
>>
>> Please see my comments inline in blue.
>>
>>> No, I am not planning to build the model from scratch. Once the
>>> serialized Spark model is built, we can export it to PMML format. In
>>> other words, we are using the serialized model in order to build the
>>> PMML model.
>>
>> That's great.
>>
>>> If I am not mistaken, what you are suggesting is to let the user go
>>> through the normal workflow of model building and, once it is done,
>>> give the user an option to export it to PMML format (also for models
>>> that have already been built)?
>
> Yes, exactly! What we should not do, IMO, is ask the user to go through
> the whole workflow if he needs to export an already created model to
> PMML.

Can you please explain where you got this idea? If this idea is in
Fazlan's content, we need to fix it.

>> Yes, this is exactly what I meant.
>>
>>> @Vidura I will check on the run-time support; if that is possible,
>>> that would be great.
>>
>> If it's supported, it'll be great. If not, we can still do it based on
>> the model type, but I think it'll be a bit messy as the code wouldn't
>> be as generic.
>>
>> Thanks and Regards,
>>
>> Vidura
>>
>>> On Mon, Oct 12, 2015 at 12:10 PM, Vidura Gamini Abhaya <[email protected]> wrote:
>>>
>>>> Hi Fazlan,
>>>>
>>>> Are you planning to build a PMML model from scratch (i.e. going
>>>> through the entire flow to build an ML model), or is this to be used
>>>> for exporting a PMML model out of an already built model?
>>>>
>>>> If it's the former, +1 to what CD mentioned on not asking the user to
>>>> go through the entire ML workflow for PMML. My preference is also for
>>>> saving/exporting a model in PMML to be an option for the user, once a
>>>> model is built, and for models that have already been built.
>>>>
>>>> @Fazlan - Can we find out whether the PMML export is possible at
>>>> runtime, through a method or through the inheritance hierarchy? If
>>>> so, we could make the export option visible on the UI only for
>>>> supported models.
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Vidura
>>>>
>>>> On 12 October 2015 at 11:33, CD Athuraliya <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I feel that asking the user to go through the complete ML workflow
>>>>> for PMML is too demanding. Computationally, this conversion should
>>>>> be less expensive than model training in real-world use cases
>>>>> (since it's a mapping of model parameters from Java objects to XML,
>>>>> AFAIK), and model training should be independent of the model
>>>>> format. Instead, can't we support this conversion on demand? Or
>>>>> save in both formats for now? Once Spark starts supporting PMML for
>>>>> all algorithms, we can go for Method 1 if it looks consistent
>>>>> throughout our ML life cycle.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Oct 12, 2015 at 11:09 AM, Fazlan Nazeem <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am working on redmine[1] regarding PMML support for Machine
>>>>>> Learner. Please provide your opinion on this design.
>>>>>>
>>>>>> [1] https://redmine.wso2.com/issues/4303
>>>>>>
>>>>>> *Overview*
>>>>>>
>>>>>> Spark 1.5.1 (latest version) supports PMML model export for some
>>>>>> of the available models in Spark through MLlib.
>>>>>>
>>>>>> The table below outlines the MLlib models that can be exported to
>>>>>> PMML and their equivalent PMML models.
>>>>>>
>>>>>> MLlib model                      PMML model
>>>>>> -----------                      ----------
>>>>>> KMeansModel                      ClusteringModel
>>>>>> LinearRegressionModel            RegressionModel (functionName="regression")
>>>>>> RidgeRegressionModel             RegressionModel (functionName="regression")
>>>>>> LassoModel                       RegressionModel (functionName="regression")
>>>>>> SVMModel                         RegressionModel (functionName="classification"
>>>>>>                                  normalizationMethod="none")
>>>>>> Binary LogisticRegressionModel   RegressionModel (functionName="classification"
>>>>>>                                  normalizationMethod="logit")
>>>>>>
>>>>>> Not all models available in MLlib can be exported to PMML as of now.
>>>>>>
>>>>>> *Goal*
>>>>>>
>>>>>> 1. We need to save models generated by WSO2 ML (PMML-supported
>>>>>>    models) in PMML format, so that they can be reused from
>>>>>>    PMML-supported tools.
>>>>>>
>>>>>> *How To*
>>>>>>
>>>>>> If "clusters" is the trained model, we can do the following with
>>>>>> the PMML support:
>>>>>>
>>>>>> // Export the model to a String in PMML format
>>>>>> clusters.toPMML
>>>>>>
>>>>>> // Export the model to a local file in PMML format
>>>>>> clusters.toPMML("/tmp/kmeans.xml")
>>>>>>
>>>>>> // Export the model to a directory on a distributed file system in PMML format
>>>>>> clusters.toPMML(sc, "/tmp/kmeans")
>>>>>>
>>>>>> // Export the model to an OutputStream in PMML format
>>>>>> clusters.toPMML(System.out)
>>>>>>
>>>>>> For unsupported models, either there will be no .toPMML method or
>>>>>> an IllegalArgumentException will be thrown.
>>>>>>
>>>>>> *Design*
>>>>>>
>>>>>> In the following diagram, models highlighted in green can be
>>>>>> exported to PMML, but not the models highlighted in red. The
>>>>>> diagram illustrates the algorithms supported by WSO2 Machine
>>>>>> Learner.
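On the runtime check Vidura asked about: Spark 1.5.x marks PMML-exportable models with the org.apache.spark.mllib.pmml.PMMLExportable trait, so support can be tested with a single instanceof check rather than per-model-type branching. A minimal Java sketch (the interface and model classes below are local stand-ins, not the real Spark classes, so the example compiles without Spark on the classpath):

```java
// Stand-in for Spark's org.apache.spark.mllib.pmml.PMMLExportable trait.
interface PMMLExportable {
    String toPMML();
}

// KMeansModel supports PMML export in Spark 1.5.1, so its stand-in
// implements the marker interface.
class StubKMeansModel implements PMMLExportable {
    public String toPMML() { return "<PMML/>"; }
}

// NaiveBayesModel has no PMML export in Spark 1.5.1, so its stand-in
// does not implement it.
class StubNaiveBayesModel { }

public class PmmlSupportCheck {

    // Generic runtime check: works for any model object, no hard-coded
    // list of supported model types.
    static boolean supportsPmml(Object model) {
        return model instanceof PMMLExportable;
    }

    public static void main(String[] args) {
        System.out.println(supportsPmml(new StubKMeansModel()));     // prints true
        System.out.println(supportsPmml(new StubNaiveBayesModel())); // prints false
    }
}
```

The UI could then show the PMML export option only for models where this check returns true, as suggested above.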
>>>>>>
>>>>>> [image: Inline image 2]
>>>>>>
>>>>>> *Method 1*
>>>>>>
>>>>>> By default, save the models in PMML if PMML export is supported,
>>>>>> using one of these supported options:
>>>>>>
>>>>>> 1. Export the model to a String in PMML format
>>>>>> 2. Export the model to a local file in PMML format
>>>>>> 3. Export the model to a directory on a distributed file system in
>>>>>>    PMML format
>>>>>> 4. Export the model to an OutputStream in PMML format
>>>>>>
>>>>>> Classes that need to be modified (apart from the UI):
>>>>>>
>>>>>> - SupervisedSparkModelBuilder
>>>>>> - UnsupervisedSparkModelBuilder
>>>>>>
>>>>>> e.g.
>>>>>>
>>>>>> [image: Inline image 1]
>>>>>>
>>>>>> As of now, the serialized models are saved in the "models" folder.
>>>>>> The PMML models can also be saved in the same directory with a
>>>>>> PMML suffix.
>>>>>>
>>>>>> Optional: after the model is generated, let the user export the
>>>>>> PMML model to a chosen location through the UI.
>>>>>>
>>>>>> *Method 2*
>>>>>>
>>>>>> Add a *new REST API* to build models with PMML:
>>>>>>
>>>>>> public Response buildPMMLModel(@PathParam("modelId") long modelId)
>>>>>>
>>>>>> In the backend, we could add an additional argument to the
>>>>>> "buildXModel" methods to decide whether to save the PMML model or
>>>>>> not.
>>>>>>
>>>>>> UI modifications are also needed (an option for the user to choose
>>>>>> whether to build the PMML model and to choose the path to save it).
>>>>>>
>>>>>> Identified classes that need to be modified (apart from the UI):
>>>>>>
>>>>>> - SupervisedSparkModelBuilder
>>>>>> - UnsupervisedSparkModelBuilder
>>>>>> - ModelApiV10
>>>>>>
>>>>>> *Conclusion*
>>>>>>
>>>>>> Currently we have decided to go with Method 2 for the following
>>>>>> reasons:
>>>>>>
>>>>>> - Not all models have PMML support in Spark.
>>>>>> - If we are to use anything apart from Spark MLlib, such as H2O,
>>>>>>   we will be depending on PMML support from H2O.
>>>>>> - With Method 1, we might be generating PMML models when users are
>>>>>>   not in need of them (useless computation).
>>>>>>
>>>>>> Please let me know if there is a better way to improve the design.
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>>
>>>>>> Fazlan Nazeem
>>>>>>
>>>>>> *Software Engineer*
>>>>>> *WSO2 Inc*
>>>>>> Mobile: +94772338839
>>>>>> [email protected]
>>>>>
>>>>> --
>>>>> *CD Athuraliya*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> lean . enterprise . middleware
>>>>> Mobile: +94 716288847
>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>> <https://cdathuraliya.wordpress.com/>
>>>>
>>>> --
>>>> Vidura Gamini Abhaya, Ph.D.
>>>> Director of Engineering
>>>> M: +94 77 034 7754
>>>> E: [email protected]
>>>>
>>>> WSO2 Inc. (http://wso2.com)
>>>> lean.enterprise.middleware

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
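For illustration, the on-demand export behind Method 2's buildPMMLModel(modelId) could look roughly like the following Java sketch. All class, field, and method names here are hypothetical stand-ins, not the actual WSO2 ML or Spark code; the real implementation would sit behind ModelApiV10 and the Spark model builders.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class OnDemandPmmlExport {

    // Stand-in for Spark's org.apache.spark.mllib.pmml.PMMLExportable trait.
    interface PMMLExportable {
        String toPMML();
    }

    // Stand-in for a PMML-capable model such as KMeansModel.
    static class StubKMeansModel implements PMMLExportable {
        public String toPMML() { return "<PMML version=\"4.2\"/>"; }
    }

    // Stand-in for the registry of already-built models, keyed by model id.
    static final Map<Long, Object> MODEL_STORE = new HashMap<>();

    // Export an already-built model on demand: fail cleanly for models
    // without PMML support, otherwise write <modelId>.pmml alongside the
    // serialized model in the models directory.
    static Path exportToPmml(long modelId, Path modelsDir) throws IOException {
        Object model = MODEL_STORE.get(modelId);
        if (!(model instanceof PMMLExportable)) {
            throw new IllegalArgumentException(
                "Model " + modelId + " has no PMML export support");
        }
        Path target = modelsDir.resolve(modelId + ".pmml");
        Files.write(target, ((PMMLExportable) model).toPMML()
                .getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        MODEL_STORE.put(42L, new StubKMeansModel());
        Path dir = Files.createTempDirectory("models");
        System.out.println(Files.exists(exportToPmml(42L, dir))); // prints true
    }
}
```

Note the model is only looked up, never retrained, which matches the agreement above that users should not repeat the ML workflow to get a PMML export.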
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
