Hi CD/Vidura,

On Mon, Oct 12, 2015 at 1:56 PM, CD Athuraliya <[email protected]> wrote:

>
>
> On Mon, Oct 12, 2015 at 12:36 PM, Vidura Gamini Abhaya <[email protected]>
> wrote:
>
>> Hi Fazlan,
>>
>> Please see my comments inline in blue.
>>
>>>
>>> No I am not planning to build the model from scratch. Once the
>>> serialized spark model is built, we can export it to PMML format. In other
>>> words, we are using the serialized model in order to build the PMML model.
>>>
>>
>> That's great.
>>
>>> If I am not mistaken, what you are suggesting is to let the user go through
>>> the normal workflow of model building and, once it is done, give the user
>>> an option to export the model to PMML format (also for models that have
>>> already been built)?
>>>
>>
> Yes, exactly! What we should not do, IMO, is ask the user to go through
> the whole workflow if he needs to export an already created model in PMML.
>

Can you please explain where you got this idea? If this idea comes from
Fazlan's content, we need to fix it.


>> Yes, this is exactly what I meant.
>>
>>
>>> @Vidura I will check on the run-time support. If that is possible, that
>>> would be great.
>>>
>>
>> If it's supported, it'll be great. If not we can still do it based on the
>> model type but I think it'll be a bit messy as the code wouldn't be as
>> generic.
>>
>>
>> Thanks and Regards,
>>
>> Vidura
>>
>>
>>
>>>
>>> On Mon, Oct 12, 2015 at 12:10 PM, Vidura Gamini Abhaya <[email protected]>
>>> wrote:
>>>
>>>> Hi Fazlan,
>>>>
>>>> Are you planning to build a PMML model from scratch (i.e. going
>>>> through the entire flow to build an ML model) or is this to be used for
>>>> exporting PMML out of an already built model?
>>>>
>>>> If it's the former, +1 to what CD mentioned on not asking user to go
>>>> through the entire ML workflow for PMML. My preference is also for
>>>> saving/exporting a model in PMML to be an option for the user, once a model
>>>> is built and for models that have already been built.
>>>>
>>>> @Fazlan - Can we find out whether PMML export is possible at
>>>> runtime, through a method or through the inheritance hierarchy? If so, we
>>>> could make the export option visible in the UI only for supported
>>>> models.
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Vidura
>>>>
>>>>
>>>>
>>>> On 12 October 2015 at 11:33, CD Athuraliya <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I feel that asking the user to go through the complete ML workflow for
>>>>> PMML is too demanding. Computationally this conversion should be less
>>>>> expensive than model training in real-world use cases (since it's a
>>>>> mapping of model parameters from Java objects to XML AFAIK). And model
>>>>> training should be independent of the model format. Instead, can't we
>>>>> support this conversion on demand? Or save in both formats for now? Once
>>>>> Spark starts supporting PMML for all algorithms we can go for Method 1 if
>>>>> it looks consistent throughout our ML life cycle.
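
To make the on-demand idea concrete, here is a minimal Java sketch of exporting PMML from an already stored model without retraining. Everything in it is an assumption for illustration: PmmlExportable, StoredModel, store and exportStoredModel are hypothetical names, plain Java serialization stands in for however WSO2 ML actually persists models, and the returned string is a placeholder rather than valid PMML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class OnDemandPmmlExport {

    /** Hypothetical stand-in for a model type that can render itself as PMML. */
    interface PmmlExportable {
        String toPmml();
    }

    /** Hypothetical trained model; it only carries parameters, no training data. */
    static final class StoredModel implements Serializable, PmmlExportable {
        private static final long serialVersionUID = 1L;
        private final double[] coefficients;

        StoredModel(double... coefficients) {
            this.coefficients = coefficients.clone();
        }

        @Override
        public String toPmml() {
            // Placeholder output, not schema-valid PMML.
            return "<PMML coefficients=\"" + coefficients.length + "\"/>";
        }
    }

    /** Serializes a model the way the build step is assumed to persist it. */
    static byte[] store(Serializable model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new IllegalStateException("Could not serialize model", e);
        }
        return bytes.toByteArray();
    }

    /** Loads a stored model and converts it to PMML; no training step is involved. */
    static String exportStoredModel(byte[] storedModel) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(storedModel))) {
            Object model = in.readObject();
            if (!(model instanceof PmmlExportable)) {
                throw new IllegalArgumentException(
                    "Model type has no PMML export: " + model.getClass().getName());
            }
            return ((PmmlExportable) model).toPmml();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not read stored model", e);
        }
    }
}
```

The point of the sketch is only that the export path touches the stored parameters and nothing else, so it can be offered for models built earlier as well.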
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Oct 12, 2015 at 11:09 AM, Fazlan Nazeem <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am working on redmine[1] regarding PMML support for Machine
>>>>>> Learner. Please provide your opinion on this design.
>>>>>> [1]https://redmine.wso2.com/issues/4303
>>>>>>
>>>>>> *Overview*
>>>>>>
>>>>>> Spark 1.5.1 (the latest version) supports PMML model export for some of
>>>>>> the available models in Spark through MLlib.
>>>>>>
>>>>>> The table below outlines the MLlib models that can be exported to
>>>>>> PMML and their equivalent PMML model.
>>>>>>
>>>>>> MLlib model                    | PMML model
>>>>>> -------------------------------+------------------------------------------
>>>>>> KMeansModel                    | ClusteringModel
>>>>>> LinearRegressionModel          | RegressionModel (functionName="regression")
>>>>>> RidgeRegressionModel           | RegressionModel (functionName="regression")
>>>>>> LassoModel                     | RegressionModel (functionName="regression")
>>>>>> SVMModel                       | RegressionModel (functionName="classification" normalizationMethod="none")
>>>>>> Binary LogisticRegressionModel | RegressionModel (functionName="classification" normalizationMethod="logit")
>>>>>>
>>>>>> Not all models available in MLlib can be exported to PMML as of now.
>>>>>>
>>>>>> Goal
>>>>>>
>>>>>>    1. We need to save models generated by WSO2 ML (for model types that
>>>>>>    support PMML) in PMML format, so that they can be reused from tools
>>>>>>    that support PMML.
>>>>>>
>>>>>> How To
>>>>>>
>>>>>> If “clusters” is the trained model, we can do the following with the
>>>>>> PMML support.
>>>>>>
>>>>>> // Export the model to a String in PMML format
>>>>>> clusters.toPMML
>>>>>>
>>>>>> // Export the model to a local file in PMML format
>>>>>> clusters.toPMML("/tmp/kmeans.xml")
>>>>>>
>>>>>> // Export the model to a directory on a distributed file system in PMML format
>>>>>> clusters.toPMML(sc,"/tmp/kmeans")
>>>>>>
>>>>>> // Export the model to the OutputStream in PMML format
>>>>>> clusters.toPMML(System.out)
>>>>>>
>>>>>> For unsupported models, either you will not find a .toPMML method or
>>>>>> an IllegalArgumentException will be thrown.
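
On the run-time support question raised in this thread: as far as I know, the MLlib models that offer toPMML do so through the org.apache.spark.mllib.pmml.PMMLExportable trait, so a type check may be enough to decide whether to show the export option. Below is a rough, Spark-free Java sketch of that idea covering both failure modes mentioned above. The PmmlExportable interface, the three stand-in model classes and the probe method are hypothetical and exist only to keep the example self-contained.

```java
final class PmmlSupportCheck {

    /** Outcome of checking one model instance. */
    enum Support { EXPORTABLE, NO_PMML_METHOD, REJECTED_AT_EXPORT }

    /** Hypothetical stand-in for a PMML-capable marker type. */
    interface PmmlExportable {
        String toPmml();
    }

    /** Stand-in for a model type whose export works. */
    static final class ExportableStandIn implements PmmlExportable {
        @Override
        public String toPmml() {
            return "<PMML/>"; // placeholder, not schema-valid PMML
        }
    }

    /** Stand-in for a model that has the method but rejects this configuration. */
    static final class RejectingStandIn implements PmmlExportable {
        @Override
        public String toPmml() {
            throw new IllegalArgumentException("PMML export not supported for this model");
        }
    }

    /** Stand-in for a model type with no PMML method at all. */
    static final class UnsupportedStandIn { }

    /** Type check first; then attempt the export and treat a rejection as unsupported. */
    static Support probe(Object model) {
        if (!(model instanceof PmmlExportable)) {
            return Support.NO_PMML_METHOD;
        }
        try {
            ((PmmlExportable) model).toPmml();
            return Support.EXPORTABLE;
        } catch (IllegalArgumentException e) {
            return Support.REJECTED_AT_EXPORT;
        }
    }
}
```

Note that this probe performs a real export to detect the second case. If that turns out to be too costly for a UI check, the type check alone could drive the visibility of the option and the exception could be handled when the user actually exports.
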
>>>>>> Design
>>>>>>
>>>>>> In the following diagram models highlighted in green can be exported
>>>>>> to PMML, but not the models highlighted in red. The diagram illustrates
>>>>>> algorithms supported by WSO2 Machine Learner.
>>>>>>
>>>>>> [image: Inline image 2]
>>>>>>
>>>>>>
>>>>>> Method 1
>>>>>>
>>>>>> By default save the models in PMML if PMML export is supported, using
>>>>>> one of these supported options.
>>>>>>
>>>>>> 1. Export the model to a String in PMML format
>>>>>> 2. Export the model to a local file in PMML format
>>>>>> 3. Export the model to a directory on a distributed file system in
>>>>>>    PMML format
>>>>>> 4. Export the model to the OutputStream in PMML format
>>>>>>
>>>>>> Classes that need to be modified (apart from the UI):
>>>>>>
>>>>>>    - SupervisedSparkModelBuilder
>>>>>>    - UnsupervisedSparkModelBuilder
>>>>>>
>>>>>>
>>>>>> e.g
>>>>>>
>>>>>> [image: Inline image 1]
>>>>>>
>>>>>> As of now, the serialized models are saved in the “models” folder. The
>>>>>> PMML models can also be saved in the same directory with a PMML suffix.
>>>>>>
>>>>>> optional:
>>>>>>
>>>>>> After the model is generated let the user export the PMML model to a
>>>>>> chosen location through the UI.
>>>>>>
>>>>>> Method 2
>>>>>>
>>>>>> Add a *new REST API* to build models with PMML
>>>>>>
>>>>>> public Response buildPMMLModel(@PathParam("modelId") long modelId)
>>>>>>
>>>>>> In the backend, we could add an additional argument to the "buildXModel"
>>>>>> methods to decide whether to save the PMML model or not.
>>>>>>
>>>>>> UI modifications are also needed (an option for the user to choose
>>>>>> whether to build the PMML and to choose the path to save it).
>>>>>>
>>>>>> Identified classes that need to be modified (apart from the UI):
>>>>>>
>>>>>>    - SupervisedSparkModelBuilder
>>>>>>    - UnsupervisedSparkModelBuilder
>>>>>>    - ModelApiV10
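
For reference, the "additional argument" variant of Method 2 could look roughly like the Java sketch below: the serialized model is always written, and the PMML copy is written next to it only when the flag is set. This is an illustration under assumptions, not the actual builder code. TrainedModel, PlaceholderModel, saveModel, pmmlPathFor and the ".pmml" suffix are all hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class PmmlAwareModelBuilder {

    /** Hypothetical stand-in for a trained model. */
    interface TrainedModel {
        byte[] serialize();
        String toPmml();
    }

    /** Trivial stand-in model with placeholder output. */
    static final class PlaceholderModel implements TrainedModel {
        @Override
        public byte[] serialize() {
            return new byte[] {1, 2, 3};
        }

        @Override
        public String toPmml() {
            return "<PMML/>"; // placeholder, not schema-valid PMML
        }
    }

    /** Saves the serialized model and, only if requested, a PMML copy beside it. */
    static Path saveModel(TrainedModel model, Path modelsDir, String modelName,
                          boolean savePmml) {
        try {
            Files.createDirectories(modelsDir);
            Path serialized = modelsDir.resolve(modelName);
            Files.write(serialized, model.serialize());
            if (savePmml) {
                Files.write(pmmlPathFor(serialized),
                            model.toPmml().getBytes(StandardCharsets.UTF_8));
            }
            return serialized;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** PMML file lives in the same directory, with an assumed ".pmml" suffix. */
    static Path pmmlPathFor(Path serializedModel) {
        return serializedModel.resolveSibling(serializedModel.getFileName() + ".pmml");
    }
}
```

With the flag off, no PMML is generated, which is the "useless computation" concern about Method 1 addressed at build time.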
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Conclusion*
>>>>>>
>>>>>> Currently, we have decided to go with "Method 2" for the
>>>>>> following reasons.
>>>>>>
>>>>>>    - Not all models have PMML support in Spark.
>>>>>>    - If we are to use anything apart from Spark MLlib, such as H2O,
>>>>>>    we will be depending on PMML support from H2O.
>>>>>>    - With Method 1 we might be generating PMML models when users are
>>>>>>    not in need of it (useless computation).
>>>>>>
>>>>>>  Please let me know if there is a better way to improve the design.
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>>
>>>>>> Fazlan Nazeem
>>>>>>
>>>>>> *Software Engineer*
>>>>>>
>>>>>> *WSO2 Inc*
>>>>>> Mobile : +94772338839
>>>>>> [email protected]
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *CD Athuraliya*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> lean . enterprise . middleware
>>>>> Mobile: +94 716288847 <94716288847>
>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>> <https://cdathuraliya.wordpress.com/>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Vidura Gamini Abhaya, Ph.D.
>>>> Director of Engineering
>>>> M:+94 77 034 7754
>>>> E: [email protected]
>>>>
>>>> WSO2 Inc. (http://wso2.com)
>>>> lean.enterprise.middleware
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks & Regards,
>>>
>>> Fazlan Nazeem
>>>
>>> *Software Engineer*
>>>
>>> *WSO2 Inc*
>>> Mobile : +94772338839
>>> [email protected]
>>>
>>
>>
>>
>> --
>> Vidura Gamini Abhaya, Ph.D.
>> Director of Engineering
>> M:+94 77 034 7754
>> E: [email protected]
>>
>> WSO2 Inc. (http://wso2.com)
>> lean.enterprise.middleware
>>
>
>
>
> --
> *CD Athuraliya*
> Software Engineer
> WSO2, Inc.
> lean . enterprise . middleware
> Mobile: +94 716288847 <94716288847>
> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
> <https://twitter.com/cdathuraliya> | Blog
> <https://cdathuraliya.wordpress.com/>
>



-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
