Re: [Architecture] [ML] PMML support for Machine Learner

Vidura Gamini Abhaya Mon, 12 Oct 2015 00:08:00 -0700

Hi Fazlan,

Please see my comments inline in blue.


>
> No I am not planning to build the model from scratch. Once the serialized
> spark model is built, we can export it to PMML format. In other words, we
> are using the serialized model in order to build the PMML model.
>

That's great.

> If I am not mistaken, what you are suggesting is to let the user go through
> the normal workflow of model building and, once it is done, give the user
> an option to export it to PMML format (also for models that have already
> been built)?
>

Yes, this is exactly what I meant.


> @Vidura I will check on the run-time support, if that is possible that
> would be great.
>

If it's supported, that would be great. If not, we can still do it based on
the model type, but I think it will be a bit messy as the code wouldn't be as
generic.
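
To illustrate the difference, here is a rough Java sketch of the two
approaches. It is only a sketch under assumptions: PmmlExportable is a
hypothetical stand-in interface (I believe the supported MLlib models share a
PMMLExportable trait in org.apache.spark.mllib.pmml, but that should be
confirmed against the Spark version we ship), and the model classes are
placeholders rather than real WSO2 ML or Spark classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

public class PmmlExportCheck {

    // Hypothetical stand-in for a marker interface shared by PMML-capable models.
    interface PmmlExportable {
        String toPMML();
    }

    // Placeholder for a model that supports PMML export.
    static class ExportableModel implements PmmlExportable {
        @Override
        public String toPMML() {
            return "<PMML/>";
        }
    }

    // Placeholder for a model without PMML export.
    static class NonExportableModel {
    }

    // Generic check: relies only on the inheritance hierarchy.
    static boolean isExportableByHierarchy(Object model) {
        return model instanceof PmmlExportable;
    }

    // Fallback: a hand-maintained list of supported model type names
    // (taken from the table in Fazlan's mail). A bare type name cannot
    // express "binary only" for logistic regression, which is part of
    // why this route is messier.
    private static final Set<String> SUPPORTED_TYPES = new HashSet<>(Arrays.asList(
            "KMeansModel", "LinearRegressionModel", "RidgeRegressionModel",
            "LassoModel", "SVMModel", "LogisticRegressionModel"));

    static boolean isExportableByType(String modelTypeName) {
        return SUPPORTED_TYPES.contains(modelTypeName);
    }

    // Export only when the model advertises PMML support; otherwise return empty.
    static Optional<String> exportIfSupported(Object model) {
        if (model instanceof PmmlExportable) {
            return Optional.of(((PmmlExportable) model).toPMML());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(isExportableByHierarchy(new ExportableModel()));
        System.out.println(isExportableByHierarchy(new NonExportableModel()));
        System.out.println(isExportableByType("KMeansModel"));
    }
}
```

With the hierarchy check, the UI would only need to ask the backend whether a
given model is exportable before showing the export option, and nothing has to
be updated by hand when Spark adds PMML support for more models.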


Thanks and Regards,

Vidura



>
> On Mon, Oct 12, 2015 at 12:10 PM, Vidura Gamini Abhaya <[email protected]>
> wrote:
>
>> Hi Fazlan,
>>
>> Are you planning to build a PMML model from scratch (i.e., going
>> through the entire flow to build an ML model), or is this to be used for
>> exporting PMML from an already built model?
>>
>> If it's the former, +1 to what CD mentioned about not asking the user to go
>> through the entire ML workflow for PMML. My preference is also for
>> saving/exporting a model in PMML to be an option for the user, both once a
>> model is built and for models that have already been built.
>>
>> @Fazlan - Can we find out whether PMML export is possible at runtime,
>> through a method or through the inheritance hierarchy? If so, we could
>> make the export option visible in the UI only for supported models.
>>
>> Thanks and Regards,
>>
>> Vidura
>>
>>
>>
>> On 12 October 2015 at 11:33, CD Athuraliya <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I feel that asking the user to go through the complete ML workflow for PMML
>>> is too demanding. Computationally, this conversion should be less expensive
>>> than model training in real-world use cases (since it's a mapping of
>>> model parameters from Java objects to XML, AFAIK). And model training should
>>> be independent of the model format. Instead, can't we support this
>>> conversion on demand? Or save in both formats for now? Once Spark starts
>>> supporting PMML for all algorithms we can go for Method 1 if it looks
>>> consistent throughout our ML life cycle.
>>>
>>> Thanks
>>>
>>> On Mon, Oct 12, 2015 at 11:09 AM, Fazlan Nazeem <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working on redmine[1] regarding PMML support for Machine Learner.
>>>> Please provide your opinion on this design.
>>>> [1]https://redmine.wso2.com/issues/4303
>>>>
>>>> *Overview*
>>>>
>>>> Spark 1.5.1 (the latest version) supports PMML model export for some of
>>>> the available models in Spark through MLlib.
>>>>
>>>> The table below outlines the MLlib models that can be exported to PMML
>>>> and their equivalent PMML model.
>>>>
>>>> MLlib model                     PMML model
>>>> ------------------------------  ----------------------------------------------
>>>> KMeansModel                     ClusteringModel
>>>> LinearRegressionModel           RegressionModel (functionName="regression")
>>>> RidgeRegressionModel            RegressionModel (functionName="regression")
>>>> LassoModel                      RegressionModel (functionName="regression")
>>>> SVMModel                        RegressionModel (functionName="classification"
>>>>                                 normalizationMethod="none")
>>>> Binary LogisticRegressionModel  RegressionModel (functionName="classification"
>>>>                                 normalizationMethod="logit")
>>>>
>>>> Not all models available in MLlib can be exported to PMML as of now.
>>>>
>>>> Goal
>>>>
>>>>    1. We need to save models generated by WSO2 ML (PMML-supported models)
>>>>    in PMML format, so that they can be reused from PMML-supported tools.
>>>>
>>>> How To
>>>>
>>>> If “clusters” is the trained model, we can do the following with the
>>>> PMML support.
>>>>
>>>> // Export the model to a String in PMML format
>>>> clusters.toPMML
>>>>
>>>> // Export the model to a local file in PMML format
>>>> clusters.toPMML("/tmp/kmeans.xml")
>>>>
>>>> // Export the model to a directory on a distributed file system in
>>>> // PMML format
>>>> clusters.toPMML(sc, "/tmp/kmeans")
>>>>
>>>> // Export the model to the OutputStream in PMML format
>>>> clusters.toPMML(System.out)
>>>>
>>>> For unsupported models, either there is no .toPMML method or an
>>>> IllegalArgumentException is thrown.
>>>>
>>>> Design
>>>>
>>>> In the following diagram, models highlighted in green can be exported to
>>>> PMML, while models highlighted in red cannot. The diagram illustrates the
>>>> algorithms supported by WSO2 Machine Learner.
>>>>
>>>> [image: Inline image 2]
>>>> 
>>>>
>>>> Method 1
>>>>
>>>> By default save the models in PMML if PMML export is supported, using
>>>> one of these supported options.
>>>>
>>>> 1. Export the model to a String in PMML format
>>>> 2. Export the model to a local file in PMML format
>>>> 3. Export the model to a directory on a distributed file system in
>>>>    PMML format
>>>> 4. Export the model to the OutputStream in PMML format
>>>>
>>>> Classes that need to be modified (apart from the UI)
>>>>
>>>>    - SupervisedSparkModelBuilder
>>>>    - UnsupervisedSparkModelBuilder
>>>>
>>>>
>>>> e.g.
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> As of now, the serialized models are saved in the “models” folder. The
>>>> PMML models can also be saved in the same directory with a PMML suffix.
>>>>
>>>> Optional:
>>>>
>>>> After the model is generated, let the user export the PMML model to a
>>>> chosen location through the UI.
>>>>
>>>> Method 2
>>>>
>>>> Add a *new REST API* to build models with PMML
>>>>
>>>> public Response buildPMMLModel(@PathParam("modelId") long modelId)
>>>>
>>>> In the backend we could add an additional argument to the "buildXModel"
>>>> methods to decide whether to save the PMML model or not.
>>>>
>>>> UI modifications are also needed (an option for the user to choose
>>>> whether to build the PMML model and to choose the path to save it).
>>>>
>>>> Identified classes that need to be modified (apart from the UI)
>>>>
>>>>    - SupervisedSparkModelBuilder
>>>>    - UnsupervisedSparkModelBuilder
>>>>    - ModelApiV10
>>>>
>>>>
>>>>
>>>> *Conclusion*
>>>>
>>>> Currently we have decided to go with "Method 2" for the following
>>>> reasons.
>>>>
>>>>    - Not all models have PMML support in Spark.
>>>>    - If we are to use anything apart from Spark MLlib, such as H2O, we
>>>>    will be depending on PMML support from H2O.
>>>>    - With Method 1 we might be generating PMML models when users are
>>>>    not in need of it (useless computation).
>>>>
>>>>  Please let me know if there is a better way to improve the design.
>>>>
>>>> --
>>>> Thanks & Regards,
>>>>
>>>> Fazlan Nazeem
>>>>
>>>> *Software Engineer*
>>>>
>>>> *WSO2 Inc*
>>>> Mobile : +94772338839
>>>> [email protected]
>>>>
>>>
>>>
>>>
>>> --
>>> *CD Athuraliya*
>>> Software Engineer
>>> WSO2, Inc.
>>> lean . enterprise . middleware
>>> Mobile: +94 716288847
>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>> <https://twitter.com/cdathuraliya> | Blog
>>> <https://cdathuraliya.wordpress.com/>
>>>
>>
>>
>>
>> --
>> Vidura Gamini Abhaya, Ph.D.
>> Director of Engineering
>> M:+94 77 034 7754
>> E: [email protected]
>>
>> WSO2 Inc. (http://wso2.com)
>> lean.enterprise.middleware
>>
>
>
>
> --
> Thanks & Regards,
>
> Fazlan Nazeem
>
> *Software Engineer*
>
> *WSO2 Inc*
> Mobile : +94772338839
> [email protected]
>



-- 
Vidura Gamini Abhaya, Ph.D.
Director of Engineering
M:+94 77 034 7754
E: [email protected]

WSO2 Inc. (http://wso2.com)
lean.enterprise.middleware

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
