Hi CD,

The idea was not to go through the complete ML workflow. We could add a
check-box to choose whether the PMML model is needed (within the normal ML
workflow).
Yes, the computation is less expensive, as you mentioned, and that was
one reason why Method 1 was suggested.

Method 2 builds the PMML model on demand (ask the user), and Method 1
builds it by default (save in both formats). Therefore we have to decide
which option is best in the current situation.
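
To make the difference concrete, here is a rough Java sketch of the two
options. It is only an illustration: TrainedModel, saveByDefault,
saveOnDemand and the ".pmml" suffix are hypothetical names, not the actual
WSO2 ML or Spark classes, and the sketch only lists the artifacts that would
be written instead of calling Spark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Sketch only: every name below is a hypothetical stand-in, not a WSO2 ML or Spark class.
class PmmlExportSketch {

    /** Stand-in for a trained model that may or may not support PMML export. */
    interface TrainedModel {
        String name();

        /** Returns the PMML document, or empty if export is unsupported. */
        Optional<String> toPmml();
    }

    /** Method 1: save the PMML form by default whenever the model supports it. */
    static List<String> saveByDefault(TrainedModel model) {
        return save(model, true);
    }

    /** Method 2: save the PMML form only when the user asked for it (the check-box). */
    static List<String> saveOnDemand(TrainedModel model, boolean pmmlRequested) {
        return save(model, pmmlRequested);
    }

    // Returns the artifact names that would end up in the "models" folder.
    private static List<String> save(TrainedModel model, boolean wantPmml) {
        List<String> artifacts = new ArrayList<>();
        artifacts.add(model.name()); // the serialized model is always saved
        if (wantPmml && model.toPmml().isPresent()) {
            artifacts.add(model.name() + ".pmml"); // assumed suffix for the PMML copy
        }
        return artifacts;
    }

    /** Builds a dummy model; pmmlSupported mimics whether PMML export is available. */
    static TrainedModel model(String name, boolean pmmlSupported) {
        return new TrainedModel() {
            public String name() {
                return name;
            }

            public Optional<String> toPmml() {
                return pmmlSupported ? Optional.of("<PMML/>") : Optional.empty();
            }
        };
    }

    public static void main(String[] args) {
        TrainedModel kmeans = model("kmeans", true);
        TrainedModel unsupported = model("unsupported-model", false);
        System.out.println(saveByDefault(kmeans));
        System.out.println(saveOnDemand(kmeans, false));
        System.out.println(saveOnDemand(unsupported, true));
    }
}
```

In both options a model without PMML support falls back to the serialized
form only, so the open question is just whether the PMML copy is produced
without the user asking for it.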


On Mon, Oct 12, 2015 at 11:33 AM, CD Athuraliya <[email protected]> wrote:

> Hi,
>
> I feel that asking the user to go through the complete ML workflow for PMML
> is too demanding. Computationally, this conversion should be less expensive
> compared to model training in real world use cases (since it's a mapping of
> model parameters from Java objects to XML AFAIK). And model training should
> be independent from the model format. Instead, can't we support this
> conversion on demand? Or save in both formats for now? Once Spark starts
> supporting PMML for all algorithms we can go for Method 1 if it looks
> consistent throughout our ML life cycle.
>
> Thanks
>
> On Mon, Oct 12, 2015 at 11:09 AM, Fazlan Nazeem <[email protected]> wrote:
>
>> Hi,
>>
>> I am working on the Redmine issue [1] regarding PMML support for Machine
>> Learner. Please provide your opinion on this design.
>> [1] https://redmine.wso2.com/issues/4303
>>
>> *Overview*
>>
>> Spark 1.5.1 (latest version) supports PMML model export for some of the
>> available models in Spark through MLlib.
>>
>> The table below outlines the MLlib models that can be exported to PMML
>> and their equivalent PMML model.
>>
>>
>> MLlib model                    | PMML model
>> -------------------------------+------------------------------------------
>> KMeansModel                    | ClusteringModel
>> LinearRegressionModel          | RegressionModel (functionName="regression")
>> RidgeRegressionModel           | RegressionModel (functionName="regression")
>> LassoModel                     | RegressionModel (functionName="regression")
>> SVMModel                       | RegressionModel (functionName="classification" normalizationMethod="none")
>> Binary LogisticRegressionModel | RegressionModel (functionName="classification" normalizationMethod="logit")
>>
>>
>> Not all models available in MLlib can be exported to PMML as of now.
>>
>> *Goal*
>>
>>    1. We need to save models generated by WSO2 ML (PMML-supported models) in
>>    PMML format, so that they can be reused from PMML-supported tools.
>>
>> *How To*
>>
>> If “clusters” is the trained model, we can do the following with the PMML
>> support.
>>
>> // Export the model to a String in PMML format
>> clusters.toPMML
>>
>> // Export the model to a local file in PMML format
>> clusters.toPMML("/tmp/kmeans.xml")
>>
>> // Export the model to a directory on a distributed file system in PMML format
>> clusters.toPMML(sc, "/tmp/kmeans")
>>
>> // Export the model to the OutputStream in PMML format
>> clusters.toPMML(System.out)
>>
>> For unsupported models, either you will not find a .toPMML method or an
>> IllegalArgumentException will be thrown.
>>
>> *Design*
>>
>> In the following diagram models highlighted in green can be exported to
>> PMML, but not the models highlighted in red. The diagram illustrates
>> algorithms supported by WSO2 Machine Learner.
>>
>> [image: Inline image 2]
>>
>> Method 1
>>
>> By default save the models in PMML if PMML export is supported, using one
>> of these supported options.
>>
>> 1. Export the model to a String in PMML format
>> 2. Export the model to a local file in PMML format
>> 3. Export the model to a directory on a distributed file system in PMML
>>    format
>> 4. Export the model to the OutputStream in PMML format
>>
>> Classes that need to be modified (apart from the UI)
>>
>>    - SupervisedSparkModelBuilder
>>    - UnsupervisedSparkModelBuilder
>>
>>
>> e.g.
>>
>> [image: Inline image 1]
>>
>> As of now the serialized models are saved in the “models” folder. The PMML
>> models can also be saved in the same directory with a PMML suffix.
>>
>> Optional:
>>
>> After the model is generated let the user export the PMML model to a
>> chosen location through the UI.
>>
>> Method 2
>>
>> Add a *new REST API* to build models with PMML
>>
>> public Response buildPMMLModel(@PathParam("modelId") long modelId)
>>
>> In the backend we could add an additional argument to the "buildXModel"
>> methods to decide whether to save the PMML model or not.
>>
>> UI modifications are also needed (an option for the user to choose whether
>> to build the PMML model and to choose the path to save it).
>>
>> Identified classes that need to be modified (apart from the UI)
>>
>>    - SupervisedSparkModelBuilder
>>    - UnsupervisedSparkModelBuilder
>>    - ModelApiV10
>>
>>
>>
>> *Conclusion*
>>
>> Currently we have decided to go with "Method 2" for the following
>> reasons.
>>
>>    - Not all models have PMML support in Spark.
>>    - If we are to use anything apart from Spark MLlib, such as H2O, we
>>    will be depending on PMML support from H2O.
>>    - With Method 1 we might generate PMML models when users do not
>>    need them (wasted computation).
>>
>>  Please let me know if there is a better way to improve the design.
>>
>> --
>> Thanks & Regards,
>>
>> Fazlan Nazeem
>>
>> *Software Engineer*
>>
>> *WSO2 Inc*
>> Mobile : +94772338839
>> [email protected]
>>
>
>
>
> --
> *CD Athuraliya*
> Software Engineer
> WSO2, Inc.
> lean . enterprise . middleware
> Mobile: +94 716288847
> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
> <https://twitter.com/cdathuraliya> | Blog
> <https://cdathuraliya.wordpress.com/>
>



-- 
Thanks & Regards,

Fazlan Nazeem

*Software Engineer*

*WSO2 Inc*
Mobile : +94772338839
[email protected]
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture