Hi,

I feel that asking users to go through the complete ML workflow just to get
PMML is too demanding. Computationally, this conversion should be much less
expensive than model training in real-world use cases (AFAIK it's a mapping
of model parameters from Java objects to XML). Model training should also
be independent of the model format. Can't we instead support this
conversion on demand? Or save in both formats for now? Once Spark supports
PMML for all algorithms, we can go for Method 1 if it looks consistent
throughout our ML life cycle.
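
To illustrate what I mean by on-demand conversion, here is a rough Java
sketch. It is only an assumption of how this could look:
OnDemandPmmlExporter, register and export are hypothetical names, not
existing WSO2 ML classes, and the converter is passed in as a plain
Supplier<String> so the sketch stays independent of Spark. The PMML is
generated the first time it is requested and cached afterwards, so model
training itself never pays for it.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch: PMML is produced only when first requested, then cached. */
class OnDemandPmmlExporter {
    // modelId -> function that converts the already-trained model to a PMML string
    private final Map<Long, Supplier<String>> converters = new ConcurrentHashMap<>();
    // modelId -> PMML produced by an earlier export request
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    /** Called after training, only for models whose algorithm supports PMML export. */
    void register(long modelId, Supplier<String> toPmml) {
        converters.put(modelId, toPmml);
    }

    /** Returns the PMML for a model, converting on first request; empty if unsupported. */
    Optional<String> export(long modelId) {
        Supplier<String> converter = converters.get(modelId);
        if (converter == null) {
            return Optional.empty();
        }
        return Optional.of(cache.computeIfAbsent(modelId, id -> converter.get()));
    }
}
```

With an MLlib model that supports export, the supplier could simply wrap
the existing call, e.g. exporter.register(modelId, () -> kMeansModel.toPMML()).
Models without PMML support are never registered, so export returns an
empty Optional for them.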

Thanks

On Mon, Oct 12, 2015 at 11:09 AM, Fazlan Nazeem <[email protected]> wrote:

> Hi,
>
> I am working on redmine[1] regarding PMML support for Machine Learner.
> Please provide your opinion on this design.
> [1]https://redmine.wso2.com/issues/4303
>
> *Overview*
>
> Spark 1.5.1 (the latest version) supports PMML model export for some of the
> models available in Spark through MLlib.
>
> The table below outlines the MLlib models that can be exported to PMML and
> their equivalent PMML model.
>
> MLlib model                    | PMML model
> -------------------------------+----------------------------------------------------------------------------
> KMeansModel                    | ClusteringModel
> LinearRegressionModel          | RegressionModel (functionName="regression")
> RidgeRegressionModel           | RegressionModel (functionName="regression")
> LassoModel                     | RegressionModel (functionName="regression")
> SVMModel                       | RegressionModel (functionName="classification" normalizationMethod="none")
> Binary LogisticRegressionModel | RegressionModel (functionName="classification" normalizationMethod="logit")
>
> Not all models available in MLlib can be exported to PMML as of now.
>
> Goal
>
>    1. We need to save models generated by WSO2 ML (where the model type
>    supports PMML) in PMML format, so that they can be reused by tools
>    that support PMML.
>
> How To
>
> If “clusters” is the trained model, we can do the following with the PMML
> support.
>
> // Export the model to a String in PMML format
> clusters.toPMML
>
> // Export the model to a local file in PMML format
> clusters.toPMML("/tmp/kmeans.xml")
>
> // Export the model to a directory on a distributed file system in PMML format
> clusters.toPMML(sc, "/tmp/kmeans")
>
> // Export the model to the OutputStream in PMML format
> clusters.toPMML(System.out)
>
> For unsupported models, either you will not find a .toPMML method or an
> IllegalArgumentException will be thrown.
>
> Design
>
> In the following diagram, models highlighted in green can be exported to
> PMML; those highlighted in red cannot. The diagram covers the algorithms
> supported by WSO2 Machine Learner.
>
> [image: Inline image 2]
>
> Method 1
>
> By default, save the models in PMML if PMML export is supported, using one
> of these supported options.
>
> 1. Export the model to a String in PMML format
> 2. Export the model to a local file in PMML format
> 3. Export the model to a directory on a distributed file system in PMML
>    format
> 4. Export the model to the OutputStream in PMML format
>
> Classes that need to be modified (apart from the UI):
>
>    - SupervisedSparkModelBuilder
>    - UnsupervisedSparkModelBuilder
>
>
> e.g.
>
> [image: Inline image 1]
>
> As of now, the serialized models are saved in the “models” folder. The PMML
> models can also be saved in the same directory with a PMML suffix.
>
> Optional:
>
> After the model is generated, let the user export the PMML model to a
> chosen location through the UI.
>
> Method 2
>
> Add a *new REST API* to build models with PMML
>
> public Response buildPMMLModel(@PathParam("modelId") long modelId)
>
> In the backend, we could add an additional argument to the "buildXModel"
> methods to decide whether or not to save the PMML model.
>
> UI modifications are also needed (an option for the user to choose whether
> to build the PMML model and to choose the path to save it).
>
> Identified classes that need to be modified (apart from the UI):
>
>    - SupervisedSparkModelBuilder
>    - UnsupervisedSparkModelBuilder
>    - ModelApiV10
>
>
>
> *Conclusion*
>
> Currently, we have decided to go with "Method 2" for the following
> reasons.
>
>    - Not all models have PMML support in Spark.
>    - If we are to use anything apart from Spark MLlib, such as H2O, we
>    will be depending on PMML support from H2O.
>    - With Method 1, we might be generating PMML models when users do not
>    need them (wasted computation).
>
>  Please let me know if there is a better way to improve the design.
>
> --
> Thanks & Regards,
>
> Fazlan Nazeem
>
> *Software Engineer*
>
> *WSO2 Inc*
> Mobile : +94772338839
> [email protected]
>



-- 
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847
LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
<https://twitter.com/cdathuraliya> | Blog
<https://cdathuraliya.wordpress.com/>
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
