Hi,
I am working on the Redmine issue [1] regarding PMML support for WSO2
Machine Learner. Please share your opinion on the design below.
[1]https://redmine.wso2.com/issues/4303
*Overview*
Spark 1.5.1 (the latest version) supports PMML model export for some of the
models available in Spark through MLlib.
The table below outlines the MLlib models that can be exported to PMML and
their equivalent PMML model.
MLlib model                     PMML model
------------------------------  ---------------------------------------------
KMeansModel                     ClusteringModel
LinearRegressionModel           RegressionModel (functionName="regression")
RidgeRegressionModel            RegressionModel (functionName="regression")
LassoModel                      RegressionModel (functionName="regression")
SVMModel                        RegressionModel (functionName="classification"
                                normalizationMethod="none")
Binary LogisticRegressionModel  RegressionModel (functionName="classification"
                                normalizationMethod="logit")
Not all models available in MLlib can be exported to PMML as of now.
Goal
1. We need to save the models generated by WSO2 ML (those of PMML-supported
types) in PMML format, so that they can be reused from tools that support
PMML.
How To
If “clusters” is the trained model, we can do the following with the PMML
support.
// Export the model to a String in PMML format
clusters.toPMML
// Export the model to a local file in PMML format
clusters.toPMML("/tmp/kmeans.xml")
// Export the model to a directory on a distributed file system in PMML format
clusters.toPMML(sc, "/tmp/kmeans")
// Export the model to the OutputStream in PMML format
clusters.toPMML(System.out)
For unsupported models, either you will not find a .toPMML method or an
IllegalArgumentException will be thrown.
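The two failure cases above can be handled in one place. Below is a minimal Java sketch of that idea. It deliberately does not depend on Spark: `PmmlExportable`, `ExportableModel`, `FailingModel` and `exportIfSupported` are hypothetical stand-ins introduced only for illustration (Spark's own models mix in a `PMMLExportable` trait, which the interface here merely imitates), and the returned string is a placeholder rather than a valid PMML document.

```java
import java.util.Optional;

// Hypothetical stand-in for the PMML export capability; NOT the real Spark type.
interface PmmlExportable {
    String toPMML();
}

// Stand-in for a model type that supports PMML export (for example k-means).
final class ExportableModel implements PmmlExportable {
    @Override
    public String toPMML() {
        // Placeholder content only; a real export produces a full PMML document.
        return "<PMML><ClusteringModel/></PMML>";
    }
}

// Stand-in for a model whose export is rejected at runtime.
final class FailingModel implements PmmlExportable {
    @Override
    public String toPMML() {
        throw new IllegalArgumentException("PMML export is not supported for this model");
    }
}

final class PmmlExportSketch {
    private PmmlExportSketch() {
    }

    // Returns the PMML string, or an empty Optional when the model cannot be exported.
    static Optional<String> exportIfSupported(Object model) {
        // Case 1: the model type has no toPMML method at all.
        if (!(model instanceof PmmlExportable)) {
            return Optional.empty();
        }
        try {
            return Optional.of(((PmmlExportable) model).toPMML());
        } catch (IllegalArgumentException e) {
            // Case 2: the method exists but rejects this particular model.
            return Optional.empty();
        }
    }
}
```

With a helper of this shape, the caller can decide what to do with an empty result (skip the export, or report it to the user) without caring which of the two cases occurred.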
Design
The following diagram illustrates the algorithms supported by WSO2 Machine
Learner. Models highlighted in green can be exported to PMML; models
highlighted in red cannot.
[image: Inline image 2]
Method 1
By default save the models in PMML if PMML export is supported, using one
of these supported options.
1. Export the model to a String in PMML format
2. Export the model to a local file in PMML format
3. Export the model to a directory on a distributed file system in PMML format
4. Export the model to the OutputStream in PMML format
Classes that need to be modified (apart from the UI):
- SupervisedSparkModelBuilder
- UnsupervisedSparkModelBuilder
For example:
[image: Inline image 1]
As of now, the serialized models are saved in the “models” folder. The PMML
models can also be saved in the same directory with a PMML suffix.
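To make the naming idea concrete, here is a small Java sketch that derives the PMML file location from the location of the serialized model. The exact suffix is not decided in this proposal, so the `.pmml` extension, the class `PmmlPathSketch` and the method `pmmlPathFor` are assumptions made only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PmmlPathSketch {
    private PmmlPathSketch() {
    }

    // Assumed suffix; the proposal only says "a PMML suffix".
    static final String PMML_SUFFIX = ".pmml";

    // Returns a path next to the serialized model, with the assumed suffix appended.
    static Path pmmlPathFor(Path serializedModel) {
        String fileName = serializedModel.getFileName().toString();
        return serializedModel.resolveSibling(fileName + PMML_SUFFIX);
    }

    public static void main(String[] args) {
        // "my-model" is a made-up model file name.
        System.out.println(pmmlPathFor(Paths.get("models", "my-model")));
    }
}
```

Keeping the PMML file as a sibling of the serialized model means both artifacts can be located from the same stored model path.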
Optional:
After the model is generated, let the user export the PMML model to a chosen
location through the UI.
Method 2
Add a *new REST API* to build models with PMML
public Response buildPMMLModel(@PathParam("modelId") long modelId)
In the backend, we could add an additional argument to the "buildXModel"
methods to decide whether or not to save the PMML model.
UI modifications are also needed (an option for the user to choose whether to
build the PMML model, and to choose the path to save it to).
Classes identified as needing modification (apart from the UI):
- SupervisedSparkModelBuilder
- UnsupervisedSparkModelBuilder
- ModelApiV10
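The backend side of Method 2 could look roughly like the following Java sketch. It is not the actual WSO2 ML code: `TrainedModel`, `KMeansStandIn`, `UnsupportedStandIn`, `ModelBuilderSketch`, the `exportPmml` flag name and the in-memory map standing in for the "models" folder are all hypothetical, and the storage keys are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a trained model; not a WSO2 ML or Spark class.
interface TrainedModel {
    boolean supportsPmml();

    String toPMML();
}

// Stand-in for a model type with PMML support.
final class KMeansStandIn implements TrainedModel {
    public boolean supportsPmml() {
        return true;
    }

    public String toPMML() {
        return "<PMML/>"; // placeholder, not a valid PMML document
    }
}

// Stand-in for a model type without PMML support.
final class UnsupportedStandIn implements TrainedModel {
    public boolean supportsPmml() {
        return false;
    }

    public String toPMML() {
        throw new IllegalArgumentException("PMML export is not supported");
    }
}

final class ModelBuilderSketch {
    // In-memory stand-in for the "models" folder.
    private final Map<String, String> storage = new HashMap<>();

    // The extra boolean argument decides whether the PMML model is saved as well.
    void buildModel(long modelId, TrainedModel model, boolean exportPmml) {
        if (exportPmml && !model.supportsPmml()) {
            // Fail before storing anything, so no partial result is left behind.
            throw new IllegalArgumentException("Model " + modelId + " cannot be exported to PMML");
        }
        storage.put(modelId + ".ser", "serialized-model"); // existing behaviour
        if (exportPmml) {
            storage.put(modelId + ".pmml", model.toPMML()); // only on request
        }
    }

    boolean hasPmml(long modelId) {
        return storage.containsKey(modelId + ".pmml");
    }
}
```

In this shape, the existing build path would pass `false` and stay unchanged, while the new REST endpoint would pass `true`, so the PMML document is produced only when a user asks for it.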
*Conclusion*
Currently we have decided to go with "Method 2" for the following reasons.
- Not all models have PMML support in Spark.
- If we are to use anything apart from Spark MLlib, such as H2O, we will
be depending on PMML support from H2O.
- With Method 1 we might be generating PMML models when users do not need
them (wasted computation).
Please let me know if there is a better way to improve the design.
--
Thanks & Regards,
Fazlan Nazeem
*Software Engineer*
*WSO2 Inc*
Mobile : +94772338839