Hi Jayan,

Is there a way to output the accuracy of a specific model within a Siddhi
execution plan?



On Wed, Nov 30, 2016 at 4:38 PM, Jayan Vidanapathirana <[email protected]>
wrote:

> Hi,
>
>
> I am one of the interns working on the "Streaming Machine Learning on WSO2
> CEP" project. I have built a Siddhi extension for CEP using the Apache
> SAMOA machine learning platform.
>
> “SAMOA (Scalable Advanced Massive Online Analysis) is a platform for
> mining big data streams. It is currently an Apache Incubator project.
> SAMOA is written in Java, is open source, and is available at
> http://samoa-project.net under the Apache Software License version 2.0.
>
> As a framework, it allows algorithm developers to abstract from the
> underlying execution engine and therefore reuse their code on different
> engines. It features a pluggable architecture that allows it to run on
> several distributed stream processing engines (DSPEs) such as Storm, S4,
> and Samza. This capability is achieved by designing a minimal API that
> captures the essence of modern DSPEs. This API also makes it easy to write
> new bindings that port SAMOA to new execution engines.
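The framework paragraph above describes algorithms written once against a minimal API and then run on different engines through bindings. The Python sketch below only illustrates that idea: `Processor`, `Engine`, `LocalEngine`, `Scale`, and `Shift` are hypothetical names and do not reproduce SAMOA's actual Java API.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Dict, List


class Processor(ABC):
    """Hypothetical unit of computation: consumes one event, emits zero or more."""

    @abstractmethod
    def process(self, event: dict) -> List[dict]:
        ...


class Engine(ABC):
    """Hypothetical engine binding: decides how processors are scheduled."""

    @abstractmethod
    def run(self, processors: Dict[str, Processor], edges: Dict[str, List[str]],
            source: str, events: List[dict]) -> List[dict]:
        ...


class LocalEngine(Engine):
    """Single-process binding (in the spirit of a local mode): one FIFO queue."""

    def run(self, processors, edges, source, events):
        outputs = []
        queue = deque((source, e) for e in events)
        while queue:
            name, event = queue.popleft()
            for out in processors[name].process(event):
                targets = edges.get(name, [])
                if not targets:
                    outputs.append(out)  # no downstream processor: collect as final output
                for target in targets:
                    queue.append((target, out))
        return outputs


# Algorithm code depends only on Processor, not on any particular engine.
class Scale(Processor):
    def __init__(self, factor: float):
        self.factor = factor

    def process(self, event):
        return [{"x": event["x"] * self.factor}]


class Shift(Processor):
    def __init__(self, offset: float):
        self.offset = offset

    def process(self, event):
        return [{"x": event["x"] + self.offset}]


result = LocalEngine().run({"scale": Scale(2), "shift": Shift(1)},
                           {"scale": ["shift"]}, "scale", [{"x": 1}, {"x": 2}])
```

In this sketch, a distributed binding would subclass `Engine` and schedule the same `Scale` and `Shift` objects differently; the processor code itself would not change.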
>
> As a library, SAMOA contains implementations of state-of-the-art
> algorithms for distributed machine learning on streams. Currently, SAMOA
> implements the Vertical Hoeffding Tree for classification, a distributed
> k-means algorithm for clustering, and Adaptive Model Rules (two
> implementations) for regression, as well as programming abstractions for
> developing new algorithms. The library also includes meta-algorithms such
> as bagging and boosting (ensemble techniques) to improve predictive
> performance.”
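The Vertical Hoeffding Tree mentioned above belongs to the Hoeffding tree family, which decides when a leaf has seen enough examples to split by using the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)). The sketch below shows that split test in its common VFDT form. The default `delta` and `tie_threshold` values are illustrative assumptions rather than SAMOA's configuration, and the vertical parallelism that distinguishes SAMOA's variant is not shown.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the true mean of a variable whose range is
    `value_range` lies within this epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 n_seen: int, delta: float = 1e-7, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test at a leaf after n_seen examples (illustrative defaults)."""
    # Information gain over n_classes classes ranges from 0 to log2(n_classes).
    eps = hoeffding_bound(math.log2(n_classes), delta, n_seen)
    # Split if the best attribute is clearly better, or if the two candidates
    # are so close that waiting for more data is unlikely to change the choice.
    return best_gain - second_gain > eps or eps < tie_threshold
```

Because ε shrinks as `n_seen` grows, a leaf that cannot yet separate two close candidates will eventually split once enough stream examples have arrived.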
>
> I created a Siddhi extension that uses SAMOA as its machine learning
> algorithm library. It contains classification, regression, and clustering
> extensions, and it runs SAMOA in local mode (not the distributed version),
> so no cluster is required. These extensions also provide different API
> calls.
>
> [image: Streaming Machine learning SAMOA integrate to CEP (Abstract).jpg]
> Figure: Main architecture
>
> After creating the extensions, I compared the accuracy of streaming
> machine learning using SAMOA with the accuracy of batch processing using
> the Weka machine learning library.
>
> Classification (Vertical Hoeffding Tree) using the MAGIC Gamma Telescope
> Data Set <https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope>
>
> 18000 data points
>
>                 Batch (WSO2 ML)            Streaming (SAMOA)
>                 Class 1     Class 2        Class 1     Class 2
> Accuracy (%)          82.72                      73.4
> F1-score        87.09       73.86          80.41       58.53
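The classification table reports one accuracy value per approach but an F1-score per class, because F1 treats one class at a time as the positive class. A small sketch of both metrics follows; the labels `g` and `h` (gamma and hadron in the MAGIC data set) and the toy predictions are illustrative only, not the data behind the table.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def accuracy(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Percentage of predictions that match the true label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


def f1_per_class(actual: Sequence[Hashable],
                 predicted: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 = 2TP / (2TP + FP + FN) for each class (equal to 2PR / (P + R)), scaled to 0-100."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            tp[a] += 1
        else:
            fp[p] += 1  # predicted p, but it was not p
            fn[a] += 1  # missed an instance of class a
    scores = {}
    for c in set(actual) | set(predicted):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 100.0 * 2 * tp[c] / denom if denom else 0.0
    return scores


actual = ["g", "g", "g", "h", "h"]
predicted = ["g", "g", "h", "h", "g"]
acc = accuracy(actual, predicted)
f1 = f1_per_class(actual, predicted)
```

Here `acc` is 60.0, while the per-class F1 values differ (about 66.7 for `g` and 50.0 for `h`), which is why a single accuracy figure sits above two F1 figures in the table.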
>
> The batch process is more accurate than the SAMOA streaming process. If
> the stream does not drift, the accuracy of the streaming process should
> increase over time and eventually reach a stable state.
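A common way to measure how a streaming model's accuracy evolves, and to report it while the model is running, is prequential (test-then-train) evaluation: each event is first used to test the current model and then to train it. The sketch below uses a hypothetical toy learner, `MajorityPerValue`, as a stand-in for the SAMOA classifier; it is not the Vertical Hoeffding Tree.

```python
from collections import Counter, defaultdict
from typing import Hashable, Iterable, List, Tuple


class MajorityPerValue:
    """Hypothetical toy incremental classifier: predicts the most frequent
    label seen so far for a given feature value (None if the value is new)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def predict(self, x: Hashable):
        seen = self.counts.get(x)  # .get does not create an entry
        return seen.most_common(1)[0][0] if seen else None

    def learn(self, x: Hashable, y: Hashable) -> None:
        self.counts[x][y] += 1


def prequential_accuracy(model, stream: Iterable[Tuple[Hashable, Hashable]]) -> List[float]:
    """Test-then-train: score each instance before learning from it and
    return the cumulative accuracy (%) after every instance."""
    correct, curve = 0, []
    for i, (x, y) in enumerate(stream, start=1):
        correct += model.predict(x) == y  # test first...
        model.learn(x, y)                 # ...then train
        curve.append(100.0 * correct / i)
    return curve


stationary = [("a", 1), ("b", 0)] * 50
curve = prequential_accuracy(MajorityPerValue(), stationary)

# Same stream followed by 100 events whose labels are flipped (concept drift).
drifted = stationary + [("a", 0), ("b", 1)] * 50
drift_curve = prequential_accuracy(MajorityPerValue(), drifted)
```

On the stationary stream the curve rises and levels off near 98%; once the labels flip, the cumulative accuracy falls instead, which is the drift caveat stated above.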
>
> Regression (Adaptive Model Rules, AMRules) using the Combined Cycle Power
> Plant (CCPP) Data Set
> <https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant>
>
> 9500 data points
>
>                                    SAMOA           Weka               Weka
>                                    (AMRules)       LinearRegression   M5Rules
> Mean absolute error                3.68            3.63               3.06
> Root mean squared error            6.69            4.56               3.99
> Relative absolute error (%)        24.7            24.43              20.61
> Root relative squared error (%)    37.8            26.7               23.4
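The four regression metrics in the table can be computed as sketched below. MAE and RMSE are in the units of the target variable; RAE and RRSE compare the model with a baseline that always predicts the mean, so values below 100% mean the model beats that baseline. This sketch takes the mean of the evaluated actual values as the baseline, which is a simplification: Weka derives its baseline from training data, so its figures can differ slightly. The toy numbers are illustrative only.

```python
import math
from typing import Dict, Sequence


def regression_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, and relative errors (in %) against a mean-value baseline."""
    n = len(actual)
    mean = sum(actual) / n  # baseline predictor: always output this value
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    base_abs = sum(abs(mean - a) for a in actual)
    base_sq = sum((mean - a) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_pct": 100.0 * abs_err / base_abs,
        "rrse_pct": 100.0 * math.sqrt(sq_err / base_sq),
    }


errors = regression_errors([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 5.0])
```

Because RMSE squares each error, a few large misses raise it much more than MAE, which is one way to read the SAMOA column, where MAE is close to Weka's but RMSE is higher.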
>
> I ran regression tests on two datasets and classification tests on two
> datasets. In these results, the gap between the streaming and batch errors
> is small: for example, the mean absolute error of SAMOA AMRules (3.68) is
> close to that of Weka LinearRegression (3.63), although its root mean
> squared error is higher (6.69 vs 4.56). Compared with classification and
> clustering, streaming and batch regression have the most similar error
> rates, so I think streaming ML is well suited to regression.
>
> Clustering (k-means) using the 3D Road Network (North Jutland, Denmark)
> Data Set
> <https://archive.ics.uci.edu/ml/datasets/3D+Road+Network+%28North+Jutland,+Denmark%29>
>
> 434874 data points
>
>            Attribute_1                  Attribute_2     Attribute_3     Attribute_4
>            SAMOA          Weka          SAMOA   Weka    SAMOA   Weka    SAMOA   Weka
> Center_0   100098819.2    111598410.7   9.77    10.2    57.16   57.37   21.23   19.4
> Center_1   36598276.23    35877429.78   9.72    9.88    57.05   56.87   21.87   22.47
> Center_2   138161280.2    116561030.9   9.57    9.35    57.09   57.15   23.15   23.17
> Mean       97869870.26                  9.7318          57.0838         22.1854
>
> K-means algorithm, 10 iterations
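The table note mentions 10 iterations of k-means. A batch k-means revisits every point in each iteration, whereas a streaming clusterer sees each point only once. The sketch below contrasts Lloyd-style batch k-means with a single-pass sequential (MacQueen-style) update; it is a simplified stand-in and not SAMOA's distributed clustering algorithm, and the toy points are illustrative only.

```python
from typing import List, Sequence

Point = Sequence[float]


def nearest(point: Point, centers: List[List[float]]) -> int:
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])))


def batch_kmeans(points: List[Point], seeds: List[Point], iterations: int = 10):
    """Lloyd-style k-means: each iteration reassigns all points, then recomputes means."""
    centers = [list(c) for c in seeds]
    for _ in range(iterations):
        sums = [[0.0] * len(c) for c in centers]
        counts = [0] * len(centers)
        for p in points:
            i = nearest(p, centers)
            counts[i] += 1
            for d, v in enumerate(p):
                sums[i][d] += v
        # An empty cluster keeps its previous center.
        centers = [[s / counts[i] for s in sums[i]] if counts[i] else centers[i]
                   for i in range(len(centers))]
    return centers


def sequential_kmeans(stream, seeds: List[Point]):
    """Single-pass k-means: each point moves its nearest center once, as a running
    mean. The first point assigned to a center replaces that center's seed."""
    centers = [list(c) for c in seeds]
    counts = [0] * len(centers)
    for p in stream:
        i = nearest(p, centers)
        counts[i] += 1
        centers[i] = [c + (v - c) / counts[i] for c, v in zip(centers[i], p)]
    return centers


points = [[0.0], [1.0], [10.0], [11.0]]
batch = batch_kmeans(points, seeds=[[0.0], [10.0]])
online = sequential_kmeans(iter(points), seeds=[[0.0], [10.0]])
```

On well-separated toy data like this both return the same centers; on large, overlapping streams the single-pass result depends on arrival order and can differ from the batch one.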
>
> For Attribute_2 to Attribute_4, the streaming (SAMOA) cluster centers span
> a narrower range than the batch (Weka) cluster centers; for Attribute_1 the
> SAMOA range is wider.
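The range comparison above can be checked directly against the table values. This snippet computes, per attribute, whether the spread (maximum minus minimum) of the three SAMOA centers is smaller than the spread of the three Weka centers.

```python
# SAMOA and Weka cluster centers per attribute, copied from the table above.
centers = {
    "Attribute_1": ([100098819.2, 36598276.23, 138161280.2],
                    [111598410.7, 35877429.78, 116561030.9]),
    "Attribute_2": ([9.77, 9.72, 9.57], [10.2, 9.88, 9.35]),
    "Attribute_3": ([57.16, 57.05, 57.09], [57.37, 56.87, 57.15]),
    "Attribute_4": ([21.23, 21.87, 23.15], [19.4, 22.47, 23.17]),
}


def spread(values):
    """Distance between the largest and smallest center along one attribute."""
    return max(values) - min(values)


narrower = {attr: spread(samoa) < spread(weka)
            for attr, (samoa, weka) in centers.items()}
```

The result is `True` for Attribute_2, Attribute_3, and Attribute_4 and `False` for Attribute_1.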
>
> References
>
> [1] SAMOA research paper:
> http://www.jmlr.org/papers/volume16/morales15a/morales15a.pdf
>
> [2] SAMOA docs: http://samoa.incubator.apache.org/
>
> [3] Git repository: https://github.com/Jayancv/streamingML
>
> [4] Statistics of tests:
> https://docs.google.com/a/wso2.com/spreadsheets/d/1uROw0gGIu_Ht0J0YnSOHoH600ZnJG9ejp9ztMaXA09s/edit?usp=sharing
>
>
> --
>
> Regards,
>
> Jayan Vidanapathirana
> Intern Software Engineer,
> WSO2.
> mobile +94715594516
> www.linkedin.com/in/jayancv
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
Thanks & Regards,

Fazlan Nazeem

*Software Engineer*

*WSO2 Inc*
Mobile : +94772338839
[email protected]