Hi Fazlan,

I think this API doc [1] will answer your question.

[1] streamingML apiDoc -
https://docs.google.com/a/wso2.com/document/d/1bxDLwfNSyxvt1K9tCTE1HcWVVo4mK88-Ozxav6yVWic/edit?usp=sharing

Thanks.


Regards,

Jayan Vidanapathirana
Intern Software Engineer,
WSO2.
mobile +94715594516
<https://lk.linkedin.com/in/jayancv>


On Thu, Dec 1, 2016 at 12:42 PM, Fazlan Nazeem <fazl...@wso2.com> wrote:

> Hi Jayan,
>
> Is there a way to output the accuracy of a specific model within a Siddhi
> execution plan?
>
>
>
> On Wed, Nov 30, 2016 at 4:38 PM, Jayan Vidanapathirana <jay...@wso2.com>
> wrote:
>
>> Hi,
>>
>>
>> I am one of the interns working on the "Streaming Machine Learning on
>> WSO2 CEP" project. I have built a Siddhi extension for CEP using the Apache
>> SAMOA machine learning library.
>>
>> “SAMOA (Scalable Advanced Massive Online Analysis) is a platform for
>> mining big data streams. Currently, it is an Apache incubator project.
>> SAMOA is written in Java, is open source, and is available at
>> http://samoa-project.net under the Apache Software License version 2.0.
>>
>> As a framework: it allows algorithm developers to abstract from the
>> underlying execution engine, and therefore reuse their code on different
>> engines. It features a pluggable architecture that allows it to run on
>> several distributed stream processing engines (DSPEs) such as Storm, S4,
>> and Samza. This capability is achieved by designing a minimal API that
>> captures the essence of modern DSPEs. This API also makes it easy to write
>> new bindings to port SAMOA to new execution engines.
>>
>> As a library: SAMOA contains implementations of state-of-the-art
>> algorithms for distributed machine learning on streams. Currently, SAMOA
>> implements the Vertical Hoeffding Tree for classification, a distributed
>> k-means algorithm for clustering, and adaptive model rules (with two
>> implementations) for regression, as well as programming abstractions to
>> develop new algorithms. The library also includes meta-algorithms such as
>> bagging and boosting (ensemble techniques) to improve predictive
>> performance.”
>>
>> I created a Siddhi extension that uses SAMOA as its machine learning
>> algorithm library. It contains classification, regression, and clustering
>> extensions and runs SAMOA in local mode (not the distributed version), so
>> no cluster is needed. Each of these extensions provides its own API calls.
>>
>> [image: Streaming Machine learning SAMOA integrate to CEP (Abstract).jpg]
>> Figure: Main architecture
>>
>> After creating the extensions, I compared streaming machine learning
>> accuracy using SAMOA against batch processing accuracy using the Weka
>> machine learner.
>>
>> Classification (Vertical Hoeffding Tree) using the MAGIC Gamma Telescope
>> Data Set <https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope>
>>
>> Data points: 18000
>>
>>           | Batch process (using WSO2 ML) | Streaming
>>           | Class 1       | Class 2       | Class 1 | Class 2
>> ----------+---------------+---------------+---------+--------
>> Accuracy  | 82.72                         | 73.4
>> F1-Score  | 87.09         | 73.86         | 80.41   | 58.53
>>
>> The batch process is more accurate than the SAMOA streaming process. If
>> the stream does not drift, the accuracy of the streaming process increases
>> over time and eventually reaches a stable state.
>>
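For anyone who wants to recompute figures like the accuracy and per-class F1-scores above, here is a minimal sketch of how they can be derived from a confusion matrix. The function names and the counts are hypothetical and only illustrate the arithmetic; they are not taken from the experiment or from the SAMOA or WSO2 ML code.

```python
def accuracy(confusion):
    """Overall accuracy from a square confusion matrix (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def f1_per_class(confusion):
    """F1-score of each class: 2*TP / (2*TP + FP + FN)."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, actually another class
        fn = sum(confusion[c]) - tp                       # actually c, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical counts for a two-class problem (NOT the experiment's data).
confusion = [[80, 20],
             [10, 90]]

overall = accuracy(confusion)
f1_scores = f1_per_class(confusion)
print(f"accuracy = {overall:.4f}")
print("F1 per class =", [round(s, 4) for s in f1_scores])
```

Accuracy is a single value per model, while F1 is reported per class, which is why the table has one accuracy figure but two F1 figures for each process.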
>> Regression (AMRules) Using Combined Cycle Power Plant Data Set (CCPP)
>> <https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant>
>>
>> Data points: 9500
>>
>>                             | Samoa (Adaptive Model | Weka             | Weka
>>                             | Rules Regressor)      | LinearRegression | M5Rules
>> ----------------------------+-----------------------+------------------+--------
>> Mean absolute error         | 3.68                  | 3.63             | 3.06
>> Root mean squared error     | 6.69                  | 4.56             | 3.99
>> Relative absolute error     | 24.7                  | 24.43            | 20.61
>> Root relative squared error | 37.8                  | 26.7             | 23.4
>>
>> I ran the regression test on 2 data sets and the classification test on 2
>> data sets. In those results I saw no large difference in error between the
>> streaming and batch processes. Compared with classification and
>> clustering, regression shows the closest error rates between streaming and
>> batch. Therefore I think streaming ML is well suited to regression.
>>
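The four error measures in the regression table can be sketched as follows. These are the usual textbook definitions, with the mean of the actual values as the baseline predictor for the two relative errors. This is an assumption: Weka and SAMOA may differ in details such as which mean they use as the baseline. The function name and sample values are hypothetical.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, RMSE, RAE (%) and RRSE (%) for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Baseline: error of always predicting the mean of the actual values.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae_percent": 100 * sum(abs_err) / base_abs,
        "rrse_percent": 100 * math.sqrt(sum(sq_err) / base_sq),
    }


# Hypothetical values (NOT the CCPP data).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

errors = regression_errors(actual, predicted)
for name, value in errors.items():
    print(f"{name}: {value:.4f}")
```

The relative errors compare a model against the trivial mean predictor, so values below 100 mean the model beats that baseline.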
>> Clustering (k-means) Using 3D Road Network (North Jutland, Denmark) Data
>> Set
>> <https://archive.ics.uci.edu/ml/datasets/3D+Road+Network+%28North+Jutland,+Denmark%29>
>>
>> Data points: 434874
>>
>>          | Attribute_1               | Attribute_2  | Attribute_3   | Attribute_4
>>          | Samoa       | Weka        | Samoa | Weka | Samoa | Weka  | Samoa | Weka
>> ---------+-------------+-------------+-------+------+-------+-------+-------+------
>> Center_0 | 100098819.2 | 111598410.7 | 9.77  | 10.2 | 57.16 | 57.37 | 21.23 | 19.4
>> Center_1 | 36598276.23 | 35877429.78 | 9.72  | 9.88 | 57.05 | 56.87 | 21.87 | 22.47
>> Center_2 | 138161280.2 | 116561030.9 | 9.57  | 9.35 | 57.09 | 57.15 | 23.15 | 23.17
>> Mean     | 97869870.26               | 9.7318       | 57.0838       | 22.1854
>>
>> 10 Iterations, K-Means algorithm
>>
>> In streaming clustering, the range spanned by the cluster centers is
>> narrower than in the batch process for Attribute_2 through Attribute_4;
>> for Attribute_1 it is wider.
>>
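To make the comparison of center ranges concrete, the sketch below takes the cluster centers reported above and computes, per attribute, the spread (max minus min) of the three centers for each engine. The data layout and the helper name `center_range` are illustrative choices, not part of the extension.

```python
# Cluster centers as reported above: attribute -> engine -> [Center_0, Center_1, Center_2]
centers = {
    "Attribute_1": {"Samoa": [100098819.2, 36598276.23, 138161280.2],
                    "Weka": [111598410.7, 35877429.78, 116561030.9]},
    "Attribute_2": {"Samoa": [9.77, 9.72, 9.57],
                    "Weka": [10.2, 9.88, 9.35]},
    "Attribute_3": {"Samoa": [57.16, 57.05, 57.09],
                    "Weka": [57.37, 56.87, 57.15]},
    "Attribute_4": {"Samoa": [21.23, 21.87, 23.15],
                    "Weka": [19.4, 22.47, 23.17]},
}


def center_range(values):
    """Spread of the cluster centers along one attribute."""
    return max(values) - min(values)


# True where the streaming (Samoa) centers span a narrower range than the batch (Weka) ones.
narrower = {
    attr: center_range(engines["Samoa"]) < center_range(engines["Weka"])
    for attr, engines in centers.items()
}

for attr, engines in centers.items():
    print(f"{attr}: Samoa range = {center_range(engines['Samoa']):.2f}, "
          f"Weka range = {center_range(engines['Weka']):.2f}, "
          f"Samoa narrower = {narrower[attr]}")
```

On the reported centers this gives a narrower SAMOA range for Attribute_2, Attribute_3, and Attribute_4, and a wider one for Attribute_1.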
>> References
>>
>> [1] - Samoa research paper
>> http://www.jmlr.org/papers/volume16/morales15a/morales15a.pdf
>>
>> [2] - Samoa docs  http://samoa.incubator.apache.org/
>>
>> [3] - Git repository  https://github.com/Jayancv/streamingML
>>
>> [4] - Statistics of tests
>> https://docs.google.com/a/wso2.com/spreadsheets/d/1uROw0gGIu_Ht0J0YnSOHoH600ZnJG9ejp9ztMaXA09s/edit?usp=sharing
>>
>>
>> --
>>
>> Regards,
>>
>> Jayan Vidanapathirana
>> Intern Software Engineer,
>> WSO2.
>> mobile +94715594516
>> www.linkedin.com/in/jayancv
>>
>> _______________________________________________
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> Thanks & Regards,
>
> Fazlan Nazeem
>
> *Software Engineer*
>
> *WSO2 Inc*
> Mobile : +94772338839
> fazl...@wso2.com