Re: [Architecture] Streaming Machine learning on CEP

Miyuru Dayarathna Fri, 02 Dec 2016 01:45:08 -0800

Hi,

@Fazlan
Currently the logger output from the streamingml extension prints the
corresponding streaming ML model's accuracy statistics after each display
interval elapses.


@Jayan,
Can we attach an ID to the accuracy statistics so that we will not get
confused if we run multiple streaming ML queries simultaneously?
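
A minimal sketch of what that could look like, assuming each query is given
its own reporter object. The names AccuracyReporter, model_id, and the log
line format are hypothetical and are not part of the streamingml extension.

```python
from dataclasses import dataclass


@dataclass
class AccuracyReporter:
    """Hypothetical per-model accuracy tracker; not the extension's real API."""
    model_id: str
    display_interval: int  # number of events between log lines
    seen: int = 0
    correct: int = 0

    def record(self, predicted, actual):
        """Count one prediction; return a log line when the interval elapses."""
        self.seen += 1
        if predicted == actual:
            self.correct += 1
        if self.seen % self.display_interval == 0:
            accuracy = 100.0 * self.correct / self.seen
            # The model ID prefix keeps lines from concurrent queries apart.
            return f"[model={self.model_id}] events={self.seen} accuracy={accuracy:.2f}%"
        return None
```

With one reporter per running query, interleaved log lines stay attributable
to the model that produced them.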

-- 
Thanks,
Miyuru Dayarathna
Senior Technical Lead
Mobile: +94713527783
Blog: http://miyurublog.blogspot.com

On Thu, Dec 1, 2016 at 2:15 PM, Jayan Vidanapathirana <jay...@wso2.com>
wrote:

> Hi Fazlan,
>
> I think this API doc will answer your question.
>
> [1] streamingML API doc - https://docs.google.com/a/wso2.com/document/d/1bxDLwfNSyxvt1K9tCTE1HcWVVo4mK88-Ozxav6yVWic/edit?usp=sharing
>
> Thanks.
>
>
> Regards,
>
> Jayan Vidanapathirana
> Intern Software Engineer,
> WSO2.
> mobile +94715594516
> https://lk.linkedin.com/in/jayancv
>
>
> On Thu, Dec 1, 2016 at 12:42 PM, Fazlan Nazeem <fazl...@wso2.com> wrote:
>
>> Hi Jayan,
>>
>> Is there a way to output the accuracy of a specific model within a Siddhi
>> execution plan?
>>
>>
>>
>> On Wed, Nov 30, 2016 at 4:38 PM, Jayan Vidanapathirana <jay...@wso2.com>
>> wrote:
>>
>>> Hi,
>>>
>>>
>>> I am one of the interns working on the "Streaming Machine Learning on
>>> WSO2 CEP" project. I have built a Siddhi extension for CEP using the
>>> Apache SAMOA machine learning library.
>>>
>>> “SAMOA (Scalable Advanced Massive Online Analysis) is a platform for
>>> mining big data streams. Currently, it is an Apache Incubator project.
>>> SAMOA is written in Java, is open source, and is available at
>>> http://samoa-project.net under the Apache Software License version 2.0.
>>>
>>> As a framework: it allows algorithm developers to abstract from the
>>> underlying execution engine, and therefore reuse their code on different
>>> engines. It features a pluggable architecture that allows it to run on
>>> several distributed stream processing engines (DSPEs) such as Storm, S4,
>>> and Samza. This capability is achieved by designing a minimal API that
>>> captures the essence of modern DSPEs. This API also makes it easy to write
>>> new bindings to port SAMOA to new execution engines.
>>>
>>> As a library: SAMOA contains implementations of state-of-the-art
>>> algorithms for distributed machine learning on streams. Currently, SAMOA
>>> implements the Vertical Hoeffding Tree for classification, a distributed
>>> k-means algorithm for clustering, and adaptive model rules (with two
>>> implementations) for regression, as well as programming abstractions for
>>> developing new algorithms. The library also includes meta-algorithms such
>>> as bagging and boosting (ensemble techniques) to improve predictive
>>> performance.”
>>>
>>> I created a Siddhi extension that uses SAMOA as its machine learning
>>> algorithm library. It contains classification, regression, and clustering
>>> extensions, and it runs SAMOA in local mode (not the distributed version),
>>> so no cluster is needed. Each of these extensions also provides its own
>>> API calls.
>>>
>>> [image: Streaming Machine learning SAMOA integrate to CEP (Abstract).jpg]
>>>
>>> Main architecture
>>>
>>>
>>>
>>> After creating the extensions, I compared streaming machine learning
>>> accuracy using SAMOA with batch processing accuracy using the Weka machine
>>> learner.
>>>
>>> Classification (Vertical Hoeffding Tree) using the MAGIC Gamma Telescope
>>> Data Set <https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope>
>>>
>>> 18000 data points
>>>
>>>            Batch Process (Using WSO2 ML)   Streaming
>>>            Class 1         Class 2         Class 1   Class 2
>>> Accuracy   82.72                           73.4
>>> F1-Score   87.09           73.86           80.41     58.53
>>>
>>> The accuracy of the batch process is higher than that of the SAMOA
>>> streaming process. If the stream does not drift, the streaming accuracy
>>> increases over time and eventually reaches a stable state.
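
A minimal sketch of how accuracy over time can be tracked on a stream, using
test-then-train (prequential) evaluation. This is an assumption about the
evaluation style, not a copy of the extension's code, and
MajorityClassLearner is a toy stand-in for a real streaming classifier.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def train(self, features, label):
        self.counts[label] += 1


def prequential_accuracy(stream, learner):
    """Test on each instance first, then train on it.

    Returns the running accuracy after every instance.
    """
    correct = 0
    history = []
    for seen, (features, label) in enumerate(stream, start=1):
        if learner.predict(features) == label:
            correct += 1
        learner.train(features, label)
        history.append(correct / seen)
    return history
```

On a stream without drift the running accuracy in the returned list climbs
and then flattens, which is the stable state described above.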
>>>
>>> Regression (AMRules) using the Combined Cycle Power Plant Data Set (CCPP)
>>> <https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant>
>>>
>>> Data points: 9500
>>>
>>>                                  Samoa (Adaptive Model    Weka                Weka
>>>                                  Rules Regressor)         linearRegression    M5Rules
>>> Mean absolute error              3.68                     3.63                3.06
>>> Root mean squared error          6.69                     4.56                3.99
>>> Relative absolute error (%)      24.7                     24.43               20.61
>>> Root relative squared error (%)  37.8                     26.7                23.4
>>>
>>> I ran regression tests on 2 datasets and classification tests on 2
>>> datasets. In those results I saw no large gap in error between the
>>> streaming and batch processes. Compared with classification and
>>> clustering, streaming regression and batch regression have similar error
>>> rates. Therefore I think streaming ML is well suited to regression.
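
For reference, a small sketch of the four error measures in the regression
table. It uses the common textbook definitions, where the relative measures
compare against a baseline that always predicts the mean of the actual
values. Whether SAMOA and Weka compute their baselines in exactly this way is
an assumption here, and regression_errors is an illustrative name.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, RMSE, RAE (%) and RRSE (%) for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Baseline: a predictor that always outputs the mean of the actual values.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_pct": 100.0 * abs_err / base_abs,
        "rrse_pct": 100.0 * math.sqrt(sq_err / base_sq),
    }
```

Because the relative measures are normalised by the baseline, they can be
compared across datasets with different target scales, unlike MAE and RMSE.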
>>>
>>> Clustering (k-means) using the 3D Road Network (North Jutland, Denmark)
>>> Data Set
>>> <https://archive.ics.uci.edu/ml/datasets/3D+Road+Network+%28North+Jutland,+Denmark%29>
>>>
>>> Data points: 434874
>>>
>>>            Attribute_1                 Attribute_2      Attribute_3      Attribute_4
>>>            Samoa         Weka          Samoa   Weka     Samoa   Weka     Samoa   Weka
>>> Center_0   100098819.2   111598410.7   9.77    10.2     57.16   57.37    21.23   19.4
>>> Center_1   36598276.23   35877429.78   9.72    9.88     57.05   56.87    21.87   22.47
>>> Center_2   138161280.2   116561030.9   9.57    9.35     57.09   57.15    23.15   23.17
>>> Mean       97869870.26                 9.7318           57.0838          22.1854
>>>
>>> 10 iterations, k-means algorithm
>>>
>>> In streaming clustering, the range spanned by the cluster centers is
>>> narrower than in the batch process for Attribute_2 to Attribute_4.
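
The range comparison can be checked directly from the cluster centers in the
table above. This is a small sketch: the numbers are copied from that table,
and center_range is an illustrative helper name.

```python
# Cluster centers copied from the table: (Center_0, Center_1, Center_2).
centers = {
    "Attribute_1": {"samoa": (100098819.2, 36598276.23, 138161280.2),
                    "weka": (111598410.7, 35877429.78, 116561030.9)},
    "Attribute_2": {"samoa": (9.77, 9.72, 9.57), "weka": (10.2, 9.88, 9.35)},
    "Attribute_3": {"samoa": (57.16, 57.05, 57.09), "weka": (57.37, 56.87, 57.15)},
    "Attribute_4": {"samoa": (21.23, 21.87, 23.15), "weka": (19.4, 22.47, 23.17)},
}


def center_range(values):
    """Spread of one attribute across the cluster centers (max minus min)."""
    return max(values) - min(values)


ranges = {
    attr: {engine: center_range(vals) for engine, vals in by_engine.items()}
    for attr, by_engine in centers.items()
}
# Attributes whose SAMOA (streaming) centers span a smaller range than Weka's.
narrower_in_streaming = [a for a, r in ranges.items() if r["samoa"] < r["weka"]]
```

With these numbers the SAMOA centers span a smaller range for Attribute_2,
Attribute_3, and Attribute_4, while for Attribute_1 the SAMOA range is the
larger one.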
>>>
>>> References
>>>
>>> [1] - Samoa research paper  http://www.jmlr.org/papers/volume16/morales15a/morales15a.pdf
>>>
>>> [2] - Samoa docs  http://samoa.incubator.apache.org/
>>>
>>> [3] - Git repository  https://github.com/Jayancv/streamingML
>>>
>>> [4] - Statistics of tests  https://docs.google.com/a/wso2.com/spreadsheets/d/1uROw0gGIu_Ht0J0YnSOHoH600ZnJG9ejp9ztMaXA09s/edit?usp=sharing
>>>
>>>
>>> --
>>>
>>> Regards,
>>>
>>> Jayan Vidanapathirana
>>> Intern Software Engineer,
>>> WSO2.
>>> mobile +94715594516
>>> www.linkedin.com/in/jayancv
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> Architecture@wso2.org
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>>
>> Fazlan Nazeem
>>
>> *Software Engineer*
>>
>> *WSO2 Inc*
>> Mobile : +94772338839
>> fazl...@wso2.com
>>
>>
>>
>
>
>

