Hi @Fazlan,

Currently, the logger output from the streamingml extension prints the corresponding streaming ML model's accuracy statistics each time the display interval elapses.
@Jayan, can we attach an ID to the accuracy statistics so that the output is not ambiguous when multiple streaming ML queries run simultaneously?

--
Thanks,
Miyuru Dayarathna
Senior Technical Lead
Mobile: +94713527783
Blog: http://miyurublog.blogspot.com

On Thu, Dec 1, 2016 at 2:15 PM, Jayan Vidanapathirana <jay...@wso2.com> wrote:

> Hi Fazlan,
>
> I think this API doc will answer your question.
>
> [1] streamingML apiDoc - https://docs.google.com/a/wso2.com/document/d/1bxDLwfNSyxvt1K9tCTE1HcWVVo4mK88-Ozxav6yVWic/edit?usp=sharing
>
> Thanks.
>
> Regards,
> Jayan Vidanapathirana
> Intern Software Engineer,
> WSO2.
> mobile +94715594516
> https://lk.linkedin.com/in/jayancv
>
> On Thu, Dec 1, 2016 at 12:42 PM, Fazlan Nazeem <fazl...@wso2.com> wrote:
>
>> Hi Jayan,
>>
>> Is there a way to output the accuracy of a specific model within a Siddhi execution plan?
>>
>> On Wed, Nov 30, 2016 at 4:38 PM, Jayan Vidanapathirana <jay...@wso2.com> wrote:
>>
>>> Hi,
>>>
>>> I am one of the interns working on the "Streaming Machine Learning on WSO2 CEP" project. I have built a Siddhi extension for CEP using the Apache SAMOA machine learning library.
>>>
>>> “SAMOA (Scalable Advanced Massive Online Analysis) is a platform for mining big data streams. Currently, this is an Apache incubator project. SAMOA is written in Java, is open source, and is available at http://samoa-project.net under the Apache Software License version 2.0.
>>>
>>> As a framework: it allows algorithm developers to abstract from the underlying execution engine, and therefore reuse their code on different engines. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm, S4, and Samza. This capability is achieved by designing a minimal API that captures the essence of modern DSPEs.
>>> This API also makes it easy to write new bindings to port SAMOA to new execution engines.
>>>
>>> As a library: SAMOA contains implementations of state-of-the-art algorithms for distributed machine learning on streams. Currently, SAMOA implements the Vertical Hoeffding Tree for classification, a distributed k-means algorithm for clustering, and Adaptive Model Rules (two implementations) for regression, as well as programming abstractions to develop new algorithms. The library also includes meta-algorithms such as bagging and boosting (ensemble techniques) to improve predictive power.”
>>>
>>> I created a Siddhi extension that uses SAMOA as its machine learning algorithm library. It contains classification, regression, and clustering extensions and runs SAMOA in local mode (not the distributed version), without a cluster. These extensions also provide different API calls.
>>>
>>> [image: Streaming Machine learning SAMOA integrate to CEP (Abstract).jpg]
>>>
>>> Main architecture
>>>
>>> After creating the extensions, I compared streaming machine learning accuracy using SAMOA against batch processing accuracy using the Weka machine learner.
>>>
>>> Classification (Vertical Hoeffding Tree) using the MAGIC Gamma Telescope Data Set <https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope>
>>>
>>> 18000 data points
>>>
>>>            Batch process (using WSO2 ML)   Streaming
>>>            Class 1    Class 2              Class 1    Class 2
>>> Accuracy        82.72                           73.4
>>> F1-Score   87.09      73.86                80.41      58.53
>>>
>>> The accuracy of the batch process is higher than that of the SAMOA streaming process. If the stream has not drifted, the streaming accuracy increases over time and eventually reaches a stable state.
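For readers comparing the Accuracy and F1-Score rows above, here is a minimal sketch of how these two measures are commonly computed from prediction counts for a two-class problem. The class name `BinaryMetrics` is hypothetical (it is not part of the streamingML extension or SAMOA), and the counts in `main` are made-up illustrative values, not results from the tests in this thread.

```java
final class BinaryMetrics {

    /** Fraction of all predictions that were correct. */
    static double accuracy(long tp, long fp, long fn, long tn) {
        long total = tp + fp + fn + tn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    /**
     * F1 for the class treated as "positive": the harmonic mean of
     * precision and recall, written as 2*TP / (2*TP + FP + FN).
     */
    static double f1(long tp, long fp, long fn) {
        long denom = 2 * tp + fp + fn;
        return denom == 0 ? 0.0 : 2.0 * tp / denom;
    }

    public static void main(String[] args) {
        // Illustrative counts only (assumption, not data from this thread).
        long tp = 40, fp = 10, fn = 5, tn = 45;

        System.out.printf("accuracy     = %.4f%n", accuracy(tp, fp, fn, tn));
        System.out.printf("F1 (class 1) = %.4f%n", f1(tp, fp, fn));
        // For the other class the roles swap: its true positives are tn,
        // and the false positives and false negatives trade places.
        System.out.printf("F1 (class 2) = %.4f%n", f1(tn, fn, fp));
    }
}
```

Accuracy is a single number per model, while F1 is computed once per class, which matches the shape of the table: one accuracy value per process and one F1 value per class.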
>>>
>>> Regression (AMRules) using the Combined Cycle Power Plant Data Set (CCPP) <https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant>
>>>
>>> Data points: 9500
>>>
>>>                               SAMOA (Adaptive Model   Weka               Weka
>>>                               Rules Regressor)        linearRegression   M5Rules
>>> Mean absolute error           3.68                    3.63               3.06
>>> Root mean squared error       6.69                    4.56               3.99
>>> Relative absolute error       24.7                    24.43              20.61
>>> Root relative squared error   37.8                    26.7               23.4
>>>
>>> I ran the regression tests on 2 data sets and the classification tests on 2 data sets. According to those results, there is no large difference in error between the streaming and batch processes. Compared with classification and clustering, streaming regression and batch regression have similar error rates. Therefore, I think streaming ML is well suited to regression.
>>>
>>> Clustering (k-means) using the 3D Road Network (North Jutland, Denmark) Data Set <https://archive.ics.uci.edu/ml/datasets/3D+Road+Network+%28North+Jutland,+Denmark%29>
>>>
>>> Data points: 434874
>>>
>>>            Attribute_1                  Attribute_2       Attribute_3       Attribute_4
>>>            SAMOA         Weka           SAMOA    Weka     SAMOA    Weka     SAMOA    Weka
>>> Center_0   100098819.2   111598410.7    9.77     10.2     57.16    57.37    21.23    19.4
>>> Center_1   36598276.23   35877429.78    9.72     9.88     57.05    56.87    21.87    22.47
>>> Center_2   138161280.2   116561030.9    9.57     9.35     57.09    57.15    23.15    23.17
>>> Mean       97869870.26                  9.7318            57.0838           22.1854
>>>
>>> 10 iterations, k-means algorithm
>>>
>>> In streaming clustering, the range of the cluster centers is narrower than the range of the batch process cluster centers.
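The remark about the range of the cluster centers can be checked directly against the clustering table above. The sketch below computes, per attribute, the spread (max minus min) of the three centers for each tool. The class name `CenterSpread` is hypothetical, and the values are copied from the table as transcribed in this thread.

```java
import java.util.Arrays;

final class CenterSpread {

    // Rows are Attribute_1..Attribute_4; columns are Center_0..Center_2.
    static final double[][] SAMOA = {
        {100098819.2, 36598276.23, 138161280.2},
        {9.77, 9.72, 9.57},
        {57.16, 57.05, 57.09},
        {21.23, 21.87, 23.15}
    };

    static final double[][] WEKA = {
        {111598410.7, 35877429.78, 116561030.9},
        {10.2, 9.88, 9.35},
        {57.37, 56.87, 57.15},
        {19.4, 22.47, 23.17}
    };

    /** Range (max - min) of one attribute across the given cluster centers. */
    static double range(double[] centerValues) {
        double min = Arrays.stream(centerValues).min().orElse(Double.NaN);
        double max = Arrays.stream(centerValues).max().orElse(Double.NaN);
        return max - min;
    }

    public static void main(String[] args) {
        for (int a = 0; a < SAMOA.length; a++) {
            System.out.printf("Attribute_%d: SAMOA range = %.2f, Weka range = %.2f%n",
                    a + 1, range(SAMOA[a]), range(WEKA[a]));
        }
    }
}
```

On these numbers, the SAMOA range is narrower for Attribute_2 to Attribute_4 (for example, about 0.20 versus 0.85 for Attribute_2), while for Attribute_1 the SAMOA range comes out wider than the Weka range.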
>>>
>>> References
>>>
>>> [1] - SAMOA research paper http://www.jmlr.org/papers/volume16/morales15a/morales15a.pdf
>>>
>>> [2] - SAMOA docs http://samoa.incubator.apache.org/
>>>
>>> [3] - Git repository https://github.com/Jayancv/streamingML
>>>
>>> [4] - Statistics of tests https://docs.google.com/a/wso2.com/spreadsheets/d/1uROw0gGIu_Ht0J0YnSOHoH600ZnJG9ejp9ztMaXA09s/edit?usp=sharing
>>>
>>> --
>>> Regards,
>>> Jayan Vidanapathirana
>>> Intern Software Engineer,
>>> WSO2.
>>> mobile +94715594516
>>> www.linkedin.com/in/jayancv
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> Architecture@wso2.org
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>> --
>> Thanks & Regards,
>> Fazlan Nazeem
>> *Software Engineer*
>> *WSO2 Inc*
>> Mobile : +94772338839
>> fazl...@wso2.com