Hi Fazlan,

I think this API doc [1] should answer your question.

[1] streamingML apiDoc - https://docs.google.com/a/wso2.com/document/d/1bxDLwfNSyxvt1K9tCTE1HcWVVo4mK88-Ozxav6yVWic/edit?usp=sharing

Thanks.

Regards,
Jayan Vidanapathirana
Intern Software Engineer,
WSO2.
mobile +94715594516
https://lk.linkedin.com/in/jayancv

On Thu, Dec 1, 2016 at 12:42 PM, Fazlan Nazeem <[email protected]> wrote:

> Hi Jayan,
>
> Is there a way to output the accuracy of a specific model within a Siddhi
> execution plan?
>
> On Wed, Nov 30, 2016 at 4:38 PM, Jayan Vidanapathirana <[email protected]> wrote:
>
>> Hi,
>>
>> I am one of the interns working on the "Streaming Machine Learning on
>> WSO2 CEP" project. I have built a Siddhi extension to CEP using Apache
>> SAMOA machine learning.
>>
>> “SAMOA (Scalable Advanced Massive Online Analysis) is a platform for
>> mining big data streams. Currently, this is an Apache incubator
>> project. Samoa is written in Java and it is open source, and available at
>> http://samoa-project.net under the Apache Software License version 2.0.
>>
>> As a framework: it allows algorithm developers to abstract from the
>> underlying execution engine, and therefore reuse their code on different
>> engines. It features a pluggable architecture that allows it to run on
>> several distributed stream processing engines such as Storm, S4, and Samza.
>> This capability is achieved by designing a minimal API that captures the
>> essence of modern DSPEs. This API also allows to easily write new bindings
>> to port SAMOA to new execution engines.
>>
>> As a library: SAMOA contains implementations of state-of-the-art
>> algorithms for distributed machine learning on streams.
>> Currently, SAMOA implements a vertical Hoeffding tree for classification,
>> a distributed k-means algorithm for clustering, and adaptive model rules
>> (with two implementations) for regression, as well as programming
>> abstractions to develop new algorithms. The library also includes
>> meta-algorithms such as bagging and boosting (ensemble techniques) to
>> improve the predictive force.”
>>
>> I created a Siddhi extension using SAMOA as the machine learning algorithm
>> library. It contains classification, regression and clustering extensions
>> and runs SAMOA in local mode (not the distributed version), without a
>> cluster. These extensions also provide different API calls.
>>
>> [image: Streaming Machine learning SAMOA integrate to CEP (Abstract).jpg]
>> Main architecture
>>
>> After creating the extensions, I compared the accuracy of streaming
>> machine learning using SAMOA with the accuracy of batch processing using
>> the Weka machine learner.
>>
>> Classification (Vertical Hoeffding Tree) using the MAGIC Gamma Telescope
>> Data Set <https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope>
>>
>> 18000 data points
>>
>>             Batch process (using WSO2 ML)    Streaming
>>             Class 1      Class 2             Class 1      Class 2
>> Accuracy           82.72                            73.4
>> F1-Score    87.09        73.86               80.41        58.53
>>
>> The accuracy of the batch process is higher than that of the SAMOA
>> streaming process. If the stream has not drifted, the accuracy of the
>> streaming process increases over time and reaches a stable state.
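
For readers comparing the figures above, here is a minimal sketch of how accuracy and a per-class F1-score are commonly computed from actual and predicted labels. It is not taken from the extension, SAMOA, or WSO2 ML; the function names and the sample labels are assumptions made up for illustration.

```python
def accuracy(actual, predicted):
    """Percentage of instances whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


def f1_score(actual, predicted, positive):
    """F1-score in percent, treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical labels for illustration only (not the MAGIC data).
actual = ["g", "g", "g", "h", "h", "g", "h", "g"]
predicted = ["g", "g", "h", "h", "g", "g", "h", "g"]

print("accuracy:", accuracy(actual, predicted))
print("F1 (class g):", f1_score(actual, predicted, "g"))
print("F1 (class h):", f1_score(actual, predicted, "h"))
```

A single accuracy value covers both classes, while F1 is reported once per class, which matches the layout of the table above.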
>>
>> Regression (AMRules) using the Combined Cycle Power Plant Data Set (CCPP)
>> <https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant>
>>
>> Data points: 9500
>>
>>                               Samoa (Adaptive Model    Weka
>>                               Rules Regressor)         linearRegression   M5Rules
>> Mean absolute error           3.68                     3.63               3.06
>> Root mean squared error       6.69                     4.56               3.99
>> Relative absolute error       24.7                     24.43              20.61
>> Root relative squared error   37.8                     26.7               23.4
>>
>> I ran the regression test using 2 data sets and the classification test
>> using 2 data sets. According to those results, there is no large
>> difference in error between the streaming and batch processes. Compared
>> with classification and clustering, streaming regression and batch
>> regression have similar error rates. Therefore I think streaming ML is
>> well suited to regression.
>>
>> Clustering (k-means) using the 3D Road Network (North Jutland, Denmark)
>> Data Set
>> <https://archive.ics.uci.edu/ml/datasets/3D+Road+Network+%28North+Jutland,+Denmark%29>
>>
>> Data points: 434874
>>
>>            Attribute_1                  Attribute_2      Attribute_3      Attribute_4
>>            Samoa         Weka           Samoa    Weka    Samoa    Weka    Samoa    Weka
>> Center_0   100098819.2   111598410.7    9.77     10.2    57.16    57.37   21.23    19.4
>> Center_1   36598276.23   35877429.78    9.72     9.88    57.05    56.87   21.87    22.47
>> Center_2   138161280.2   116561030.9    9.57     9.35    57.09    57.15   23.15    23.17
>> Mean       97869870.26                  9.7318           57.0838          22.1854
>>
>> 10 iterations, k-means algorithm
>>
>> In streaming clustering, the range of the cluster centers is narrower
>> than the range of the batch process cluster centers.
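
As a reference for the four regression error measures reported earlier, here is a minimal sketch of their usual definitions. It is not the code used in the tests; the function name and sample values are hypothetical, and the relative measures here use the mean of the actual values as the baseline predictor, which is an assumption (tools can differ on which mean they use).

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, RMSE, RAE (%) and RRSE (%) for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n  # assumed baseline: always predict the mean
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": 100.0 * abs_err / base_abs,
        "rrse": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Hypothetical values for illustration only (not the CCPP data).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

for name, value in regression_errors(actual, predicted).items():
    print(f"{name}: {value:.2f}")
```

The relative measures compare the model against the baseline, so values below 100 mean the model does better than always predicting the mean.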
>>
>> References
>>
>> [1] - Samoa research paper http://www.jmlr.org/papers/volume16/morales15a/morales15a.pdf
>>
>> [2] - Samoa docs http://samoa.incubator.apache.org/
>>
>> [3] - Git repository https://github.com/Jayancv/streamingML
>>
>> [4] - Statistics of tests https://docs.google.com/a/wso2.com/spreadsheets/d/1uROw0gGIu_Ht0J0YnSOHoH600ZnJG9ejp9ztMaXA09s/edit?usp=sharing
>>
>> --
>>
>> Regards,
>>
>> Jayan Vidanapathirana
>> Intern Software Engineer,
>> WSO2.
>> mobile +94715594516
>> www.linkedin.com/in/jayancv
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
> --
> Thanks & Regards,
>
> Fazlan Nazeem
>
> *Software Engineer*
>
> *WSO2 Inc*
> Mobile : +94772338839
> [email protected]
