Hi Fazlan,

The model is retrained continuously with every incoming training data point, and only the latest model is available for use. I checked whether there is any way to persist the trained models, but SAMOA currently does not have that feature.
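To make the "only the latest model" behaviour concrete, here is a minimal sketch of test-then-train learning on a stream. It is plain Python, not the SAMOA or Siddhi API: the class OnlineLogisticRegression and the function prequential_accuracy are hypothetical names made up for this illustration, and the toy stream is invented data.

```python
import math


class OnlineLogisticRegression:
    """Tiny online binary classifier (hypothetical stand-in, not a SAMOA class)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.seen = 0  # number of training points absorbed so far

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        """Update the model in place; the previous state is overwritten."""
        error = y - self.predict_proba(x)
        self.weights = [
            w + self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias += self.learning_rate * error
        self.seen += 1


def prequential_accuracy(model, stream):
    """Test-then-train: predict with the latest model, then learn from the point."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


# Invented toy stream: two alternating, linearly separable points.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50
model = OnlineLogisticRegression(n_features=2)
accuracy = prequential_accuracy(model, stream)
print(model.seen, accuracy)
```

Because learn_one overwrites the weights, no earlier version of the model exists after an update unless the caller copies it explicitly, which is the situation described above.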
Regards,
Jayan Vidanapathirana
Intern Software Engineer, WSO2.
mobile +94715594516
https://lk.linkedin.com/in/jayancv

On Mon, Dec 5, 2016 at 10:54 AM, Fazlan Nazeem <[email protected]> wrote:

> Thanks Miyuru.
>
> So do we preserve the old models in some way? or the old models get
> updated continuously and at a specific point in time we are allowed to use
> only the latest available model?
>
> On Fri, Dec 2, 2016 at 3:13 PM, Miyuru Dayarathna <[email protected]> wrote:
>
>> Hi,
>>
>> @Fazlan
>> Currently the logger output from the streamingml extension prints the
>> corresponding streaming ML model's accuracy statistics after each display
>> interval elapses.
>>
>> @Jayan,
>> Can we attach an ID to the accuracy statistics so that we will not get
>> confused if we run multiple streaming ML queries simultaneously?
>>
>> --
>> Thanks,
>> Miyuru Dayarathna
>> Senior Technical Lead
>> Mobile: +94713527783
>> Blog: http://miyurublog.blogspot.com
>>
>> On Thu, Dec 1, 2016 at 2:15 PM, Jayan Vidanapathirana <[email protected]> wrote:
>>
>>> Hi Fazlan,
>>>
>>> I think this API doc will solve your question.
>>>
>>> [1] streamingML apiDoc - https://docs.google.com/a/wso2.com/document/d/1bxDLwfNSyxvt1K9tCTE1HcWVVo4mK88-Ozxav6yVWic/edit?usp=sharing
>>>
>>> Thanks.
>>>
>>> Regards,
>>> Jayan Vidanapathirana
>>> Intern Software Engineer, WSO2.
>>> mobile +94715594516
>>> https://lk.linkedin.com/in/jayancv
>>>
>>> On Thu, Dec 1, 2016 at 12:42 PM, Fazlan Nazeem <[email protected]> wrote:
>>>
>>>> Hi Jayan,
>>>>
>>>> Is there a way to output the accuracy of a specific model within a
>>>> siddhi execution plan?
>>>>
>>>> On Wed, Nov 30, 2016 at 4:38 PM, Jayan Vidanapathirana <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am one of the interns working on the "Streaming Machine Learning on
>>>>> WSO2 CEP" project. I have built a Siddhi extension for CEP using Apache
>>>>> SAMOA machine learning.
>>>>>
>>>>> “SAMOA (Scalable Advanced Massive Online Analysis) is a platform for
>>>>> mining big data streams. Currently, this is an Apache incubator
>>>>> project. SAMOA is written in Java, is open source, and is available at
>>>>> http://samoa-project.net under the Apache Software License version 2.0.
>>>>>
>>>>> As a framework: it allows algorithm developers to abstract from the
>>>>> underlying execution engine, and therefore reuse their code on different
>>>>> engines. It features a pluggable architecture that allows it to run on
>>>>> several distributed stream processing engines such as Storm, S4, and
>>>>> Samza. This capability is achieved by designing a minimal API that
>>>>> captures the essence of modern DSPEs. This API also makes it easy to
>>>>> write new bindings to port SAMOA to new execution engines.
>>>>>
>>>>> As a library: SAMOA contains implementations of state-of-the-art
>>>>> algorithms for distributed machine learning on streams. Currently, SAMOA
>>>>> implements the vertical Hoeffding tree for classification, a distributed
>>>>> k-means algorithm for clustering, and adaptive model rules (with two
>>>>> implementations) for regression, as well as programming abstractions for
>>>>> developing new algorithms. The library also includes meta-algorithms
>>>>> such as bagging and boosting (ensemble techniques) to improve
>>>>> predictive power.”
>>>>>
>>>>> I created a Siddhi extension using SAMOA as the machine learning
>>>>> algorithm library. It contains classification, regression and clustering
>>>>> extensions, and uses SAMOA in local mode (not the distributed version),
>>>>> without a cluster.
>>>>> Also, these extensions provide different API calls.
>>>>>
>>>>> [image: Streaming Machine learning SAMOA integrate to CEP (Abstract).jpg]
>>>>> Main architecture
>>>>>
>>>>> After creating the extensions, I compared streaming machine learning
>>>>> accuracy using SAMOA with batch processing accuracy using the Weka
>>>>> machine learner.
>>>>>
>>>>> Classification (Vertical Hoeffding Tree) using the MAGIC Gamma Telescope
>>>>> Data Set
>>>>> <https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope>
>>>>>
>>>>> Data points: 18000
>>>>>
>>>>>            | Batch Process (Using WSO2 ML) | Streaming
>>>>>            | Class 1        Class 2        | Class 1    Class 2
>>>>> -----------+-------------------------------+--------------------
>>>>> Accuracy   |         82.72                 |       73.4
>>>>> F1-Score   | 87.09          73.86          | 80.41      58.53
>>>>>
>>>>> The accuracy of the batch process is higher than that of the SAMOA
>>>>> streaming process. If the stream has not drifted, the accuracy of the
>>>>> streaming process increases over time and reaches a stable state.
>>>>>
>>>>> Regression (AMRules) using the Combined Cycle Power Plant Data Set (CCPP)
>>>>> <https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant>
>>>>>
>>>>> Data points: 9500
>>>>>
>>>>>                             | Samoa (Adaptive Model | Weka             | Weka
>>>>>                             | Rules Regressor)      | linearRegression | M5Rules
>>>>> ----------------------------+-----------------------+------------------+--------
>>>>> Mean absolute error         | 3.68                  | 3.63             | 3.06
>>>>> Root mean squared error     | 6.69                  | 4.56             | 3.99
>>>>> Relative absolute error     | 24.7                  | 24.43            | 20.61
>>>>> Root relative squared error | 37.8                  | 26.7             | 23.4
>>>>>
>>>>> I ran the regression test on 2 data sets and the classification test on
>>>>> 2 data sets. According to those results, there is no large difference in
>>>>> error between the streaming and batch processes.
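For readers who want to recompute the four regression measures in the table above, here is a small sketch using the usual textbook definitions. The function name regression_errors and the sample numbers are made up for this illustration. It assumes the two relative errors are percentages measured against a baseline that always predicts the mean of the actual values; Weka's own evaluation may derive its baseline differently, so its figures need not match exactly.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, RMSE, RAE (%) and RRSE (%) for paired value lists."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")

    n = len(actual)
    mean_actual = sum(actual) / n

    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))

    # Errors of the baseline that always predicts the mean of the actuals.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    if base_abs == 0 or base_sq == 0:
        raise ValueError("relative errors are undefined for constant targets")

    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Invented sample values, only to show the call.
errors = regression_errors([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(errors)
```

The relative measures are useful when comparing a streaming and a batch learner because they are scaled by how hard the target is to predict, so values below 100 mean the model beats the mean baseline.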
>>>>> Compared with classification and clustering, streaming regression and
>>>>> batch regression have similar error rates. Therefore I think streaming
>>>>> ML is really suitable for regression.
>>>>>
>>>>> Clustering (k-means) using the 3D Road Network (North Jutland, Denmark)
>>>>> Data Set
>>>>> <https://archive.ics.uci.edu/ml/datasets/3D+Road+Network+%28North+Jutland,+Denmark%29>
>>>>>
>>>>> Data points: 434874
>>>>>
>>>>>          | Attribute_1               | Attribute_2  | Attribute_3   | Attribute_4
>>>>>          | Samoa        Weka         | Samoa  Weka  | Samoa  Weka   | Samoa  Weka
>>>>> ---------+---------------------------+--------------+---------------+-------------
>>>>> Center_0 | 100098819.2  111598410.7  | 9.77   10.2  | 57.16  57.37  | 21.23  19.4
>>>>> Center_1 | 36598276.23  35877429.78  | 9.72   9.88  | 57.05  56.87  | 21.87  22.47
>>>>> Center_2 | 138161280.2  116561030.9  | 9.57   9.35  | 57.09  57.15  | 23.15  23.17
>>>>> Mean     | 97869870.26               | 9.7318       | 57.0838       | 22.1854
>>>>>
>>>>> 10 iterations, K-Means algorithm
>>>>>
>>>>> In streaming clustering, the range of the cluster centers is narrower
>>>>> than the range of the batch process cluster centers.
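As background for why streaming and batch centers can differ, here is a sketch of the textbook sequential (single-pass) k-means update, in which each point is seen once and moves only its nearest center. This is not SAMOA's distributed clustering implementation and not the Weka batch algorithm; the function name sequential_kmeans and the sample points are made up for this illustration.

```python
def sequential_kmeans(points, initial_centers):
    """Single-pass k-means: each point nudges its nearest center once."""
    centers = [list(c) for c in initial_centers]
    counts = [0] * len(centers)  # points assigned to each center so far

    for point in points:
        # Find the closest center by squared Euclidean distance.
        nearest = min(
            range(len(centers)),
            key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
        )
        counts[nearest] += 1
        rate = 1.0 / counts[nearest]
        # Running-mean update: the center becomes the mean of its points so far.
        centers[nearest] = [
            c + rate * (p - c) for p, c in zip(point, centers[nearest])
        ]

    return centers, counts


# Invented sample points, only to show the call.
centers, counts = sequential_kmeans(
    [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0), (9.0, 9.0)],
    initial_centers=[(0.0, 0.0), (10.0, 10.0)],
)
print(centers, counts)
```

A batch run revisits every point on each iteration and can reassign it, while a single-pass update never revisits earlier points, so the two can settle on different centers for the same data.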
>>>>>
>>>>> References
>>>>>
>>>>> [1] - Samoa research paper http://www.jmlr.org/papers/volume16/morales15a/morales15a.pdf
>>>>> [2] - Samoa docs http://samoa.incubator.apache.org/
>>>>> [3] - Git repository https://github.com/Jayancv/streamingML
>>>>> [4] - Statistics of tests https://docs.google.com/a/wso2.com/spreadsheets/d/1uROw0gGIu_Ht0J0YnSOHoH600ZnJG9ejp9ztMaXA09s/edit?usp=sharing
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Jayan Vidanapathirana
>>>>> Intern Software Engineer, WSO2.
>>>>> mobile +94715594516
>>>>> www.linkedin.com/in/jayancv
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Fazlan Nazeem
>>>> *Software Engineer*
>>>> *WSO2 Inc*
>>>> Mobile : +94772338839
>>>> [email protected]
>
> --
> Thanks & Regards,
> Fazlan Nazeem
> *Software Engineer*
> *WSO2 Inc*
> Mobile : +94772338839
> [email protected]
