Hi Fazlan,
The model is retrained continuously with each training data point, and we
are allowed to use only the latest model. I checked whether there is any way
to persist the trained models, but SAMOA does not currently have that
feature.


Regards,

Jayan Vidanapathirana
Intern Software Engineer,
WSO2.
mobile +94715594516
https://lk.linkedin.com/in/jayancv


On Mon, Dec 5, 2016 at 10:54 AM, Fazlan Nazeem <[email protected]> wrote:

> Thanks Miyuru.
>
> So do we preserve the old models in some way? Or are the old models
> updated continuously, so that at any specific point in time we are allowed
> to use only the latest available model?
>
> On Fri, Dec 2, 2016 at 3:13 PM, Miyuru Dayarathna <[email protected]>
> wrote:
>
>> Hi,
>>
>> @Fazlan
>> Currently the logger output from the streamingml extension prints the
>> corresponding streaming ML model's accuracy statistics after each display
>> interval elapses.
>>
>> @Jayan,
>> Can we attach an ID to the accuracy statistics so that we will not get
>> confused if we run multiple streaming ML queries simultaneously?
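
A minimal sketch of the ID idea, in plain Python rather than the streamingml
extension's own code: each statistics line carries a model ID, so output from
queries running at the same time can be separated again. The names
`format_stats`, `group_by_model`, the ID strings, and the line layout are all
hypothetical.

```python
def format_stats(model_id, instances_seen, accuracy):
    """Build one statistics line that carries the (hypothetical) model ID."""
    return f"[model={model_id}] instances={instances_seen} accuracy={accuracy:.2f}%"


def group_by_model(lines):
    """Group statistics lines by the ID embedded in their prefix."""
    grouped = {}
    for line in lines:
        # The ID sits between "[model=" and the first "]".
        model_id = line[len("[model="):line.index("]")]
        grouped.setdefault(model_id, []).append(line)
    return grouped


# Two streaming ML queries reporting into the same log, interleaved.
log = [
    format_stats("classifier-1", 1000, 71.5),
    format_stats("classifier-2", 1000, 64.25),
    format_stats("classifier-1", 2000, 73.4),
]
by_model = group_by_model(log)
```

With the ID in every line, filtering the log for one query is a simple prefix
match.
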
>>
>> --
>> Thanks,
>> Miyuru Dayarathna
>> Senior Technical Lead
>> Mobile: +94713527783
>> Blog: http://miyurublog.blogspot.com
>>
>> On Thu, Dec 1, 2016 at 2:15 PM, Jayan Vidanapathirana <[email protected]>
>> wrote:
>>
>>> Hi Fazlan,
>>>
>>> I think this API doc will answer your question.
>>>
>>> [1] streamingML apiDoc - https://docs.google.com/a/wso2.com/document/d/1bxDLwfNSyxvt1K9tCTE1HcWVVo4mK88-Ozxav6yVWic/edit?usp=sharing
>>>
>>> Thanks.
>>>
>>>
>>> Regards,
>>>
>>> Jayan Vidanapathirana
>>> Intern Software Engineer,
>>> WSO2.
>>> mobile +94715594516
>>> https://lk.linkedin.com/in/jayancv
>>>
>>>
>>> On Thu, Dec 1, 2016 at 12:42 PM, Fazlan Nazeem <[email protected]> wrote:
>>>
>>>> Hi Jayan,
>>>>
>>>> Is there a way to output the accuracy of a specific model within a
>>>> siddhi execution plan?
>>>>
>>>>
>>>>
>>>> On Wed, Nov 30, 2016 at 4:38 PM, Jayan Vidanapathirana <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> I am one of the interns working on the "Streaming Machine Learning on
>>>>> WSO2 CEP" project. I have built a Siddhi extension for CEP using the
>>>>> Apache SAMOA machine learning platform.
>>>>>
>>>>> “SAMOA (Scalable Advanced Massive Online Analysis) is a platform for
>>>>> mining big data streams. Currently, this is an Apache Incubator
>>>>> project. SAMOA is written in Java, is open source, and is available at
>>>>> http://samoa-project.net under the Apache Software License version
>>>>> 2.0.
>>>>>
>>>>> As a framework: it allows algorithm developers to abstract from the
>>>>> underlying execution engine, and therefore reuse their code on different
>>>>> engines. It features a pluggable architecture that allows it to run on
>>>>> several distributed stream processing engines (DSPEs) such as Storm, S4,
>>>>> and Samza. This capability is achieved by designing a minimal API that
>>>>> captures the essence of modern DSPEs. This API also makes it easy to
>>>>> write new bindings to port SAMOA to new execution engines.
>>>>>
>>>>> As a library: SAMOA contains implementations of state-of-the-art
>>>>> algorithms for distributed machine learning on streams. Currently, SAMOA
>>>>> implements the Vertical Hoeffding Tree for classification, a distributed
>>>>> k-means algorithm for clustering, and adaptive model rules (with two
>>>>> implementations) for regression, as well as programming abstractions to
>>>>> develop new algorithms. The library also includes meta-algorithms such
>>>>> as bagging and boosting (ensemble techniques) to improve predictive
>>>>> performance.”
>>>>>
>>>>> I created a Siddhi extension that uses SAMOA as its machine learning
>>>>> algorithm library. It contains classification, regression, and
>>>>> clustering extensions, and it runs SAMOA in local mode (not the
>>>>> distributed version), so no cluster is needed. Each of these extensions
>>>>> provides its own API calls.
>>>>>
>>>>> [Figure: Main architecture of the SAMOA streaming machine learning
>>>>> integration with CEP]
>>>>>
>>>>>
>>>>>
>>>>> After creating the extensions, I tested the accuracy of streaming
>>>>> machine learning using SAMOA and the accuracy of batch processing using
>>>>> the Weka machine learner.
>>>>>
>>>>> Classification (Vertical Hoeffding Tree) using the MAGIC Gamma Telescope
>>>>> Data Set
>>>>> <https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope>
>>>>>
>>>>> 18000 data points
>>>>>
>>>>>          | Batch process (using WSO2 ML) | Streaming
>>>>>          | Class 1       | Class 2       | Class 1 | Class 2
>>>>> ---------+---------------+---------------+---------+--------
>>>>> Accuracy | 82.72                         | 73.4
>>>>> F1-Score | 87.09         | 73.86         | 80.41   | 58.53
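
As a reminder of how the two measures in this comparison relate, here is a
small sketch that derives overall accuracy and per-class F1 from a two-class
confusion matrix. The counts are made up for illustration and are not the
MAGIC test results; `f1_score` is a hypothetical helper name.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative confusion matrix: rows = actual class, columns = predicted class.
confusion = [[80, 20],
             [10, 40]]

total = sum(sum(row) for row in confusion)
# Accuracy is a single value over both classes: correct predictions / all.
accuracy = (confusion[0][0] + confusion[1][1]) / total

# F1 is computed per class, treating that class as the "positive" one.
f1_class1 = f1_score(tp=confusion[0][0], fp=confusion[1][0], fn=confusion[0][1])
f1_class2 = f1_score(tp=confusion[1][1], fp=confusion[0][1], fn=confusion[1][0])
```

This is why the comparison has one accuracy figure per process but one F1
figure per class.
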
>>>>>
>>>>> The accuracy of the batch process is higher than that of the SAMOA
>>>>> streaming process. If the stream does not drift, the accuracy of the
>>>>> streaming process increases over time and eventually reaches a stable
>>>>> state.
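
The behaviour described here can be observed with test-then-train
(prequential) evaluation, where every instance is first used to test the
current model and then to update it. The sketch below is only an
illustration under stated assumptions: it uses a tiny perceptron and a
synthetic stream with a fixed concept, not SAMOA or the Vertical Hoeffding
Tree, and all names in it are hypothetical.

```python
import random


class Perceptron:
    """Tiny online linear classifier; only the latest weights are kept."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        s = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s > 0 else 0

    def learn_one(self, x, y):
        # Classic perceptron rule: update only when the prediction is wrong.
        error = y - self.predict(x)
        if error != 0:
            self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * error


def prequential(stream, model, window=500):
    """Test-then-train: return the accuracy of each window of instances."""
    accuracies, correct = [], 0
    for i, (x, y) in enumerate(stream, start=1):
        correct += int(model.predict(x) == y)  # test first
        model.learn_one(x, y)                  # then train
        if i % window == 0:
            accuracies.append(correct / window)
            correct = 0
    return accuracies


rng = random.Random(42)


def stationary_stream(n):
    # Fixed concept (no drift): the label is 1 when x0 + x1 > 1.
    for _ in range(n):
        x = [rng.random(), rng.random()]
        yield x, int(x[0] + x[1] > 1.0)


acc = prequential(stationary_stream(5000), Perceptron(n_features=2))
```

`acc` holds one accuracy value per 500 instances, so plotting it shows how
the streaming accuracy evolves while the concept stays fixed.
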
>>>>>
>>>>> Regression (AMRules) using the Combined Cycle Power Plant Data Set (CCPP)
>>>>> <https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant>
>>>>>
>>>>> Data points: 9500
>>>>>
>>>>>                             | Samoa (Adaptive Model | Weka             | Weka
>>>>>                             | Rules Regressor)      | linearRegression | M5Rules
>>>>> ----------------------------+-----------------------+------------------+--------
>>>>> Mean absolute error         | 3.68                  | 3.63             | 3.06
>>>>> Root mean squared error     | 6.69                  | 4.56             | 3.99
>>>>> Relative absolute error     | 24.7                  | 24.43            | 20.61
>>>>> Root relative squared error | 37.8                  | 26.7             | 23.4
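
For readers less familiar with these four measures, the sketch below
computes them with their usual textbook definitions. It assumes the relative
errors are expressed as a percentage of the error of a baseline that always
predicts the mean of the actual values; the exact baseline used by Weka or
SAMOA may differ in detail. The sample values and the name
`regression_errors` are made up for illustration.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, RMSE, and the two relative errors (as percentages)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Errors of the baseline that always predicts the mean of the actuals.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae_percent": 100 * sum(abs_err) / base_abs,
        "rrse_percent": 100 * math.sqrt(sum(sq_err) / base_sq),
    }


actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]
errors = regression_errors(actual, predicted)
```

A relative error below 100 means the model does better than the mean
baseline, which makes these two measures comparable across data sets.
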
>>>>>
>>>>> I ran the regression test on two data sets and the classification test
>>>>> on two data sets. According to those results, there is no large
>>>>> difference in error between the streaming and batch processes. Compared
>>>>> with classification and clustering, streaming regression and batch
>>>>> regression have similar error rates. Therefore, I think streaming ML is
>>>>> well suited to regression.
>>>>>
>>>>> Clustering (k-means) using the 3D Road Network (North Jutland, Denmark)
>>>>> Data Set
>>>>> <https://archive.ics.uci.edu/ml/datasets/3D+Road+Network+%28North+Jutland,+Denmark%29>
>>>>>
>>>>> Data points: 434874
>>>>>
>>>>>          | Attribute_1               | Attribute_2   | Attribute_3   | Attribute_4
>>>>>          | Samoa       | Weka        | Samoa | Weka  | Samoa | Weka  | Samoa | Weka
>>>>> ---------+-------------+-------------+-------+-------+-------+-------+-------+------
>>>>> Center_0 | 100098819.2 | 111598410.7 | 9.77  | 10.2  | 57.16 | 57.37 | 21.23 | 19.4
>>>>> Center_1 | 36598276.23 | 35877429.78 | 9.72  | 9.88  | 57.05 | 56.87 | 21.87 | 22.47
>>>>> Center_2 | 138161280.2 | 116561030.9 | 9.57  | 9.35  | 57.09 | 57.15 | 23.15 | 23.17
>>>>> Mean     | 97869870.26               | 9.7318        | 57.0838       | 22.1854
>>>>>
>>>>> 10 iterations, k-means algorithm
>>>>>
>>>>> In streaming clustering, the cluster centers span a narrower range than
>>>>> the batch-process cluster centers.
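
To illustrate how cluster centers can be maintained over a stream in a
single pass, here is a sketch of sequential (online) k-means, where each
arriving point moves its nearest center by a running-mean step. This is a
generic textbook variant chosen for illustration; it is an assumption, not a
description of how SAMOA's distributed clustering works. The points and
function names are hypothetical.

```python
def nearest(centers, point):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(centers[i], point)),
    )


def online_kmeans(stream, initial_centers):
    """Sequential k-means: one pass, one center update per arriving point."""
    centers = [list(c) for c in initial_centers]
    counts = [0] * len(centers)
    for point in stream:
        i = nearest(centers, point)
        counts[i] += 1
        # Running-mean update: the center becomes the mean of the points
        # assigned to it so far (the first assigned point replaces the seed).
        centers[i] = [c + (p - c) / counts[i] for c, p in zip(centers[i], point)]
    return centers, counts


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0),
          (10.0, 12.0), (2.0, 0.0), (12.0, 10.0)]
centers, counts = online_kmeans(points, initial_centers=[(0.0, 0.0), (10.0, 10.0)])
```

Unlike batch k-means, points are never reassigned after later updates, which
is one reason streaming and batch centers can differ.
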
>>>>>
>>>>> References
>>>>>
>>>>> [1] - Samoa research paper: http://www.jmlr.org/papers/volume16/morales15a/morales15a.pdf
>>>>>
>>>>> [2] - Samoa docs: http://samoa.incubator.apache.org/
>>>>>
>>>>> [3] - Git repository: https://github.com/Jayancv/streamingML
>>>>>
>>>>> [4] - Statistics of tests: https://docs.google.com/a/wso2.com/spreadsheets/d/1uROw0gGIu_Ht0J0YnSOHoH600ZnJG9ejp9ztMaXA09s/edit?usp=sharing
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>>
>>>>> Jayan Vidanapathirana
>>>>> Intern Software Engineer,
>>>>> WSO2.
>>>>> mobile +94715594516
>>>>> www.linkedin.com/in/jayancv
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks & Regards,
>>>>
>>>> Fazlan Nazeem
>>>>
>>>> *Software Engineer*
>>>>
>>>> *WSO2 Inc*
>>>> Mobile : +94772338839
>>>> [email protected]
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> Thanks & Regards,
>
> Fazlan Nazeem
>
> *Software Engineer*
>
> *WSO2 Inc*
> Mobile : +94772338839
> [email protected]
>
