Hi All,
On Jun 17, 2014, at 1:49 AM, Lahiru Gunathilake wrote:

> Hi All,
> 
> I am planning to evaluate different event stream clustering algorithms as
> part of my studies (I am a graduate student at Indiana University). I think
> Siddhi is a good place to experiment with this. As far as I understand from
> the docs, Siddhi doesn't have a stream clustering interface that I can use
> directly to plug in my own algorithm. So I am thinking of first coming up
> with an interface for different clustering algorithms and then adding
> algorithm implementations for each event stream by invoking an operation
> like SiddhiManager.addQuery. Alternatively, I can make the algorithm
> configurable as part of the query language. If the second option is more
> consistent with the current model I can wrap up the work that way, but
> initially focusing on the first approach will be easier for me. Each
> algorithm can be associated with a desired event stream or associated
> globally. If it is associated with a stream, the algorithm will run local to
> that stream; otherwise it will run in a global context. Depending on the
> algorithm, I can provide a way to configure it with parameters.
> 
I think I confused matters with the implementation details above. After
looking into the Siddhi extension points, I figured out that I just have to
implement a new window type. I have implemented one algorithm that keeps the
most frequent events that arrived in an event stream. Queries can look like
this:

from cseEventStream#window.frequent(2)
select symbol, price
insert into StockQuote;

There are multiple algorithms for keeping the most frequent events in a given
window size. For now I have implemented a simple one [1] with a processing
complexity of O(1) and a space complexity of O(n), where n is the limit on the
number of most frequent items kept. I have created a patch and attached it to
JIRA [2].
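
As a rough illustration of the counting scheme described in [1], here is a
standalone sketch. It is not the code in the attached patch and it does not use
any Siddhi API; the class name MisraGriesSketch and the methods offer and
candidates are made up for this example, and "capacity" plays the role of the
limit n mentioned above.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical helper (not part of Siddhi): Misra-Gries frequent-item summary. */
final class MisraGriesSketch<T> {
    // Maximum number of items tracked at any time (the "n" in O(n) space).
    private final int capacity;
    private final Map<T, Integer> counters = new HashMap<>();

    MisraGriesSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    /** Feed one event key into the summary. */
    void offer(T item) {
        Integer count = counters.get(item);
        if (count != null) {
            // Already tracked: bump its counter.
            counters.put(item, count + 1);
        } else if (counters.size() < capacity) {
            // Free slot available: start tracking the new item.
            counters.put(item, 1);
        } else {
            // No free slot: decrement every counter and drop those reaching zero.
            // The new item itself is not stored.
            Iterator<Map.Entry<T, Integer>> it = counters.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<T, Integer> entry = it.next();
                if (entry.getValue() == 1) {
                    it.remove();
                } else {
                    entry.setValue(entry.getValue() - 1);
                }
            }
        }
    }

    /** Snapshot of the currently tracked items and their (under-)estimated counts. */
    Map<T, Integer> candidates() {
        return new HashMap<>(counters);
    }
}
```

In this sketch the decrement step touches every counter, so a single event can
cost O(n) in the worst case; amortized over the whole stream the cost per event
is constant. The stored counts are lower bounds on the true frequencies, so the
tracked items are candidates rather than exact results.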

[1] Jayadev Misra and David Gries, "Finding repeated elements," Science of
    Computer Programming 2, no. 2 (1982): 143-152.
[2] https://wso2.org/jira/browse/CEP-872

Thanks
Lahiru
> To start this I hope to implement a frequent item set mining algorithm which
> can be used to find the most frequent items in an event stream. Search
> engines use this kind of data to find the most frequent searches in a given
> time window and optimize the search queries. I can start with algorithms
> like the Misra-Gries algorithm[1] and Manku and Motwani [2] and then move
> towards more data clustering algorithms. For the time being I will write
> the clustering results to a file, and later I think I can use more stable
> storage (either the WSO2 registry or another preferred way in the WSO2
> product stack). If Siddhi or WSO2 CEP already has the capability of frequent
> item mining, I will start with a more classification-type algorithm.
> 
> Your feedback will be very useful for my work. If you have requirements for
> any specific types of algorithms based on the real client interactions you
> have, I would like to know them, implement them with Siddhi and do the
> comparison.
> 
> Thanks
> Lahiru
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
