Hi All,
On Jun 17, 2014, at 1:49 AM, Lahiru Gunathilake wrote:
> Hi All,
>
> I am planning to evaluate different event stream clustering algorithms as
> part of my studies (I am a graduate student at Indiana University). I think
> Siddhi is a good place to experiment with this. As per my understanding of
> the docs, Siddhi doesn't have a stream clustering interface I can use directly
> to plug in my own algorithm. So I am thinking of first coming up with an
> interface for different clustering algorithms and adding implementations of
> algorithms for each event stream by invoking an operation like
> SiddhiManager.addQuery. Alternatively, I can make the algorithm configurable
> as part of the query language. If the second option is more consistent with
> the current model I can wrap up the work in that way, but initially focusing
> on the first approach will be easier for me. Each algorithm can be associated
> with a desired event stream or can be associated globally. If it is associated
> with a stream, the algorithm will run local to that stream; otherwise it will
> run in a global context. Depending on the algorithm, I can provide a way to
> configure it with parameters.
>
I think I confused things with the implementation details above. After looking
into the Siddhi extension points, I figured out that I just have to implement a
new window type. I have implemented one algorithm that keeps the most frequent
events arriving in an event stream. Queries can look like the one below:
from cseEventStream#window.frequent(2)
select symbol, price
insert into StockQuote;
There are multiple algorithms for keeping the most frequent events in a given
window size. For now I have implemented a simple algorithm [1] with a
processing complexity of O(1) and a space complexity of O(n), where n is the
limit on the number of most frequent items. I have created a patch and attached
it to JIRA [2].
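
To illustrate the kind of counting involved, here is a minimal sketch of a Misra-Gries style frequent-items summary in plain Java. It is not the code from the patch and does not use any Siddhi API; the class name MisraGriesSketch and the methods offer and candidates are hypothetical names chosen for this example, and events are assumed to be identified by a String key.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical helper (not part of Siddhi): tracks at most `capacity` candidate frequent items. */
final class MisraGriesSketch {
    private final int capacity; // maximum number of items tracked at once
    private final Map<String, Integer> counters = new HashMap<>();

    MisraGriesSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    /** Processes one event key from the stream. */
    void offer(String item) {
        Integer count = counters.get(item);
        if (count != null) {
            // Already tracked: increment its counter.
            counters.put(item, count + 1);
        } else if (counters.size() < capacity) {
            // Free slot available: start tracking the new item.
            counters.put(item, 1);
        } else {
            // No free slot: decrement every counter and drop those that reach zero.
            // The new item itself is not stored.
            Iterator<Map.Entry<String, Integer>> it = counters.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Integer> entry = it.next();
                if (entry.getValue() == 1) {
                    it.remove();
                } else {
                    entry.setValue(entry.getValue() - 1);
                }
            }
        }
    }

    /** Returns a copy of the current candidates; counts are lower bounds, not exact frequencies. */
    Map<String, Integer> candidates() {
        return new HashMap<>(counters);
    }
}
```

With a capacity of 2, offering the keys IBM, WSO2, IBM, ORCL, IBM in that order leaves IBM as the only candidate. In this sketch the decrement step touches every counter, so a single update can cost time proportional to the capacity, although the cost is constant when amortized over the stream.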
[1] Jayadev Misra and David Gries, "Finding repeated elements," Science of
Computer Programming 2, no. 2 (1982): 143-152.
[2] https://wso2.org/jira/browse/CEP-872
Thanks
Lahiru
> To start this I hope to implement a frequent item set mining algorithm which
> can be used to find the most frequent items of an event stream. Search
> engines use this kind of data to find the most frequent searches in a given
> time window and optimize the search queries. I can start with algorithms
> like the Misra-Gries algorithm [1] and Manku and Motwani [2] and then move
> towards more data clustering algorithms. For the time being I will write
> the clustering results to a file, and later I think I can use more stable
> storage (either the WSO2 registry or another preferred option in the WSO2
> product stack). If Siddhi or WSO2 CEP already has the capability of frequent
> item mining, I will start with a more classification-type algorithm.
>
> Your feedback will be very useful for my work. If you have requirements for
> any specific types of algorithms based on the real client interactions you
> have, I would like to know them, implement them with Siddhi, and do the
> comparison.
>
> Thanks
> Lahiru
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture