Hi Lahiru, As Srinath has mentioned as well, frequency counting algorithms are very useful in financial market scenarios as well (especially fraud detection and surveillance). Thanks for doing this and I will take a look too.
seshika On Fri, Jun 27, 2014 at 10:52 AM, Mohanadarshan Vivekanandalingam < [email protected]> wrote: > > > > On Fri, Jun 27, 2014 at 10:00 AM, Lahiru Gunathilake <[email protected] > > wrote: > >> Hi Mohan, >> > > Hi Lahiru, > > >> >> I wrote some samples but I can write more test-cases and provide another >> patch. Please feel free to change the naming of the windows as you like. >> > > Really appreciate your contribution.. Sure, I'll start look into this.. > > Thanks, > Mohan > > >> Regards >> Lahiru >> >> On Jun 26, 2014, at 11:35 PM, Srinath Perera wrote: >> >> Hi All, >> >> Lahiru and myself had a call today morning. >> >> Plan is to >> 1) Lahiru to look at hoeffding tree and other classification algorithms >> and select one to implement. He will compare the performance against MOA or >> some other implementation. >> 2) then we will use it for a Fraud analysis scenario as a proof of its >> validity. >> >> Then we will decide how to continue from that point. >> >> Mohan could you look at the patch? Lahiru will write test cases that you >> can use to verify. >> >> --Srinath >> >> >> On Tue, Jun 24, 2014 at 4:56 AM, Lahiru Gunathilake <[email protected] >> > wrote: >> >>> Hi Srinath, >>> >>> Thanks for the response. >>> On Jun 23, 2014, at 10:48 PM, Srinath Perera wrote: >>> >>> Hi Lahiru, >>> >>> Sorry for not responding earlier. I was traveling last week. >>> >>> I guess you know frequent item set != clustering algorithms. >>> >>> +1 >>> >>> >>> Can we have a chat sometime to discuss details about the implementation. >>> >>> Yes sure, that would be very useful. >>> >>> Thanks >>> Lahiru >>> >>> >>> --Srinath >>> >>> >>> On Sun, Jun 22, 2014 at 6:25 PM, Lahiru Gunathilake < >>> [email protected]> wrote: >>> >>>> Hi All, >>>> >>>> I have implemented another frequency counting algorithm[1] which is a >>>> classic algorithm for mining frequent items in a data stream. Basically >>>> users can specify a minimum average value of frequent items with an error >>>> value. >>>> >>>> This algorithm will accept two user-specified parameters: a support >>>> threshold s [0-1] and an error parameter e [0-1] such that e << s and >>>> recommended number for e is normally s/10 or s/20. Let N denote the >>>> current length >>>> of the stream, i.e., the number of tuples seen so far. At any point of >>>> time, this algorithm can be asked to produce a list of events along >>>> with their estimated frequencies. The answers produced by this >>>> algorithm will have the >>>> following guarantees: >>>> 1. All item(set)s whose true frequency exceeds sN are outputs. >>>> There are no false negatives . >>>> 2. No events whose true frequency is less than (s - e) N >>>> is output. >>>> 3. Estimated frequencies are less than the true frequencies >>>> by at most eN. >>>> . >>>> The incoming stream is conceptually divided in to buckets of width w = >>>> ceiling(1/e) transactions each. Buckets are labeled with bucket ids , >>>> starting from 1.We denote the current bucket id by bcurrent whose >>>> value is ceiling(N/w). For an element , we denote its true frequency >>>> in the stream seen so far by "fe" . Note that e and w are fixed for a >>>> data stream while N, bcurrent and fe are the variables whose value changes >>>> when the stream progress. >>>> Here the data structure , is a set of entries are tuples with the >>>> form of (event, f, delta), where event is the actual event and "f" is >>>> the is an integer representing its estimated frequency, and delta is >>>> the maximum possible error in "fe". >>>> >>>> In this algorithm every new event will anyways added in to the >>>> data-structure optimistically and there is no initial condition to enter in >>>> to the data-structure but iteratively if certain events are not match to >>>> the condition provided >>>> by the user those events will be removed when N%windowSize = 0, during >>>> this I output those events as expired-events in the window. For the >>>> incoming events if event is already exists I output those as current-events >>>> and if its a new >>>> event I just add it to the data-structure and iterate through all the >>>> events available and find the matching events based on s and e values and >>>> only those events will be output. >>>> >>>> I am not sure based on the window definition this is the correct >>>> approach or may be windows aren't the best way to associate this algorithm >>>> in to siddhi. I have attached my patch to jira[2] and if you can look that >>>> would be great. >>>> >>>> Siddhi Query will looks like below, >>>> from cseEventStream#window.lossyFrequent(0.1,0.01) " + >>>> "select symbol, >>>> price " + >>>> "insert into >>>> StockQuote; >>>> >>>> >>>> [1]Gurmeet Singh Manku and Rajeev Motwani. 2002. Approximate frequency >>>> counts over data streams. In Proceedings of the 28th international >>>> conference on Very Large Data Bases (VLDB '02). VLDB Endowment 346-357. >>>> >>>> Regards >>>> Lahiru >>>> On Jun 19, 2014, at 2:36 AM, Lahiru Gunathilake wrote: >>>> >>>> Hi All, >>>> On Jun 17, 2014, at 1:49 AM, Lahiru Gunathilake wrote: >>>> >>>> Hi All, >>>> >>>> I am planning to evaluate different event stream clustering algorithms >>>> as part of my studies(I am a graduate student at indiana University). I >>>> think Siddhi is a good place to experiment this, As per my understanding >>>> based on the docs Siddhi doesn't have a stream clustering interface I can >>>> use directly to plug my own algorithm. So I am thinking of first come up an >>>> interface for different clustering algorithms and add implementation of >>>> algorithms for each event stream by invoking an operation like >>>> SiddhiManager.addQuery. Or I can make the algorithm configure as part of >>>> query language. If the second option is more consistent with current model >>>> I can wrap-up the work in that way but initially focussing on first >>>> approach will be easier for me. So each algorithm can be associated to a >>>> desired event Stream or can be associated globally. If its associated with >>>> each stream algorithm will run local to each stream otherwise it will run >>>> in global context. Based on the algorithm I can provide a way to configure >>>> it with parameters. >>>> >>>> I am sure I have confused with above implementation details, after >>>> looking in to Siddhi extension points I figured out I just have to >>>> implement a new window type. I have implemented one algorithm to keep the >>>> most frequent events >>>> came in a event stream. So queries can looks like below, >>>> >>>> from cseEventStream#window.frequent(2) " + >>>> "select symbol, >>>> price " + >>>> "insert into >>>> StockQuote; >>>> >>>> There are multiple algorithms to keep the most frequent events in a >>>> given window size for now I just implemented a simple algorithm[1] with the >>>> processing complexity of O(1) and space complexity O(n) where n is the >>>> limit of the most frequent items. I have created a patch and attached it to >>>> jira[2]. >>>> >>>> [1] Jayadev, and David Gries Misra, "Finding repeated elements," in >>>> Science of computer programming 2, no. >>>> 2 (1982): 143-152. >>>> [2]https://wso2.org/jira/browse/CEP-877 >>>> >>>> >>>> Thanks >>>> Lahiru >>>> >>>> To start this I hope to implement a frequent item set mining algorithm >>>> which can be used to find out most frequent items of an event stream. >>>> Search engines use these kind of data to find out most frequent searches in >>>> a given time window and optimize the search queries. I can start with some >>>> algorithms like Misra-Gries algorithm[1] and Manku and Motwani [2] and >>>> then move towards more of data clustering algorithms. For the time being I >>>> will write the clustering results in to a file and later I think I can use >>>> more stable storage (either wso2 registry or other prefered way in wso2 >>>> product stack). If Siddhi or WSO2 CEP already have the capability of >>>> frequent item mining I will start with a more classification type >>>> algorithm. >>>> >>>> Your feedback will be very useful for my work. If you have requirement >>>> for any specific type of algorithms based on the real client interactions >>>> you have, I would like to know them and implement them with Siddhi and do >>>> the comparison. >>>> >>>> Thanks >>>> Lahiru >>>> _______________________________________________ >>>> Architecture mailing list >>>> [email protected] >>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>> >>>> >>>> _______________________________________________ >>>> Architecture mailing list >>>> [email protected] >>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>> >>>> >>>> >>>> _______________________________________________ >>>> Architecture mailing list >>>> [email protected] >>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>> >>>> >>> >>> >>> -- >>> ============================ >>> Srinath Perera, Ph.D. >>> http://people.apache.org/~hemapani/ >>> http://srinathsview.blogspot.com/ >>> _______________________________________________ >>> Architecture mailing list >>> [email protected] >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >>> >>> >>> _______________________________________________ >>> Architecture mailing list >>> [email protected] >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >>> >> >> >> -- >> ============================ >> Srinath Perera, Ph.D. >> http://people.apache.org/~hemapani/ >> http://srinathsview.blogspot.com/ >> _______________________________________________ >> Architecture mailing list >> [email protected] >> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >> >> >> > > > -- > *V. Mohanadarshan* > *Software Engineer,* > *Data Technologies Team,* > *WSO2, Inc. http://wso2.com <http://wso2.com> * > *lean.enterprise.middleware.* > > email: [email protected] > phone:(+94) 771117673 > > _______________________________________________ > Architecture mailing list > [email protected] > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > >
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
