>>> First of all, thank you very much for your explanations and clarifications! It is very interesting and useful!
>>>
>>> Let me ask a few more questions and provide a few comments.
>>>
>>> > Hi All, these questions and answers are very educational. Shall we add them to our doc FAQs?
>>>
>>> I think it would be a very good idea to add something like this to the FAQs, or to create some sort of an "architecture and implementation overview" document.
>>>
>>> 1) How many rules/queries can be defined in one engine? How does it affect performance?
>>>
>>> For example, can I define (tens of) thousands of queries using the same (or multiple) instances of SiddhiManager? Would it make processing much slower? Or is the speed not proportional to the number of queries? E.g., when a new event arrives, does Siddhi test it in a linear fashion against each query, or does Siddhi keep an internal state machine that tries to match an event against all rules at once?
>>>
>>> > SiddhiManager can have many queries, and if you chain the queries in a linear fashion then all those queries will be executed one after the other and you might see some performance degradation, but if you have them in parallel then there won't be any issues.
>>>
>>> Well, before I got this answer, I created a few test cases to check experimentally how it behaves. I created a single instance of SiddhiManager and added 10000 queries that all read from the same input stream, check whether a specific attribute (namely, price) of an event is inside a given random interval (price >= random_low and price <= random_high), and output randomly into one of 100 streams. Then I measured the time required to process 1000000 events using this setup. I also did exactly the same experiment with Esper.
>>>
>>> My findings were that Siddhi is much slower than Esper in this setup.
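The experiment just described can be approximated with a small, engine-independent sketch (hypothetical code using neither the Siddhi nor the Esper API; the query and event counts are scaled down from the numbers above so it runs instantly):

```java
import java.util.*;

// Engine-independent sketch of the benchmark above: N queries, each
// checking an event's price against a random [low, high] interval,
// evaluated linearly per event (which is the O(N)-per-event behavior
// described in the mail).
public class IntervalBenchmark {

    /** Feeds each event price to every interval query; returns the total match count. */
    public static long run(double[][] intervals, double[] prices) {
        long matches = 0;
        for (double price : prices) {
            for (double[] iv : intervals) {          // linear scan: O(queries) per event
                if (price >= iv[0] && price <= iv[1]) {
                    matches++;
                }
            }
        }
        return matches;
    }

    /** Builds n random intervals over [0, 100), mimicking the random_low/random_high setup. */
    public static double[][] randomIntervals(int n, Random rnd) {
        double[][] ivs = new double[n][2];
        for (int i = 0; i < n; i++) {
            double a = rnd.nextDouble() * 100, b = rnd.nextDouble() * 100;
            ivs[i][0] = Math.min(a, b);
            ivs[i][1] = Math.max(a, b);
        }
        return ivs;
    }
}
```

Wrapping `run` in a timing loop (e.g. `System.nanoTime()` before and after) reproduces the shape of the measurement, independently of either engine.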
>>> After looking into the internal implementations of both, I realized the reason. Siddhi processes all queries that read from the same input stream in a linear fashion, sequentially. Even if many of the queries have almost the same condition, Siddhi makes no optimization attempts. Esper detects that many queries have a condition on the same variable and creates some sort of a decision tree. As a result, its running time is O(log N), whereas Siddhi needs O(N).
>>>
>>> I'm not saying that this test case is very typical or important, but maybe Siddhi should analyze the complete set of queries and apply some optimizations when possible, i.e. a bit of global optimization. It could detect common sub-expressions or sub-conditions in the queries and evaluate them only once, instead of evaluating them over and over again in each query separately.
>>>
>>> After getting these first results, I changed the setup so that each query uses one of many input streams (e.g. one of 300) instead of the same one. This greatly improved the situation, because the number of queries per input stream was much smaller and thus processing was much faster. But even in this setup it is still about 5-6 times slower than Esper.
>>
>> Could you share your test cases? We can have a look. We have not worked much with thousands of queries,
>
> Yes, I can provide my test cases - the source code is actually pretty small. What is the best way to do it? Should I simply attach a ZIP file with my project, or create a small GitHub project?
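The common sub-condition optimization suggested above can be illustrated with a standalone sketch (hypothetical code, not Siddhi or Esper internals): queries with an identical interval are grouped up front, so the shared condition is evaluated once per group instead of once per query.

```java
import java.util.*;

// Hypothetical sketch (not Siddhi or Esper code) contrasting linear
// per-query evaluation with sharing identical conditions among queries.
public class SharedConditionDemo {

    /** Linear evaluation: test every query's interval against the event. */
    public static List<String> matchLinear(Map<String, double[]> queries, double price) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> q : queries.entrySet()) {
            double[] iv = q.getValue();
            if (price >= iv[0] && price <= iv[1]) {
                hits.add(q.getKey());
            }
        }
        return hits;
    }

    /** Shared evaluation: group queries by identical interval, then
     *  evaluate each distinct interval only once per event. */
    public static List<String> matchShared(Map<String, double[]> queries, double price) {
        // In a real engine this grouping would be built once at deployment
        // time, not per event; it is inlined here to keep the sketch short.
        Map<String, List<String>> groups = new HashMap<>();
        for (Map.Entry<String, double[]> q : queries.entrySet()) {
            String key = q.getValue()[0] + ":" + q.getValue()[1];
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(q.getKey());
        }
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> g : groups.entrySet()) {
            String[] bounds = g.getKey().split(":");
            double low = Double.parseDouble(bounds[0]);
            double high = Double.parseDouble(bounds[1]);
            if (price >= low && price <= high) {
                hits.addAll(g.getValue()); // condition checked once for the whole group
            }
        }
        return hits;
    }
}
```

With 10000 queries but only a few distinct intervals, the shared variant does a handful of comparisons per event where the linear one does 10000; an interval/decision tree over the distinct bounds would further reduce it toward O(log N).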
Could you report a JIRA here https://wso2.org/jira/browse/CEP and attach it?

>> but likely it is something we can fix without much trouble.
>
> Sounds promising.

>>> 2) Is it possible to easily disable/enable some queries?
>>>
>>> In my use cases I have a lot of queries. Actually, I have a lot of tenants, and each tenant may have something like 10-100 queries. Rather often (e.g. a few times a day), tenants would like to disable/enable some of their queries. What is the proper way to do it? Is it a costly operation, i.e. does Siddhi need to perform a lot of processing to disable or enable a query? Is it better to keep a dedicated SiddhiManager instance per tenant, or is it OK to have one SiddhiManager instance which handles all those tenants with all their queries?
>>>
>>> > The general norm is that you use a SiddhiManager per scenario, where each scenario might contain one or more queries. With this model it is easy for a tenant to add or remove a scenario, and it will not affect other queries and tenants.
>>>
>>> If I have tens of thousands of tenants, then having a dedicated SiddhiManager per tenant is probably not very practical or even possible, as it will get pretty heavyweight, I guess.
>>>
>>> Therefore, having the ability to enable/disable a query could be very practical. In fact, it could probably be implemented very easily. Imagine that each query object has a boolean flag that indicates whether it is enabled. If the condition matches, then before Siddhi performs the insert (i.e. the action), it could check whether the query is disabled. If it is disabled, no action (i.e. insert) is performed at all. Of course, there is still some overhead when matching the query. But maybe even this can be skipped if the query is disabled, i.e. its conditions immediately evaluate to "false" and thus never trigger?
>>>
>>> BTW, Esper has this feature.
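The enable/disable flag sketched above could look roughly like this (a minimal hypothetical sketch, not the Siddhi or Esper API; `ToggleableQuery` and its methods are invented names):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.DoublePredicate;

// Hypothetical sketch of the boolean-flag idea from the mail: a disabled
// query short-circuits before its condition is evaluated, so toggling a
// query is cheap and a disabled query adds almost no per-event overhead.
public class ToggleableQuery {
    private final DoublePredicate condition;
    private final AtomicBoolean enabled = new AtomicBoolean(true);

    public ToggleableQuery(DoublePredicate condition) {
        this.condition = condition;
    }

    /** Cheap toggle: no query removal or redeployment needed. */
    public void setEnabled(boolean on) {
        enabled.set(on);
    }

    /** A disabled query never matches and never runs its condition. */
    public boolean process(double price) {
        if (!enabled.get()) {
            return false; // skip condition evaluation entirely
        }
        return condition.test(price);
    }
}
```

The `AtomicBoolean` makes the toggle safe even if events are processed on a different thread from the one flipping the flag.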
>>> You can disable/enable any query without removing and later adding it again.
>>
>> My understanding is that the Siddhi manager is not heavy, but I will let Suho answer.
>>
>>> When it comes to Siddhi persistent stores, you write:
>>> > It only stores the state information of the processing, e.g. the current running avg of the average calculation. This will be used when the server recovers from a failure.
>>>
>>> OK, I understand what it does now. BTW, does it also store any sliding windows as well, so that failover may happen?
>>
>> Yes, it stores everything, so failover works.
>>
>>> My further question is: how to support more dynamic scenarios, where the set of queries is not totally static? What if the set of rules changes a few times per hour/day/etc.? Maybe it would also make sense to persist the set of queries that were deployed on a given SiddhiManager? This way a user doesn't need to do any custom bookkeeping for the set of queries.
>>>
>>> Yet another question about Siddhi: is it possible to express queries that work with absolute time or timers, without providing a time inside events? E.g., how can one express in a query something like "time is between 9:30 AM and 10:00 AM"? Is it possible to work with timers in a query? Basically, I'd like to trigger certain actions at a specific time or on a regular basis (every N minutes), and I'm wondering how this can be expressed using Siddhi's query language.
>>
>> One trick I have used is to create a timer stream, send events to that timer stream periodically, and write the query using that timer stream to do what I need. We wanted to add timers as an inbuilt concept, so that you could just say "from Timer(10s)" to receive events every 10 secs, but I think it is not yet added.
>
> Ah, so it is a planned feature? Cool!

Yes.

>>> And my last question for now: is it possible to have nested structures in events, e.g.
>>> something like this: "select field1.field12[3].field1234 from ..."? It means that an event has a field called field1, which in turn has an array sub-field called field12, and each element of this array has a field called field1234. Is this possible? Or does Siddhi assume a flat structure of events, i.e. each event can have only fields of basic types?
>>
>> No, we do not do nested structures within Siddhi; it assumes flat events. E.g., we take XML and match it to a flat structure.
>
> OK, I understand. I'd say it covers 90% of all use cases. But having support for nested structures (a la Esper) could be interesting. And I think the implementation would be pretty straightforward.

We have input and output adaptors that let us map a tree (e.g. XML) to a flat structure. Yes, there are still some scenarios it does not cover.

> BTW, a few questions somewhat related to this one:
>
> - What if I need to handle events which have a few mandatory fields while all other fields are optional? "define stream" only allows a fixed structure, AFAIK, especially because it is assumed to be mapped to an object array. But it could be interesting to allow mapping events to key/value maps. With this representation it is pretty easy to support events/streams with any number of fields. The mandatory ones can be described in "define stream" and the others are accessed at run time by means of a key lookup. The syntax could be:
>
> define stream MyType map (field1 string, field2 int, field3 float)

Siddhi does not support optional fields; we did this for performance, actually.

> - In some of my test cases, I wanted to avoid using a single stream for all tenants (because it is very slow - see my previous messages). So I created one stream per tenant (e.g. 300000). All such streams are structurally the same, but have different names.
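The key/value-map event representation suggested above could be sketched like this (hypothetical code; this is not existing Siddhi syntax or API, and `MapEvent` and its field names are invented for illustration):

```java
import java.util.*;

// Hypothetical sketch of the "map events" idea: mandatory fields are
// declared with expected types and validated, while any number of extra
// optional fields live in the same key/value map and are looked up by
// name at run time.
public class MapEvent {
    // Declared mandatory fields: name -> expected type (like "define stream ... map (...)").
    private static final Map<String, Class<?>> MANDATORY = Map.of(
            "symbol", String.class,
            "price", Double.class);

    private final Map<String, Object> values = new HashMap<>();

    /** Sets any field, mandatory or optional; returns this for chaining. */
    public MapEvent set(String key, Object value) {
        values.put(key, value);
        return this;
    }

    /** Validates that every mandatory field is present with the right type. */
    public boolean isValid() {
        for (Map.Entry<String, Class<?>> f : MANDATORY.entrySet()) {
            Object v = values.get(f.getKey());
            if (v == null || !f.getValue().isInstance(v)) {
                return false;
            }
        }
        return true;
    }

    /** Optional fields are read by key, with a default when absent. */
    public Object getOrDefault(String key, Object def) {
        return values.getOrDefault(key, def);
    }
}
```

The trade-off the reply points at is visible here: every field access becomes a hash lookup plus boxing, which is why a fixed object-array layout is faster.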
> I noticed that it consumes quite some memory, because stream definitions are not shared, even though they are immutable as far as I understand. Maybe it would be a good idea to share stream definitions when they are the same? I.e., StreamDefinition gets two fields, "String name" and "StreamRepresentation streamRep", and the representation part is shared by all streams with the same structure. An even better idea could be to allow custom types. Then you could do something like:
>
> define type MyType (field1 string, field2 int, field3 float)
> define stream MyStream1 MyType
> define stream MyStream2 MyType
> define stream MyStream3 MyType
> define stream MyStream4 MyType
> ...
>
> Plus, if custom types could be defined, one could allow using them in stream/type definitions, e.g.:
>
> define type MySecondType (field1 string, field2 int, field3 float, field4 MyType)

Sharing stream representations is a good idea, and I think it is not too hard to do. Could you open a JIRA?

> Now, a different question: as far as I understand, it is currently possible to join only 2 streams at once. Is that a correct understanding? If so, I'd like to understand the reasons for this limitation. Is there a real technical problem that makes joining 3 or more streams difficult or impossible? Or is it a temporary limitation? Some of the rules used in my use cases require inputs from 4-6 streams. Modeling this with multi-level 2-way joins is really annoying. Having support for n-way joins would make my life much easier ;-)

We decided to keep it simple. Maybe we should add syntax for this that internally does a multi-level join. That needs some work, though, and it will take some time before we get to it.

> And BTW, the current syntax for sequences is a bit ... misleading, IMHO (though it is a minor issue).
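The definition-sharing idea discussed above is essentially the flyweight pattern, which could be sketched as follows (hypothetical code; Siddhi's actual StreamDefinition class differs, and `StreamDefs` is an invented name):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical flyweight sketch: the immutable attribute layout is interned
// in a cache, so e.g. 300000 structurally identical per-tenant streams share
// one representation object instead of each holding its own copy.
public class StreamDefs {

    /** Immutable, shareable structural part of a stream definition. */
    public static final class Representation {
        public final List<String> attributes; // e.g. "price float"
        Representation(List<String> attributes) {
            this.attributes = List.copyOf(attributes);
        }
    }

    /** Per-stream part: only the name is unique; the layout is shared. */
    public static final class StreamDefinition {
        public final String name;
        public final Representation rep;
        StreamDefinition(String name, Representation rep) {
            this.name = name;
            this.rep = rep;
        }
    }

    private static final Map<List<String>, Representation> CACHE = new ConcurrentHashMap<>();

    /** Returns a definition whose representation object is shared by all
     *  streams declared with an equal attribute list. */
    public static StreamDefinition define(String name, List<String> attributes) {
        Representation rep = CACHE.computeIfAbsent(List.copyOf(attributes), Representation::new);
        return new StreamDefinition(name, rep);
    }
}
```

A "define type" feature as proposed in the mail would make this sharing explicit in the query language; the cache above achieves the same memory saving implicitly, keyed on structural equality.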
> When someone writes "from Stream1 as s1, Stream2 as s2, Stream3 as s3 ...", one usually expects that it means a join of all those streams, because this is how SQL works, and some other CEP engines as well (e.g. Esper). But Siddhi treats it as a sequence of events, which is a very different thing. Therefore I think this syntax is a bit dangerous for newcomers and for those familiar with SQL and/or other CEP engines.

When the difference is known, it is pretty intuitive and powerful. We thought it is much easier to think about it that way. But I see what you mean as well.

> One more thing I noticed while experimenting with Siddhi:
>
> - Siddhi JARs are available from Maven Central and the WSO2 Maven repos, which is very nice. But would it be possible to provide source JARs as well (not only for Siddhi, but also for all WSO2 projects)? Right now they are not available, and I had to check out the whole WSO2 repo to build the Siddhi binary and source JARs. And this repo checkout is >500 MB, so it takes a while ;-(
>
> Thanks,
> Leo

--
============================
Srinath Perera, Ph.D.
Director, Research, WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
Research Scientist, Lanka Software Foundation
Blog: http://srinathsview.blogspot.com/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
