Hi Leo,

Please see my comments inline.

> First of all, thank you very much for your explanations and
> clarifications! It is very interesting and useful!
>
> Let me ask a few more questions and provide a few comments.
>
> > Hi All, these questions and answers are very educational. Shall we add
> > them to our doc FAQs?
>
> I think it would be a very good idea to add something like this to the
> FAQs or to create some sort of an "architecture and implementation
> overview" document.
>
> 1) How many rules/queries can be defined in one engine? How does it affect
> performance?
>
> For example, can I define (tens of) thousands of queries using the same
> (or multiple) instance of SiddhiManager? Would it make processing much
> slower? Or is the speed not proportional to the number of queries? E.g.
> when a new event arrives, does Siddhi test it in a linear fashion against
> each query or does Siddhi keep an internal state machine that tries to
> match an event against all rules at once?
>
> > SiddhiManager can have many queries, and if you chain the queries in a
> > linear fashion then all those queries will be executed one after the
> > other and you might see some performance degradation, but if you have
> > them in parallel then there won't be any issues.
>
> Well, before I got this answer, I created a few test cases to check
> experimentally how it behaves. I created a single instance of a
> SiddhiManager and added 10000 queries that all read from the same input
> stream, check if a specific attribute (namely, price) of an event is inside
> a given random interval ( [ price >= random_low and price <= random_high] ),
> and output randomly into one of 100 streams. Then I measured the time
> required to process 1000000 events using this setup. I also did exactly the
> same experiment with Esper.
>
> My findings were that Siddhi is much slower than Esper in this setup.
> After looking into the internal implementations of both, I realized the
> reason.
> Siddhi processes all queries that read from the same input stream
> in a linear fashion, sequentially. Even if many of the queries have almost
> the same condition, Siddhi makes no attempt at optimization. Esper
> detects that many queries have a condition on the same variable and creates
> some sort of a decision tree. As a result, its running time is O(log N),
> whereas Siddhi needs O(N).
>
> I'm not saying that this test case is very typical or important, but maybe
> Siddhi should analyze the complete set of queries and try to apply some
> optimizations when possible? I.e. a bit of global optimization. It could
> detect some common sub-expressions or sub-conditions in the queries and
> evaluate them only once, instead of doing it over and over again by
> evaluating each query separately.
>
> After getting these first results, I changed the setup, so that each query
> uses one of many input streams (e.g. one of 300) instead of using the same
> one. This greatly improved the situation, because now the number of queries
> per input stream was much smaller and thus processing was way faster. But
> even in this setup it is still about 5-6 times slower than Esper.

Could you share your test cases so we can have a look? Yes, we have not
worked much with thousands of queries, but it is likely something we can fix
without much trouble.

> 2) Is it possible to easily disable/enable some queries?
>
> In my use cases I have a lot of queries. Actually, I have a lot of tenants
> and each tenant may have something like 10-100 queries. Rather often (e.g.
> a few times a day), tenants would like to disable/enable some of their
> queries. What is a proper way to do it? Is it a costly operation, i.e. does
> Siddhi need to perform a lot of processing to disable or enable a query?
> Is it better to keep a dedicated SiddhiManager instance per tenant or is
> it OK to have one SiddhiManager instance which handles all those tenants
> with all their queries?
>
> > The general norm is, you have to use a SiddhiManager per scenario, where
> > each scenario might contain one or more queries. With this model it's
> > easy for any tenant to add or remove a scenario, and it will not affect
> > other queries and tenants.
>
> If I have tens of thousands of tenants, then having a dedicated
> SiddhiManager per tenant is probably not very practical or even possible,
> as it will get pretty heavyweight, I guess.
>
> Therefore, having the ability to enable/disable a query could be very
> practical. In fact, it could probably be implemented very easily. Imagine
> that each query object has a boolean flag that indicates if it is enabled
> or not. If the condition matches and before Siddhi tries to perform the
> insert, i.e. the action, it could check if the query is disabled. If it is
> disabled, no action (i.e. insert) is performed at all. Of course, there is
> still some overhead when matching the query. But maybe even this can be
> skipped if the query is disabled? I.e. conditions are immediately
> evaluated to "false" and thus never trigger?
>
> BTW, Esper has this feature. You can disable/enable any query without
> removing and later adding it again.

My understanding is that a SiddhiManager is not heavy, but I will let Suho
answer.

> When it comes to Siddhi persistent stores, you write:
> > It only stores the state information of the processing, e.g. the current
> > running Avg of the average calculation. This will be used when the server
> > recovers from a failure.
>
> OK. I understand what it does now. BTW, does it also store any sliding
> windows, so that failover may happen?

Yes, it stores everything, so failover works.

> My further question is: How to support more dynamic scenarios, where the
> set of queries is not totally static?
> What if the set of rules changes a few times per hour/day/etc? Maybe it
> would also make sense to persist the set of queries that were deployed on
> a given SiddhiManager? This way a user doesn't need to perform any custom
> book-keeping for the set of queries.
>
> Yet another question about Siddhi:
> Is it possible to express queries that work with absolute time or timers
> without providing a time inside events? E.g. how can one express in the
> query something like: "time is between 9:30 AM and 10:00 AM"? Is it
> possible to work with timers in the query? Basically, I'd like to trigger
> certain actions at a specific time or on a regular basis (every N minutes)
> and I'm wondering how this can be expressed using Siddhi's query language.

One trick I have used is to create a timer stream, send events to it
periodically, and write the query against that timer stream to do what I
need. We wanted to add a timer as a built-in concept, so that you could just
say "from Timer(10s)" to receive events every 10 seconds, but I think it has
not been added yet.

> And my last question for now:
> Is it possible to have nested structures in events, e.g. something like
> this: "select field1.field12[3].field1234 from ..."? It means that an
> event has a field called field1, which in turn has an array sub-field
> called field12, and each element of this array has a field field1234. Is it
> possible? Or does Siddhi assume a flat structure of events, i.e. each event
> can have only fields of basic types?

No, we do not support nested structures within Siddhi; it assumes flat
events. For example, we take XML and map it to a flat structure.

> Thanks,
> Leo
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

--
============================
Srinath Perera, Ph.D.
http://people.apache.org/~hemapani/
http://srinathsview.blogspot.com/
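
P.S. In case it helps, here is a rough sketch of the timer-stream trick in
plain Java. It is not Siddhi API: TimerStreamSketch, startTimer, inWindow,
collectTicks and the Consumer<Object[]> that stands in for the timer
stream's input are all hypothetical names made up for illustration. The idea
is only that a scheduler pushes a flat one-field event at a fixed period, and
the "between 9:30 AM and 10:00 AM" condition is then checked against the
time of each tick.

```java
import java.time.LocalTime;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative sketch only; none of these names come from Siddhi.
class TimerStreamSketch {

    // Sends a flat one-field event [currentTimeMillis] to the hypothetical
    // timer stream every periodMillis. The caller shuts the scheduler down.
    static ScheduledExecutorService startTimer(long periodMillis,
                                               Consumer<Object[]> timerStream) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> timerStream.accept(new Object[] {System.currentTimeMillis()}),
                0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    // The "time is between 9:30 AM and 10:00 AM" style check:
    // start is inclusive, end is exclusive.
    static boolean inWindow(LocalTime now, LocalTime from, LocalTime to) {
        return !now.isBefore(from) && now.isBefore(to);
    }

    // Demo helper: waits up to 5 seconds for `wanted` ticks and returns
    // how many actually arrived.
    static int collectTicks(int wanted, long periodMillis) {
        CountDownLatch latch = new CountDownLatch(wanted);
        ScheduledExecutorService scheduler =
                startTimer(periodMillis, event -> latch.countDown());
        try {
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return wanted - (int) latch.getCount();
    }

    public static void main(String[] args) {
        System.out.println("ticks received: " + collectTicks(3, 100));
        System.out.println("in window now: "
                + inWindow(LocalTime.now(), LocalTime.of(9, 30), LocalTime.of(10, 0)));
    }
}
```

In a real setup the Consumer would forward each tick into whatever input the
engine exposes for the timer stream, and the query would read from that
stream like from any other.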
