Re: [Architecture] A few questions about WSO2 CEP/Siddhi

Srinath Perera Mon, 10 Mar 2014 20:27:13 -0700

Hi Leo,

Please see my comments inline.



> First of all, thank you very much for your explanations and
> clarifications! It is very interesting and useful!
>
> Let me ask a few more questions and provide a few comments.
>
> > Hi All, these questions and answers are very educational. Shall we add
> > them to our doc FAQs?
>
> I think it would be a very good idea to add something like this to the
> FAQs or to create some sort of an "architecture and implementation
> overview" document.
>
> 1) How many rules/queries can be defined in one engine. How does it affect
> performance?
>
>    For example, can I define (tens of) thousands of queries using the same
> (or multiple) instance of SiddhiManager? Would it make processing much
> slower? Or is the speed not proportional to the number of queries? E.g.
> when a new event arrives, does Siddhi test it in a linear fashion against
> each query or does Siddhi keep an internal state machine that tries to
> match an event against all rules at once?
>
>
> > SiddhiManager can have many queries, and if you chain the queries in a
> > linear fashion then all those queries will be executed one after the
> > other and you might see some performance degradation, but if you have
> > them in parallel then there won't be any issues.
>
> Well, before I got this answer, I created a few test-cases to check
> experimentally how it behaves. I created a single instance of a
> SiddhiManager, added 10000 queries that all read from the same input
> stream, check if a specific attribute (namely, price) of an event is inside
> a given random interval ( [ price >= random_low and price <= random_high] )
> and output randomly into one of 100 streams. Then I measured the time
> required to process 1000000 events using this setup. I also did exactly the
> same experiment with Esper.
>
> My findings were that Siddhi is much slower than Esper in this setup.
> After looking into the internal implementations of both, I realized the
> reason. Siddhi processes all queries that read from the same input stream
> in a linear fashion, sequentially. Even if many of the queries have almost
> the same condition, Siddhi makes no attempt to optimize. Esper
> detects that many queries have a condition on the same variable and creates
> some sort of decision tree. As a result, its running time per event is
> O(log n), whereas Siddhi needs O(n).
>
> I'm not saying that this test case is very typical or important, but maybe
> Siddhi should analyze the complete set of queries and apply some
> optimizations when possible, i.e. a form of global optimization. It could
> detect some common sub-expressions or
> sub-conditions in the queries and evaluate them only once, instead of doing
> it over and over again by evaluating each query separately.
>
> After getting these first results, I changed the setup, so that each query
> uses one of many input streams (e.g. one of 300) instead of using the same
> one. This greatly improved the situation, because now the number of queries
> per input stream was much smaller and thus processing was much faster. But
> even in this setup Siddhi is still about 5-6 times slower than Esper.
>

Could you share your test cases so we can have a look? We have not
worked much with thousands of queries, but it is likely something we can
fix without much trouble.
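
For reference, the kind of shared index Leo describes can be sketched in a
few lines. This is a minimal Python sketch under stated assumptions, not
Siddhi or Esper code: the names IntervalIndex and match are hypothetical,
and it assumes every query is a closed range [low, high] on one numeric
attribute. It trades build time and memory for a single binary search per
event.

```python
import bisect


class IntervalIndex:
    """Hypothetical index over range queries of the form low <= price <= high."""

    def __init__(self, queries):
        # queries: dict mapping query id -> (low, high), both inclusive.
        self.queries = dict(queries)
        # Sorted distinct boundary values split the number line into
        # points (the boundaries) and open gaps between them.
        self.points = sorted(
            {v for lo, hi in self.queries.values() for v in (lo, hi)}
        )
        # Precompute the matching query ids for each boundary point ...
        self.at_point = [self._scan(p) for p in self.points]
        # ... and for each open gap between consecutive boundaries.
        self.in_gap = [
            self._scan((a + b) / 2.0)
            for a, b in zip(self.points, self.points[1:])
        ]

    def _scan(self, price):
        # Linear scan over all queries, used only at build time.
        return frozenset(
            q for q, (lo, hi) in self.queries.items() if lo <= price <= hi
        )

    def match(self, price):
        # One binary search per event instead of one test per query.
        i = bisect.bisect_left(self.points, price)
        if i < len(self.points) and self.points[i] == price:
            return self.at_point[i]
        if i == 0 or i == len(self.points):
            return frozenset()  # outside every interval
        return self.in_gap[i - 1]
```

With queries q1 = [10, 20] and q2 = [15, 30], match(12) returns {q1} and
match(15) returns {q1, q2}. The precomputed sets can grow quadratically in
the number of queries in the worst case, so a real engine would more likely
use an interval tree or a similar structure.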

>
>
> 2) Is it possible to easily disable/enable some queries?
>
> In my use-cases I have a lot of queries. Actually, I have a lot of tenants
> and each tenant may have something like 10-100 queries. Rather often (e.g.
> a few times a day), tenants would like to disable/enable some of their
> queries. What is the proper way to do it? Is it a costly operation, i.e. does
> Siddhi need to perform a lot of processing to disable or enable a query?
> Is it better to keep a dedicated SiddhiManager instance per tenant or is
> it OK to have one SiddhiManager instance which handles all those tenants
> with all their queries?
>
> > The general norm is that you use a SiddhiManager per scenario, where
> > each scenario might contain one or more queries. With this model it's
> > easy for any tenant to add or remove a scenario without affecting
> > other queries and tenants.
>
> If I have tens of thousands of tenants, then having a dedicated
> SiddhiManager per tenant is probably not very practical or even possible,
> as it will get pretty heavyweight, I guess.
>
> Therefore, having the ability to enable/disable a query could be very
> practical. In fact, it could be probably implemented very easily. Imagine
> that each query object has a boolean flag that indicates if it is enabled
> or not. If the condition matches and before Siddhi tries to perform the
> insert, i.e. the action, it could check if the query is disabled. If it is
> disabled, no action (i.e. insert) is performed at all. Of course, there is
> still some overhead when matching the query. But maybe even this can be
> skipped if query is disabled? I.e. conditions are immediately evaluated to
> "false" and thus never trigger?
>
> BTW, Esper has this feature. You can disable/enable any query without
> removing and later adding it again.
>
My understanding is that SiddhiManager is not heavy, but I will let Suho
answer.
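
The per-query flag Leo proposes can be sketched as follows. This is a
minimal Python sketch, not Siddhi's API: the Query and Engine classes and
the set_enabled method are hypothetical names used only for illustration.
The flag is checked before the condition, so a disabled query costs one
boolean test per event and stays registered.

```python
class Query:
    """Hypothetical query wrapper: a condition, an action, and an enabled flag."""

    def __init__(self, name, condition, action):
        self.name = name
        self.condition = condition  # callable: event -> bool
        self.action = action        # callable: event -> None (the "insert")
        self.enabled = True

    def process(self, event):
        # A disabled query skips the condition as well as the action.
        if not self.enabled:
            return False
        if self.condition(event):
            self.action(event)
            return True
        return False


class Engine:
    """Hypothetical engine that keeps queries registered while disabled."""

    def __init__(self):
        self.queries = {}

    def add(self, query):
        self.queries[query.name] = query

    def set_enabled(self, name, enabled):
        # Toggling is a dictionary lookup plus one attribute write.
        self.queries[name].enabled = enabled

    def send(self, event):
        # Returns the names of the queries that fired for this event.
        return [q.name for q in self.queries.values() if q.process(event)]
```

Calling set_enabled("high", False) and later set_enabled("high", True)
switches a query off and on without removing and re-adding it.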



>
> When it comes to Siddhi persistent stores, you write:
> > It only stores the state information of the processing, e.g. the current
> > running average of an average calculation. This will be used when the
> > server recovers from a failure.
>
> OK. I understand what it does now. BTW, does it also store any sliding
> windows so that failover can happen?
>
Yes, it stores everything, so failover works.
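
To illustrate what "everything" has to cover for a windowed aggregate, here
is a minimal Python sketch. It is not how Siddhi's persistence store is
implemented: the WindowedAverage class, its snapshot and restore methods,
and the JSON format are all assumptions made for illustration. The point is
that both the running sum and the window contents must be saved for the
next result after recovery to be correct.

```python
import json
from collections import deque


class WindowedAverage:
    """Hypothetical length-based sliding window that keeps a running sum."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def add(self, value):
        # Add the new value, evict the oldest one once the window is full.
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)

    def snapshot(self):
        # Serialize everything needed to resume: window contents and sum.
        return json.dumps(
            {"size": self.size, "window": list(self.window), "total": self.total}
        )

    @classmethod
    def restore(cls, blob):
        # Rebuild an equivalent object from a snapshot string.
        state = json.loads(blob)
        obj = cls(state["size"])
        obj.window = deque(state["window"])
        obj.total = state["total"]
        return obj
```

A restored instance returns the same average for the next event as the
original would have, which is the property failover depends on.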


>
> My further question is: How to support more dynamic scenarios, where the
> set of queries is not totally static? What if the set of rules changes a
> few times per hour/day/etc? Maybe it would also make sense to persist the
> set of queries that were deployed on a given SiddhiManager? This way a user
> doesn't need to perform any custom book-keeping for the set of queries.
>
> Yet another question about Siddhi:
> Is it possible to express queries that work with absolute time or timers
> without providing a time inside events?  E.g. how can one express in the
> query something like: "time is between 9:30 AM and 10:00 AM"? Is it
> possible to work with timers in the query? Basically, I'd like to trigger
> certain actions at a specific time or on a regular basis (every N minutes)
> and I'm wondering how this can be expressed using Siddhi's query language.
>

One trick I have used is to create a timer stream, send events to
that timer stream periodically, and write the query against that
timer stream to do what I need. We wanted to add a timer as a built-in
concept, so you could just say "from Timer(10s)" to receive events every
10 seconds, but I think it has not been added yet.
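
The timer-stream trick can be sketched outside the engine like this. It is
a minimal Python sketch, not Siddhi code: start_timer_stream and
in_time_window are hypothetical helpers, and send stands for whatever call
delivers an event to the timer stream in your setup.

```python
import datetime
import threading


def start_timer_stream(send, interval_seconds, stop_event):
    """Call send(event) roughly every interval_seconds until stop_event is set."""

    def run():
        # Event.wait returns False on timeout and True once stop_event is set,
        # so the loop ticks on each timeout and exits promptly on stop.
        while not stop_event.wait(interval_seconds):
            send({"timestamp": datetime.datetime.now()})

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def in_time_window(event, start, end):
    """Stand-in for a query condition such as 'time is between 9:30 and 10:00'."""
    return start <= event["timestamp"].time() <= end
```

A query joined against the timer stream can then apply a condition like
in_time_window(event, datetime.time(9, 30), datetime.time(10, 0)) to act
only on ticks inside the desired interval.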


>
> And my last question for now:
> Is it possible to have nested structures in events, e.g. something like
> this: "select field1.field12[3].field1234 from ..."? It means that an
> event has a field called field1, which in turn has an array sub-field
> called field12, and each element of this array has a field field1234. Is it
> possible? Or does Siddhi assume a flat structure of events, i.e. each event
> can have only fields of basic types?
>

No, we do not support nested structures within Siddhi; it assumes flat
events. E.g. for XML, we take the input and map it to a flat structure.
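
Mapping a nested input to a flat event before it reaches the engine can be
sketched as follows. This is a minimal Python sketch, not the mapping WSO2
CEP performs: the flatten function and the path-style attribute names are
assumptions for illustration, and real stream attribute names would likely
need to be plain identifiers.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single-level dict of path -> leaf."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            # Join dict keys with dots, without a leading dot at the top level.
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Append list positions as [index] to the current path.
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Leaf value: store it under the accumulated path.
        flat[prefix] = value
    return flat
```

An input with field1 containing a list field12 of objects yields keys such
as "field1.field12[0].field1234", each holding a basic-typed value.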


>
>
> Thanks,
>    Leo
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
============================
Srinath Perera, Ph.D.
   http://people.apache.org/~hemapani/
   http://srinathsview.blogspot.com/

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
