Re: [Architecture] A few questions about WSO2 CEP/Siddhi

Sriskandarajah Suhothayan Tue, 25 Mar 2014 06:48:48 -0700

On Thu, Mar 20, 2014 at 5:13 PM, Leo Romanoff <[email protected]> wrote:


>
>
> On Mon, Mar 10, 2014 at 11:19 AM, Leo Romanoff  wrote:
>
>
> >>1) How many rules/queries can be defined in one engine? How does it
> affect performance?
> >>
> >>   For example, can I define (tens of) thousands of queries using the
> same (or multiple) instance of SiddhiManager? Would it make processing much
> slower? Or is the speed not proportional to the number of queries? E.g.
> when a new event arrives, does Siddhi test it in a linear fashion against
> each query or does Siddhi keep an internal state machine that tries to
> match an event against all rules at once?
> >>
> >
> >
> >> SiddhiManager can have many queries, and if you chain the queries in a
> >> linear fashion then all those queries will be executed one after the
> >> other and you might see some performance degradation, but if you have
> >> them in parallel then there won't be any issues.
> >
> >
> >
> >Well, before I got this answer, I created a few test-cases to check
> experimentally how it behaves. I created a single instance of a
> SiddhiManager, added 10000 queries that all read from the same input
> stream, check if a specific attribute (namely, price) of an event is inside
> a given random interval ( [ price >= random_low and price <= random_high] )
> and output randomly into one of 100 streams. Then I measured the time
> required to process 1000000 events using this setup. I also did exactly the
> same experiment with Esper.
> >
> >
> >My findings were that Siddhi is much slower than Esper in this setup.
> After looking into the internal implementations of both, I realized the
> reason. Siddhi processes all queries that read from the same input stream
> in a linear fashion, sequentially. Even if many of the queries have almost
> the same condition, no optimization attempts are done by Siddhi. Esper
> detects that many queries have a condition on the same variable and creates
> some sort of a decision tree. As a result, its running time is O(log n),
> whereas Siddhi needs O(n).
> >
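The O(n) versus O(log n) difference described above can be illustrated with a minimal sketch. This is not Siddhi or Esper code; the class `RangeFilterIndex`, the function `linear_match`, and the query names are hypothetical, and the index trades memory for lookup speed in a way a real engine might not. It precomputes, for every gap between sorted interval bounds, which range filters match, so a lookup is one binary search instead of one comparison per query.

```python
from bisect import bisect_left


def linear_match(queries, price):
    """Baseline: test the price against every query's interval, O(n)."""
    return [name for name, (low, high) in queries.items() if low <= price <= high]


class RangeFilterIndex:
    """Hypothetical index over filters of the form low <= price <= high."""

    def __init__(self, queries):
        # queries: dict mapping query name -> (low, high), both inclusive.
        self.bounds = sorted({b for interval in queries.values() for b in interval})
        pos = {b: i for i, b in enumerate(self.bounds)}
        # Slot 2*i + 1 holds the queries matching price == bounds[i];
        # slot 2*i holds those matching a price strictly between
        # bounds[i - 1] and bounds[i] (or outside all bounds at the ends).
        self.slots = [[] for _ in range(2 * len(self.bounds) + 1)]
        for name, (low, high) in queries.items():
            for slot in range(2 * pos[low] + 1, 2 * pos[high] + 2):
                self.slots[slot].append(name)

    def match(self, price):
        """Return matching query names using a single binary search."""
        i = bisect_left(self.bounds, price)
        on_bound = i < len(self.bounds) and self.bounds[i] == price
        return self.slots[2 * i + 1 if on_bound else 2 * i]


queries = {"q1": (10, 20), "q2": (15, 30), "q3": (40, 50)}
index = RangeFilterIndex(queries)
print(index.match(17))  # ['q1', 'q2']
print(index.match(35))  # []
```

Building the index costs extra memory (in the worst case each slot lists many queries), which is why this is only a sketch of the idea rather than a suggestion for how either engine should implement it.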
> >
> >I'm not saying that this test-case is very typical or important, but
> maybe Siddhi should try to analyze the complete set of queries and try to
> apply some optimizations, when it is possible? I.e. it is a bit of a global
> optimization applied. It could detect some common sub-expressions or
> sub-conditions in the queries and evaluate them only once, instead of doing
> it over and over again by evaluating each query separately.
> >
> >
> >After getting these first results, I changed the setup, so that each
> query uses one of many input streams (e.g. one of 300) instead of using the
> same one. This greatly improved the situation, because now the number of
> queries per input stream was much smaller and thus processing was way
> faster. But even in this setup it is still about 5-6 times slower than
> Esper.
> >
> >
>
> I'd like to get a bit more specific on this point. For the sake of
> simplicity, let's say I need to model a lot of sensors (e.g. 100000 or
> 1000000). All sensors produce the same events, e.g. SensorEvent(id string,
> value float), where id is the unique id of a sensor.
>
> For some/all of the sensors there are a few queries (e.g. 2-10) that
> analyze events from a single or multiple sensors. Obviously, to be able to
> refer only to events from specific sensors, each such query uses one or
> multiple filters like SensorEvent(id=SensorN) to get only the expected
> events. Now imagine that I have 10000 or even 100000 such queries in total
> (for all my sensors).
>
> The processing using Siddhi gets very slow in this case, because all
> events are put into the same event stream and this event stream has a huge
> number of listeners, i.e. queries reading from it. Currently, Siddhi goes
> over each query in linear fashion and checks its conditions. There are some
> workarounds, as I described above, e.g. allocating one event stream per
> sensor and then pre-filtering events received from sensors and putting them
> into a related event stream. But this quickly gets annoying because the
> whole idea of CEP is to delegate this kind of optimizations/decisions to
> the CEP engine and avoid manual event processing.
>
> I see different alternatives to solve it in a proper way:
>
> - one of the alternatives was described above already. It is pretty
> generic. Siddhi analyzes all queries and figures out that certain
> conditions are (almost) the same. Therefore it can evaluate the condition
> only once (e.g. SensorEvent.id) and then dispatch based on its value.
> Maybe some sort of a search tree could be used to figure out a set of queries
> with a matching filter (Esper seems to do something like this). I have
> filed an issue for this already.
>
> - yet another alternative that I had in mind was to do something very similar
> to "partition by". In principle, "partition by" can already effectively
> split the input stream into partitions. The only problem is that exactly
> the _same_ query(s) is applied to each partition, whereas I need a small,
> partition-specific set of queries to be applied for each partition. It
> feels like it could be possible to extend/adapt "partition by" to achieve
> it or implement something along the lines of "partition by", but I don't
> know Siddhi's implementation to judge if it is feasible at all and how much
> effort it would need.
>
> Questions:
> - Are there more efficient ways to model the "huge number of sensors"
> scenario that I described above with existing Siddhi implementation and
> without doing part of event processing by hand?
>
> - What do you think about the "partition by"-like alternative that I
> presented? Does it make sense? Can it be easily implemented?
>
>
Thanks for your interest.
Unfortunately, based on the current architecture of Siddhi, option #1 is
not easily achievable.
Currently "partition by" is not multi-threaded; making it so is on our roadmap.
Moving forward we will be fixing this and you will be able to get better
results.
We have delayed the "partition by" implementation because we are focusing on
achieving this with distributable Siddhi.

I think for now you have to do part of the event processing by hand :(
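
As a rough illustration of what "by hand" could look like, here is a minimal sketch of pre-filtering events by sensor id before they reach any query. It does not use the Siddhi API; `SensorRouter`, `register`, and `send` are hypothetical names, and each handler stands in for whatever pushes the event into a per-sensor stream.

```python
from collections import defaultdict


class SensorRouter:
    """Hypothetical pre-filter: dispatch each event only to the handlers
    registered for its sensor id, instead of offering it to every query."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def register(self, sensor_id, handler):
        # In a real setup the handler would forward the event to the
        # input stream dedicated to this sensor (an assumption here).
        self.handlers[sensor_id].append(handler)

    def send(self, event):
        """Deliver the event and return how many handlers received it."""
        matched = self.handlers.get(event["id"], ())
        for handler in matched:
            handler(event)
        return len(matched)


router = SensorRouter()
received = []
router.register("Sensor1", received.append)
router.register("Sensor2", received.append)
router.send({"id": "Sensor1", "value": 21.5})
print(len(received))  # 1
```

The dictionary lookup keeps the per-event cost independent of the total number of sensors, which is the effect the one-stream-per-sensor workaround is after.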

I will update https://wso2.org/jira/browse/CEP-710 with the progress.

Thanks
Suho




>  Thanks,
>    Leo
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>



-- 

S. Suhothayan
Associate Technical Lead,
WSO2 Inc. http://wso2.com
lean . enterprise . middleware

cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
twitter: http://twitter.com/suhothayan
linked-in: http://lk.linkedin.com/in/suhothayan

