Re: [Architecture] A few questions about WSO2 CEP/Siddhi

Leo Romanoff Mon, 10 Mar 2014 03:24:29 -0700

Hi all,

First of all, thank you very much for your explanations and clarifications! It 
is very interesting and useful!


Let me ask a few more questions and provide a few comments.

> Hi All, these questions and answers are very educating. Shall we add them to 
> our doc FAQs? 

I think it would be a very good idea to add something like this to the FAQs or 
to create some sort of an "architecture and implementation overview" document.

>1) How many rules/queries can be defined in one engine? How does it affect 
>performance?
>
>   For example, can I define (tens of) thousands of queries using the same (or 
>multiple) instance of SiddhiManager? Would it make processing much slower? Or 
>is the speed not proportional to the number of queries? E.g. when a new event 
>arrives, does Siddhi test it in a linear fashion against each query or does 
>Siddhi keep an internal state machine that tries to match an event against all 
>rules at once?
>

> SiddhiManager can have many queries. If you chain the queries in a linear 
> fashion, all of them will be executed one after the other and you might see 
> some performance degradation, but if you have them in parallel there won't 
> be any issues.


Well, before I got this answer, I created a few test cases to check 
experimentally how it behaves. I created a single SiddhiManager instance and 
added 10000 queries that all read from the same input stream. Each query checks 
whether a specific attribute of an event (namely, price) lies inside a given 
random interval ( [ price >= random_low and price <= random_high] ) and outputs 
into one of 100 streams chosen at random. Then I measured the time required to 
process 1000000 events using this setup. I also did exactly the same experiment 
with Esper.

My finding was that Siddhi is much slower than Esper in this setup. After 
looking into the internal implementations of both, I realized the reason. 
Siddhi processes all queries that read from the same input stream sequentially, 
in a linear fashion. Even if many of the queries have almost the same 
condition, Siddhi makes no attempt to optimize. Esper detects that many 
queries have a condition on the same variable and creates some sort of 
decision tree. As a result, its running time is O(log n), whereas Siddhi needs 
O(n), where n is the number of queries. 
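
To make the difference concrete, here is a minimal plain-Java sketch. It is 
neither Siddhi nor Esper code: all class names are made up for illustration, 
prices are assumed to be integers to keep it short, and I am not claiming this 
is the exact structure Esper builds. It compares a linear scan over all range 
conditions with a precomputed index that is searched by binary lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical query: matches when low <= price <= high (both inclusive).
final class RangeQuery {
    final int id;
    final int low;
    final int high;

    RangeQuery(int id, int low, int high) {
        this.id = id;
        this.low = low;
        this.high = high;
    }

    boolean matches(int price) {
        return price >= low && price <= high;
    }
}

// Hypothetical index over many range queries on the same attribute.
final class RangeQueryIndex {
    // Key: start of a segment in which the set of matching queries is constant.
    // Value: ids of the queries that match every price in that segment.
    private final TreeMap<Long, List<Integer>> segments = new TreeMap<>();

    RangeQueryIndex(List<RangeQuery> queries) {
        // The set of matching queries can only change at low or at high + 1.
        for (RangeQuery q : queries) {
            segments.put((long) q.low, new ArrayList<>());
            segments.put((long) q.high + 1, new ArrayList<>());
        }
        // Done once, at deployment time, not per event.
        for (Map.Entry<Long, List<Integer>> e : segments.entrySet()) {
            long start = e.getKey();
            for (RangeQuery q : queries) {
                if (q.low <= start && start <= q.high) {
                    e.getValue().add(q.id);
                }
            }
        }
    }

    // Per event: one binary search over the segment boundaries.
    List<Integer> match(int price) {
        Map.Entry<Long, List<Integer>> e = segments.floorEntry((long) price);
        return e == null ? Collections.<Integer>emptyList() : e.getValue();
    }

    // Per event: tests every query, which is the linear behaviour I observed.
    static List<Integer> matchLinear(List<RangeQuery> queries, int price) {
        List<Integer> result = new ArrayList<>();
        for (RangeQuery q : queries) {
            if (q.matches(price)) {
                result.add(q.id);
            }
        }
        return result;
    }
}

final class RangeIndexDemo {
    public static void main(String[] args) {
        List<RangeQuery> queries = new ArrayList<>();
        queries.add(new RangeQuery(1, 10, 20));
        queries.add(new RangeQuery(2, 15, 30));
        queries.add(new RangeQuery(3, 25, 40));
        RangeQueryIndex index = new RangeQueryIndex(queries);
        System.out.println(index.match(18));                          // [1, 2]
        System.out.println(RangeQueryIndex.matchLinear(queries, 18)); // [1, 2]
    }
}
```

The trade-off in this sketch is memory and build time: in the worst case each 
segment stores a long list of query ids, so a real engine would more likely use 
an interval tree or a similar structure. The point is only that the per-event 
cost no longer has to grow linearly with the number of queries.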

I'm not saying that this test case is very typical or important, but maybe 
Siddhi should analyze the complete set of queries and apply some optimizations 
where possible? I.e. a kind of global optimization across queries. It could 
detect common sub-expressions or sub-conditions in the queries and evaluate 
them only once, instead of doing it over and over again by evaluating each 
query separately.

After getting these first results, I changed the setup so that each query uses 
one of many input streams (e.g. one of 300) instead of the same one. This 
greatly improved the situation, because the number of queries per input 
stream was now much smaller and thus processing was much faster. But even in 
this setup Siddhi is still about 5-6 times slower than Esper.



>2) Is it possible to easily disable/enable some queries?
>
>In my use-cases I have a lot of queries. Actually, I have a lot of tenants and 
>each tenant may have something like 10-100 queries. Rather often (e.g. a few 
>times a day), tenants would like to disable/enable some of their queries. What 
>is a proper way to do it? Is it a costly operation, i.e. does Siddhi need to 
>perform a lot of processing to disable or enable a query?
>Is it better to keep a dedicated SiddhiManager instance per tenant or is it OK 
>to have one SiddhiManager instance which handles all those tenants with all 
>their queries?
>
>
> The general norm is that you use one SiddhiManager per scenario, where each 
> scenario might contain one or more queries. 
> With this model it's easy for any tenant to add or remove a scenario, and it 
> will not affect other queries and tenants.

If I have tens of thousands of tenants, then having a dedicated SiddhiManager 
per tenant is probably not very practical or even possible, as it will get 
pretty heavyweight, I guess.  

Therefore, having the ability to enable/disable a query could be very 
practical. In fact, it could probably be implemented very easily. Imagine that 
each query object has a boolean flag that indicates whether it is enabled. If 
the condition matches, then before performing the insert (i.e. the action) 
Siddhi could check whether the query is disabled. If it is disabled, no action 
(i.e. insert) is performed at all. Of course, there is still some overhead for 
matching the query. But maybe even this can be skipped if the query is 
disabled? I.e. its conditions are immediately evaluated to "false" and thus 
never trigger?
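
A minimal sketch of what I mean, in plain Java. The class and method names are 
hypothetical and do not correspond to Siddhi's real query classes; the only 
point is that the flag is checked before the condition is evaluated, so a 
disabled query costs a single boolean read per event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;
import java.util.function.IntPredicate;

// Hypothetical query wrapper with an enable/disable switch.
final class ToggleableQuery {
    private final IntPredicate condition; // e.g. a price range check
    private final IntConsumer action;     // e.g. insert into an output stream
    // AtomicBoolean so the flag can be flipped from a management thread.
    private final AtomicBoolean enabled = new AtomicBoolean(true);

    ToggleableQuery(IntPredicate condition, IntConsumer action) {
        this.condition = condition;
        this.action = action;
    }

    void setEnabled(boolean value) {
        enabled.set(value);
    }

    void onEvent(int price) {
        // Disabled queries skip both the condition and the action.
        if (!enabled.get()) {
            return;
        }
        if (condition.test(price)) {
            action.accept(price);
        }
    }
}

final class ToggleDemo {
    public static void main(String[] args) {
        List<Integer> output = new ArrayList<>();
        ToggleableQuery query =
                new ToggleableQuery(p -> p >= 10 && p <= 20, p -> output.add(p));

        query.onEvent(15);       // matches, inserted
        query.setEnabled(false);
        query.onEvent(16);       // would match, but the query is disabled
        query.setEnabled(true);
        query.onEvent(17);       // matches again
        System.out.println(output); // [15, 17]
    }
}
```

Toggling the flag does not rebuild or redeploy anything, which is why I would 
expect it to be much cheaper than removing a query and adding it again.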

BTW, Esper has this feature. You can disable/enable any query without removing  
and later adding it again.

When it comes to Siddhi persistent stores, you write:
>It only stores the state information of the processing, e.g. the current 
>running Avg of the average calculation. This will be used when the server 
>recovers from a failure. 


OK. I understand what it does now. BTW, does it also store the contents of any 
sliding windows, so that failover is possible?

My further question is: how can more dynamic scenarios be supported, where the 
set of queries is not totally static? What if the set of rules changes a few 
times per hour/day/etc.? Maybe it would also make sense to persist the set of 
queries that were deployed on a given SiddhiManager? This way a user doesn't 
need to perform any custom book-keeping for the set of queries. 

Yet another question about Siddhi:
Is it possible to express queries that work with absolute time or timers 
without providing a time inside events? E.g. how can one express in a query 
something like: "time is between 9:30 AM and 10:00 AM"? Is it possible to work 
with timers in a query? Basically, I'd like to trigger certain actions at a 
specific time or on a regular basis (every N minutes), and I'm wondering how 
this can be expressed using Siddhi's query language.
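
To make the requirement concrete, this is roughly what I would otherwise have 
to do outside the engine, in plain Java. Nothing here is a Siddhi feature: the 
class names are hypothetical, the window is assumed not to cross midnight, and 
the period is in seconds only to keep the demo short.

```java
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for an absolute-time condition.
final class TimeConditions {
    // True if t lies in [start, end); assumes start is before end on the same day.
    static boolean inWindow(LocalTime t, LocalTime start, LocalTime end) {
        return !t.isBefore(start) && t.isBefore(end);
    }
}

final class TimerDemo {
    public static void main(String[] args) throws InterruptedException {
        LocalTime start = LocalTime.of(9, 30);
        LocalTime end = LocalTime.of(10, 0);

        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Periodic trigger: runs every second, independent of incoming events.
        timer.scheduleAtFixedRate(() -> {
            LocalTime now = LocalTime.now();
            System.out.println(now + " in window: "
                    + TimeConditions.inWindow(now, start, end));
        }, 0, 1, TimeUnit.SECONDS);

        Thread.sleep(3500); // let the timer fire a few times
        timer.shutdownNow();
    }
}
```

What I am looking for is a way to state both the window check and the periodic 
trigger inside the query itself, instead of maintaining this kind of code next 
to the engine.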

And my last question for now:
Is it possible to have nested structures in events, e.g. something like this: 
"select field1.field12[3].field1234 from ..."? It means that an event has a 
field called field1, which in turn has an array sub-field called field12, and 
each element of this array has a field field1234. Is it possible? Or does 
Siddhi assume a flat structure of events, i.e. each event can have only fields 
of basic types?


Thanks,
   Leo

_______________________________________________
Architecture mailing list
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
