Hi all,
First of all, thank you very much for your explanations and clarifications! It
is very interesting and useful!
Let me ask a few more questions and provide a few comments.
> Hi All, these questions and answers are very educating. Shall we add them to
> our doc FAQs?
I think it would be a very good idea to add something like this to the FAQs or
to create some sort of an "architecture and implementation overview" document.
>1) How many rules/queries can be defined in one engine? How does it affect
>performance?
>
> For example, can I define (tens of) thousands of queries using the same (or
>multiple) instance of SiddhiManager? Would it make processing much slower? Or
>is the speed not proportional to the number of queries? E.g. when a new event
>arrives, does Siddhi test it in a linear fashion against each query or does
>Siddhi keep an internal state machine that tries to match an event against all
>rules at once?
>
> SiddhiManager can have many queries, and if you chain the queries in a linear
> fashion then all those queries will be executed
> one after the other and you might see some performance degradation, but if
> you have them in parallel then there won't be
> any issues.
Well, before I got this answer, I created a few test cases to check
experimentally how it behaves. I created a single instance of SiddhiManager and
added 10000 queries that all read from the same input stream. Each query checks
whether a specific attribute (namely, price) of an event lies inside a given
random interval ( [ price >= random_low and price <= random_high] ) and writes
its output to one of 100 streams, chosen at random. Then I measured the time
required to process 1000000 events using this setup. I also did exactly the
same experiment with Esper.
My findings were that Siddhi is much slower than Esper in this setup. After
looking into the internal implementations of both, I realized the reason.
Siddhi processes all queries that read from the same input stream sequentially,
one after another. Even if many of the queries have almost the same condition,
Siddhi makes no attempt to optimize them. Esper detects that many queries have
a condition on the same variable and builds some sort of decision tree. As a
result, Esper's cost per event grows as O(log n) in the number of queries n,
whereas Siddhi needs O(n).
I'm not saying that this test case is very typical or important, but maybe
Siddhi should analyze the complete set of queries and apply some optimizations
where possible? I.e. a kind of global optimization: it could detect common
sub-expressions or sub-conditions in the queries and evaluate them only once,
instead of evaluating them over and over again for each query separately.
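To make the difference between the two evaluation strategies concrete, here is
a minimal Java sketch. It is not Siddhi or Esper code, and I do not claim that
Esper builds its index this way; all class and method names (IntervalIndexSketch,
RangeQuery, Segment, ...) are made up for illustration. It compares a linear
scan over all range conditions with a lookup in a precomputed sorted index over
the interval endpoints.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: matching one price against many [low, high] range queries. */
final class IntervalIndexSketch {

    /** One range condition: low <= price <= high. */
    static final class RangeQuery {
        final int id;
        final double low;
        final double high;

        RangeQuery(int id, double low, double high) {
            this.id = id;
            this.low = low;
            this.high = high;
        }

        boolean matches(double price) {
            return price >= low && price <= high;
        }
    }

    /** Query ids matching exactly at an endpoint, and in the open gap up to the next endpoint. */
    static final class Segment {
        final List<Integer> atPoint;
        final List<Integer> afterPoint;

        Segment(List<Integer> atPoint, List<Integer> afterPoint) {
            this.atPoint = atPoint;
            this.afterPoint = afterPoint;
        }
    }

    /** Baseline: test the event against every query, O(n) per event. */
    static List<Integer> linearMatch(List<RangeQuery> queries, double price) {
        List<Integer> hits = new ArrayList<>();
        for (RangeQuery q : queries) {
            if (q.matches(price)) {
                hits.add(q.id);
            }
        }
        return hits;
    }

    /** One-time preprocessing: for every distinct endpoint, record which queries match there. */
    static TreeMap<Double, Segment> buildIndex(List<RangeQuery> queries) {
        TreeSet<Double> endpoints = new TreeSet<>();
        for (RangeQuery q : queries) {
            endpoints.add(q.low);
            endpoints.add(q.high);
        }
        TreeMap<Double, Segment> index = new TreeMap<>();
        for (double e : endpoints) {
            List<Integer> at = new ArrayList<>();
            List<Integer> after = new ArrayList<>();
            for (RangeQuery q : queries) {
                if (q.low <= e && e <= q.high) at.add(q.id);   // matches exactly at e
                if (q.low <= e && e < q.high) after.add(q.id); // matches just above e
            }
            index.put(e, new Segment(at, after));
        }
        return index;
    }

    /** Indexed lookup: one O(log n) tree search, no per-query evaluation. */
    static List<Integer> indexedMatch(TreeMap<Double, Segment> index, double price) {
        Map.Entry<Double, Segment> entry = index.floorEntry(price);
        if (entry == null) {
            return Collections.emptyList(); // price is below every interval
        }
        Segment s = entry.getValue();
        return entry.getKey() == price ? s.atPoint : s.afterPoint;
    }
}
```

The index trades memory and build time for lookup speed: building it here is
quadratic in the number of queries and every segment stores its own list of
ids, so a real engine would need something more compact. The point is only that
the per-event work no longer depends on evaluating each condition separately.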
After getting these first results, I changed the setup so that each query reads
from one of many input streams (e.g. one of 300) instead of all queries reading
from the same one. This greatly improved the situation, because the number of
queries per input stream was now much smaller and processing was thus much
faster. But even in this setup Siddhi is still about 5-6 times slower than
Esper.
>2) Is it possible to easily disable/enable some queries?
>
>In my use-cases I have a lot of queries. Actually, I have a lot of tenants and
>each tenant may have something like 10-100 queries. Rather often (e.g. few
>times a day), tenants would like to disable/enable some of their queries. What
>is a proper way to do it? Is it a costly operation, i.e. does Siddhi need to
>perform a lot of processing to disable or enable a query?
>Is it better to keep a dedicated SiddhiManager instance per tenant or is it OK
>to have one SiddhiManager instance which handles all those tenants with all
>their queries?
>
>
> The general norm is, you have to use a SiddhiManager per scenario, where each
> scenario might contain one or more queries,
> with this model it's easy if any tenant wants to add or remove a scenario and it
> will not affect other queries and tenants.
If I have tens of thousands of tenants, then having a dedicated SiddhiManager
per tenant is probably not very practical or even possible, as it would get
pretty heavyweight, I guess.
Therefore, the ability to enable/disable a query could be very practical. In
fact, it could probably be implemented very easily. Imagine that each query
object has a boolean flag that indicates whether it is enabled. When the
condition matches, but before Siddhi performs the insert (i.e. the action), it
could check the flag. If the query is disabled, no action (i.e. insert) is
performed at all. Of course, there is still some overhead for matching the
query. But maybe even this can be skipped if the query is disabled? I.e. its
conditions are immediately evaluated to "false" and thus never trigger?
BTW, Esper has this feature. You can disable/enable any query without removing
and later adding it again.
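As a rough illustration of the flag idea, here is a minimal Java sketch. It
does not use any Siddhi or Esper classes; ToggleableQuerySketch and its methods
are hypothetical names, and the query is reduced to a condition plus an action.
The flag is checked first, so a disabled query skips both the matching and the
insert.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of a query that can be switched off without being removed. */
final class ToggleableQuerySketch<E> {
    private final Predicate<E> condition; // stands in for the query's filter
    private final Consumer<E> action;     // stands in for the "insert into" step
    // volatile because the flag may be flipped by a thread other than the processing thread
    private volatile boolean enabled = true;

    ToggleableQuerySketch(Predicate<E> condition, Consumer<E> action) {
        this.condition = condition;
        this.action = action;
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    boolean isEnabled() {
        return enabled;
    }

    /** Returns true only if the query is enabled, the event matched, and the action ran. */
    boolean process(E event) {
        if (!enabled) {
            return false; // disabled: skip the condition and the action entirely
        }
        if (!condition.test(event)) {
            return false;
        }
        action.accept(event);
        return true;
    }
}
```

Toggling is then a single field write, with no need to rebuild or re-register
anything. Whether a stateful query (e.g. one with a window) should keep
collecting events while disabled is a separate design question that this
sketch does not address.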
When it comes to Siddhi persistent stores, you write:
>It only stores the state information of the processing, e.g. the current
>running Avg of the average calculation. This will be used when server
>recovers from a failure.
OK. I understand what it does now. BTW, does it also store the sliding windows,
so that failover is possible?
My further question is: how can one support more dynamic scenarios, where the
set of queries is not totally static? What if the set of rules changes a few
times per hour/day/etc? Maybe it would also make sense to persist the set of
queries that were deployed on a given SiddhiManager? This way a user wouldn't
need to do any custom bookkeeping for the set of queries.
Yet another question about Siddhi:
Is it possible to express queries that work with absolute time or timers
without providing a time inside events? E.g. how can one express in the query
something like: "time is between 9:30 AM and 10:00 AM"? Is it possible to work
with timers in the query? Basically, I'd like to trigger certain actions at a
specific time or on a regular basis (every N minutes), and I'm wondering how
this can be expressed using Siddhi's query language.
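To show the kind of behaviour I mean, here is a minimal Java sketch of doing it
outside the engine, using only the JDK. It is not Siddhi query syntax and uses
no Siddhi API; TimerSketch, inWindow and everyMinutes are hypothetical names.
The tick callback could, for example, send a synthetic "timer" event into an
input stream, which is exactly the workaround I would like to avoid.

```java
import java.time.LocalTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of time-of-day checks and periodic triggers done outside the engine. */
final class TimerSketch {

    /**
     * True if now lies in the half-open window [start, end),
     * e.g. start = 09:30 and end = 10:00.
     * Windows that cross midnight are not handled in this sketch.
     */
    static boolean inWindow(LocalTime now, LocalTime start, LocalTime end) {
        return !now.isBefore(start) && now.isBefore(end);
    }

    /** Runs tick every periodMinutes minutes, with the first run after one full period. */
    static ScheduledFuture<?> everyMinutes(ScheduledExecutorService scheduler,
                                           long periodMinutes,
                                           Runnable tick) {
        return scheduler.scheduleAtFixedRate(tick, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }
}
```

A caller would create the scheduler with
Executors.newSingleThreadScheduledExecutor(), pass a tick that checks
inWindow(LocalTime.now(), ...) before acting, and cancel the returned future to
stop the timer.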
And my last question for now:
Is it possible to have nested structures in events, e.g. something like this:
"select field1.field12[3].field1234 from ..."? It means that an event has a
field called field1, which in turn has an array sub-field called field12, and
each element of this array has a field field1234. Is it possible? Or does
Siddhi assume a flat structure of events, i.e. each event can have only fields
of basic types?
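To pin down what such a path expression would mean, here is a minimal Java
sketch that resolves it against an event modelled as nested Maps and Lists.
This representation is purely my assumption, not how Siddhi models events, and
NestedPathSketch and resolve are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: resolving a path like "field1.field12[3].field1234". */
final class NestedPathSketch {

    // One path step: a field name, optionally followed by a single [index].
    private static final Pattern STEP = Pattern.compile("(\\w+)(?:\\[(\\d+)\\])?");

    /** Returns the value at the given path, or null if any step is missing. */
    static Object resolve(Map<String, ?> event, String path) {
        Object current = event;
        for (String part : path.split("\\.")) {
            Matcher m = STEP.matcher(part);
            if (!m.matches()) {
                throw new IllegalArgumentException("Bad path step: " + part);
            }
            if (!(current instanceof Map)) {
                return null; // cannot look up a field on a non-structured value
            }
            current = ((Map<?, ?>) current).get(m.group(1));
            if (m.group(2) != null) {
                if (!(current instanceof List)) {
                    return null; // an index was given but the field is not an array
                }
                List<?> list = (List<?>) current;
                int index = Integer.parseInt(m.group(2));
                if (index >= list.size()) {
                    return null; // index out of range
                }
                current = list.get(index);
            }
        }
        return current;
    }
}
```

If only flat events are supported, the same helper could be used on the
producer side to flatten the few nested values a query needs into ordinary
top-level attributes before the event is sent.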
Thanks,
Leo
_______________________________________________
Architecture mailing list
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture