Hi Julian F,

I admit that I didn't really get your example but talking about 'batch
request optimization'
and 'collapsing "overlapping" but not equal requests' I get the impression
that the problem
is optimizing sets of queries which may have common sub-expressions;
the problem is usually referred to as multi-query optimization and is
indeed relevant with
the Spool operator mentioned by Julian H.

If that's the case then the most relevant work that I can think of is [1],
which solves the problem
by slightly modifying the search strategy of the Volcano planner.

Best,
Stamatis

[1] Roy, Prasan, et al. "Efficient and extensible algorithms for multi
query optimization." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000. (
https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/mqo-sigmod00.pdf)


On Tue, Aug 20, 2019 at 12:49 PM Julian Feinauer <
[email protected]> wrote:

> Hi Julian,
>
> thanks for the reply.
> I have to think about that, I think.
>
> But as I understand the Spool Operator this is to factor out multiple
> calculations of the same issue.
> In our Situation we aim more on collapsing "overlapping" but not equal
> requests.
>
> Consider 8 bits which form physically a byte.
> If I read 8 BOOLEANs I have 8 different request which mask one bit, return
> it (padded) as byte. So 8 requests and 8 bytes data transfer (plus masking
> on the PLC).
> If I would optimize it to read the byte in one request and do the masking
> afterwards I would have one request and only 1 byte transferred (plus no
> masking on the PLC which keeps pressure low there).
>
> This could be modelled by introducing respective "RelNodes" and Planner
> Rules, I think but I do not fully understand how Spool fits in here?
>
> Julian
>
> Am 19.08.19, 20:42 schrieb "Julian Hyde" <[email protected]>:
>
>     One tricky aspect is to optimize a *batch* of requests.
>
>     The trick is to tie together the batch so that it is costed as one
> request. We don’t have an operator specifically for that, but you could for
> instance use UNION ALL. E.g. given Q1 and Q2, you could generate a plan for
>
>       select count(*) from Q1 union all select count(*) from Q2
>
>     If the plan for the batch is be a DAG (i.e. sharing work between the
> components of the batch by creating something akin to “temporary tables”)
> then you are in the territory for which we created the Spool operator (see
> discussion in https://issues.apache.org/jira/browse/CALCITE-481 <
> https://issues.apache.org/jira/browse/CALCITE-481>).
>
>     Julian
>
>
>     > On Aug 19, 2019, at 6:34 AM, Julian Feinauer <
> [email protected]> wrote:
>     >
>     > Hi Danny,
>     >
>     > thanks for the quick reply.
>     > Cost calculation we can of course provide (but it could be a bit
> different as we have not only CPU and Memory but also Network or something).
>     >
>     > And also something like the RelNodes could be provided. In our case
> this would be "Requests" which are at first "Logical" and are then
> transformed to "Physical" Requests. For example the API allows you to
> request many fields per single request but some PLCs only allow one field
> per request. So this would be one task of this layer.
>     >
>     > Julian
>     >
>     > Am 19.08.19, 14:44 schrieb "Danny Chan" <[email protected]>:
>     >
>     >    Cool idea ! Julian Feinauer ~
>     >
>     >    I think the volcano model can be used the base of the cost
> algorithm. As long as you define all the metadata that you care about.
> Another thing is that you should have a struct like RelNode and a method
> like #computeSelfCost.
>     >
>     >    Best,
>     >    Danny Chan
>     >    在 2019年8月19日 +0800 PM5:20,Julian Feinauer <
> [email protected]>,写道:
>     >> Hi folks,
>     >>
>     >> I’m here again with another PLC4X related question (
> https://plc4x.apache.org).
>     >> As we have more and more usecases we encounter situations where we
> send LOTS of replies to PLCs which one could sometimes optimize.
>     >> This has multiple reasons upstream (like multiple different
> Services sending, or you want two logically different addresses which could
> be physically equal).
>     >>
>     >> So, we consider to add some kind of optimizer which takes a Batch
> of requests and tries to arrange them in an “optimal” way with regard to
> som cost function.
>     >> The cost functions would of course be given by each Driver but the
> optimizer could / should be rather general (possibly with pluggable rules).
>     >>
>     >> As Calcites Planner already includes all of that I ask myself if it
> could be possible (and make sense) to use that in PLC4X.
>     >> Generally speaking, this raises the question if the Volcano
> approach can be suitable for such problems.
>     >> The other alternative would be to start with some kind of heuristic
> based planning or with other optimization algorithms (genetic algs, cross
> entropy,…).
>     >>
>     >> Any thoughs or feedbacks are welcome!
>     >>
>     >> Julian
>     >
>     >
>
>
>
>

Reply via email to