Hi Julian F,

I admit that I didn't fully follow your example, but from your mention of 'batch request optimization' and 'collapsing "overlapping" but not equal requests' I get the impression that the problem is optimizing sets of queries which may share common sub-expressions. This problem is usually referred to as multi-query optimization, and it is indeed related to the Spool operator mentioned by Julian H.
If that's the case, then the most relevant work that I can think of is [1], which solves the problem by slightly modifying the search strategy of the Volcano planner.

Best,
Stamatis

[1] Roy, Prasan, et al. "Efficient and extensible algorithms for multi query optimization." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000. (https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/mqo-sigmod00.pdf)

On Tue, Aug 20, 2019 at 12:49 PM Julian Feinauer <[email protected]> wrote:

> Hi Julian,
>
> thanks for the reply.
> I have to think about that, I think.
>
> But as I understand the Spool operator, it is meant to factor out multiple
> calculations of the same thing.
> In our situation we aim more at collapsing "overlapping" but not equal
> requests.
>
> Consider 8 bits which physically form a byte.
> If I read 8 BOOLEANs I have 8 different requests, each of which masks one
> bit and returns it (padded) as a byte. So 8 requests and 8 bytes of data
> transfer (plus masking on the PLC).
> If I optimized it to read the byte in one request and do the masking
> afterwards, I would have one request and only 1 byte transferred (plus no
> masking on the PLC, which keeps pressure low there).
>
> This could be modelled by introducing respective "RelNodes" and planner
> rules, I think, but I do not fully understand how Spool fits in here.
>
> Julian
>
> On 19.08.19, 20:42, "Julian Hyde" <[email protected]> wrote:
>
> One tricky aspect is to optimize a *batch* of requests.
>
> The trick is to tie together the batch so that it is costed as one
> request. We don’t have an operator specifically for that, but you could,
> for instance, use UNION ALL. E.g. given Q1 and Q2, you could generate a
> plan for
>
> select count(*) from Q1 union all select count(*) from Q2
>
> If the plan for the batch is a DAG (i.e. sharing work between the
> components of the batch by creating something akin to “temporary tables”)
> then you are in the territory for which we created the Spool operator (see
> discussion in https://issues.apache.org/jira/browse/CALCITE-481).
>
> Julian
>
> > On Aug 19, 2019, at 6:34 AM, Julian Feinauer <[email protected]> wrote:
> >
> > Hi Danny,
> >
> > thanks for the quick reply.
> > Cost calculation we can of course provide (but it could be a bit
> > different, as we have not only CPU and memory but also network or
> > something).
> >
> > And something like the RelNodes could also be provided. In our case
> > these would be "Requests", which are at first "Logical" and are then
> > transformed to "Physical" Requests. For example, the API allows you to
> > request many fields per single request, but some PLCs only allow one
> > field per request. So this would be one task of this layer.
> >
> > Julian
> >
> > On 19.08.19, 14:44, "Danny Chan" <[email protected]> wrote:
> >
> > Cool idea! Julian Feinauer ~
> >
> > I think the Volcano model can be used as the base of the cost
> > algorithm, as long as you define all the metadata that you care about.
> > Another thing is that you should have a struct like RelNode and a method
> > like #computeSelfCost.
> >
> > Best,
> > Danny Chan
> > On Aug 19, 2019, at 5:20 PM +0800, Julian Feinauer <[email protected]> wrote:
> >> Hi folks,
> >>
> >> I’m here again with another PLC4X related question
> >> (https://plc4x.apache.org).
> >> As we have more and more use cases, we encounter situations where we
> >> send LOTS of replies to PLCs which one could sometimes optimize.
> >> This has multiple reasons upstream (like multiple different services
> >> sending, or you want two logically different addresses which could be
> >> physically equal).
> >>
> >> So, we are considering adding some kind of optimizer which takes a batch
> >> of requests and tries to arrange them in an “optimal” way with regard to
> >> some cost function.
> >> The cost functions would of course be given by each driver, but the
> >> optimizer could / should be rather general (possibly with pluggable
> >> rules).
> >>
> >> As Calcite's planner already includes all of that, I ask myself if it
> >> could be possible (and make sense) to use that in PLC4X.
> >> Generally speaking, this raises the question of whether the Volcano
> >> approach can be suitable for such problems.
> >> The other alternative would be to start with some kind of
> >> heuristic-based planning or with other optimization algorithms (genetic
> >> algorithms, cross entropy, …).
> >>
> >> Any thoughts or feedback are welcome!
> >>
> >> Julian
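To make the 8-BOOLEAN example from the quoted message concrete, here is a minimal sketch of the rewrite it describes: many single-bit reads that fall into the same physical byte are collapsed into one byte read, and the masking is done on the client afterwards. This is not PLC4X or Calcite API; `BitRead`, `coalesce`, `execute`, the `read_byte` callback and the "one unit of cost per request" model are all hypothetical names and assumptions made up for illustration. In a Calcite-style planner the grouping step would presumably be expressed as a planner rule over request nodes rather than as a plain function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BitRead:
    """Hypothetical logical request: read one BOOLEAN at (byte address, bit index)."""
    byte_addr: int
    bit: int  # 0 (least significant) .. 7


def coalesce(requests):
    """Group bit reads by byte address: one physical byte read per distinct byte."""
    plan = defaultdict(list)
    for req in requests:
        plan[req.byte_addr].append(req)
    return dict(plan)


def execute(plan, read_byte):
    """Issue one byte read per group and mask the individual bits client-side."""
    results = {}
    for addr, reqs in plan.items():
        value = read_byte(addr)  # single round trip for this byte
        for req in reqs:
            results[req] = bool((value >> req.bit) & 1)
    return results


def naive_cost(requests):
    """Assumed cost model: one request (and one byte transferred) per bit read."""
    return len(requests)


def coalesced_cost(plan):
    """Assumed cost model: one request (and one byte transferred) per distinct byte."""
    return len(plan)


# Simulated PLC memory, standing in for a real device.
memory = {10: 0b10100101}
requests = [BitRead(10, b) for b in range(8)]
plan = coalesce(requests)
results = execute(plan, memory.__getitem__)
```

With this toy cost model, the eight bit reads cost 8 before the rewrite and 1 after it, which matches the "8 requests and 8 bytes" versus "one request and only 1 byte" comparison in the message. A real driver would supply its own cost function, for example one that also accounts for network round trips.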
