Hi Julian,

thanks for the reply. I have to think about that.

As I understand it, the Spool operator is meant to factor out repeated computations of the same thing. In our situation we aim more at collapsing "overlapping" but not equal requests.

Consider 8 bits which physically form one byte. If I read 8 BOOLEANs, I have 8 different requests, each of which masks one bit and returns it (padded) as a byte. So that is 8 requests and 8 bytes of data transfer (plus masking on the PLC). If I optimized it to read the byte in one request and do the masking afterwards, I would have one request and only 1 byte transferred (plus no masking on the PLC, which keeps the load low there).

This could be modelled by introducing respective "RelNodes" and planner rules, I think, but I do not fully understand how Spool fits in here.

Julian

On 19.08.19, 20:42, "Julian Hyde" <[email protected]> wrote:

One tricky aspect is to optimize a *batch* of requests. The trick is to tie together the batch so that it is costed as one request. We don’t have an operator specifically for that, but you could for instance use UNION ALL. E.g. given Q1 and Q2, you could generate a plan for

  select count(*) from Q1
  union all
  select count(*) from Q2

If the plan for the batch is a DAG (i.e. sharing work between the components of the batch by creating something akin to “temporary tables”) then you are in the territory for which we created the Spool operator (see discussion in https://issues.apache.org/jira/browse/CALCITE-481).

Julian

> On Aug 19, 2019, at 6:34 AM, Julian Feinauer <[email protected]> wrote:
>
> Hi Danny,
>
> thanks for the quick reply.
> We can of course provide the cost calculation (but it could be a bit different, as we have not only CPU and memory but also network or something similar).
>
> Something like the RelNodes could also be provided. In our case these would be "Requests", which are at first "Logical" and are then transformed to "Physical" Requests.
> For example, the API allows you to request many fields in a single request, but some PLCs only allow one field per request. So this would be one task of this layer.
>
> Julian
>
> On 19.08.19, 14:44, "Danny Chan" <[email protected]> wrote:
>
> Cool idea, Julian Feinauer!
>
> I think the volcano model can be used as the base of the cost algorithm, as long as you define all the metadata that you care about. Another thing is that you should have a struct like RelNode and a method like #computeSelfCost.
>
> Best,
> Danny Chan
> On 19 Aug 2019, 5:20 PM +0800, Julian Feinauer <[email protected]> wrote:
>> Hi folks,
>>
>> I’m here again with another PLC4X related question (https://plc4x.apache.org).
>> As we have more and more use cases, we encounter situations where we send LOTS of requests to PLCs which one could sometimes optimize.
>> This has multiple reasons upstream (like multiple different services sending, or wanting two logically different addresses which could be physically equal).
>>
>> So we are considering adding some kind of optimizer which takes a batch of requests and tries to arrange them in an “optimal” way with regard to some cost function.
>> The cost functions would of course be given by each driver, but the optimizer could / should be rather general (possibly with pluggable rules).
>>
>> As Calcite's planner already includes all of that, I ask myself if it could be possible (and make sense) to use it in PLC4X.
>> Generally speaking, this raises the question of whether the Volcano approach is suitable for such problems.
>> The alternative would be to start with some kind of heuristic-based planning or with other optimization algorithms (genetic algorithms, cross entropy, …).
>>
>> Any thoughts or feedback are welcome!
>>
>> Julian
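
P.S. To make the 8-BOOLEAN example from the top of this thread concrete, here is a minimal sketch in Python (chosen for brevity; PLC4X itself is Java). All names in it (BitAddress, read_bits_coalesced, FakePlc) are hypothetical and are not part of PLC4X or Calcite. It only illustrates the rewrite a planner rule would have to produce, namely one byte read per distinct byte with the masking done on the client, and says nothing about how such a rule would be expressed in Calcite.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical address of one BOOLEAN: (byte offset, bit index 0..7).
BitAddress = Tuple[int, int]


def read_bits_coalesced(
    addresses: Iterable[BitAddress],
    read_byte: Callable[[int], int],
) -> Dict[BitAddress, bool]:
    """Read each distinct byte once, then mask the requested bits locally."""
    # Group the requested bit indexes by the byte they live in.
    bits_by_byte = defaultdict(list)
    for byte_offset, bit_index in addresses:
        if not 0 <= bit_index <= 7:
            raise ValueError(f"bit index out of range: {bit_index}")
        bits_by_byte[byte_offset].append(bit_index)

    result: Dict[BitAddress, bool] = {}
    for byte_offset, bit_indexes in bits_by_byte.items():
        value = read_byte(byte_offset)  # one physical request per byte
        for bit_index in bit_indexes:
            # Masking happens here, on the client, not on the PLC.
            result[(byte_offset, bit_index)] = bool((value >> bit_index) & 1)
    return result


class FakePlc:
    """Stand-in for a PLC connection that counts physical requests."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.requests = 0

    def read_byte(self, offset: int) -> int:
        self.requests += 1
        return self.memory[offset]
```

Reading all eight bits of byte 0 through read_bits_coalesced(..., plc.read_byte) issues a single read_byte call against FakePlc instead of eight. In a planner setting, the request count and bytes transferred would be the inputs to the driver's cost function.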
