Hi, 

a short update on this matter.
We had a meetup yesterday with the PLC4X community, and I prepared a very first 
"demo" implementation of the optimizer / framework.
I tried to model it after the Calcite / Volcano approach.

We had several discussions and experimented a bit, and a few points came up 
that make me unsure whether the Volcano framework really fits best here, or 
whether some other approach would.

* Usually we have a pretty big set of "Operators" (in our case field requests, 
in comparison to Calcite's RelNodes):
In regular cases there could be around 10, but also quite often up to 100 
(which would be rather rare for 'sane' SQL queries, I assume)
* We have very few rules:
In fact, we may have two or three rules (protocol specific), but usually of the 
form 'merge two field requests into one' or 'split one field request into two'
* We have no tree structure; everything is 'flat'
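To make that rule shape concrete, here is a minimal sketch (in Python, with 
hypothetical names; nothing here is actual PLC4X API) of a flat field-request 
model and a 'merge two field requests into one' rule:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldRequest:
    """A flat request for a contiguous byte range on the PLC."""
    start: int  # first byte address read
    end: int    # one past the last byte address read

def try_merge(a: FieldRequest, b: FieldRequest):
    """Merge rule: collapse two overlapping or adjacent requests into one
    covering request. Returns None when the rule does not apply."""
    if a.start <= b.end and b.start <= a.end:
        return FieldRequest(min(a.start, b.start), max(a.end, b.end))
    return None
```

A 'split' rule would simply be the inverse, cutting one range into two adjacent 
ones at a byte boundary.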

With the above setup it's pretty obvious that we cannot profit from Volcano's 
dynamic programming approach. Furthermore, with the naive approach of applying 
all suitable rules to all possible candidates, the state space explodes with 
O(n!) (where n can be large, see above).
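As a toy illustration of that blowup (a simplified count, not the real 
planner): if the only rule were pairwise merging, the number of distinct orders 
in which a naive exhaustive search could apply it already grows factorially 
with the batch size:

```python
def merge_sequences(n):
    """Count the distinct sequences of pairwise merges that reduce
    n flat requests down to a single request: at each step there are
    C(k, 2) candidate pairs among the k remaining requests."""
    count = 1
    while n > 1:
        count *= n * (n - 1) // 2  # pick one of C(n, 2) pairs to merge
        n -= 1
    return count
```

merge_sequences(10) is already above 2.5 billion, so without memoization or 
aggressive pruning an exhaustive search is hopeless even for small batches.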

So I think our best bet at the moment would be to either explore the full 
search space (but with aggressive pruning) or use some other sort of algorithm, 
like simple gradient descent (I think our convergence behavior should be quite 
nice if the cost function is "smooth" enough) or stochastic optimization like 
the cross-entropy method or simulated annealing.
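To sketch the simulated-annealing direction (everything below is illustrative: 
requests are modeled as plain (start, end) byte ranges, and the cost model of 
a fixed per-request overhead plus bytes read is made up, not a real driver's):

```python
import math
import random

def cost(requests):
    # Hypothetical cost model: fixed per-request overhead plus bytes read.
    return sum(5 + (end - start) for start, end in requests)

def neighbor(requests):
    """Random move: merge two requests into their covering range,
    or split one request at a random byte boundary."""
    rs = list(requests)
    if len(rs) > 1 and random.random() < 0.5:
        i, j = random.sample(range(len(rs)), 2)
        a, b = rs[i], rs[j]
        merged = (min(a[0], b[0]), max(a[1], b[1]))
        rs = [r for k, r in enumerate(rs) if k not in (i, j)] + [merged]
    else:
        i = random.randrange(len(rs))
        s, e = rs[i]
        if e - s > 1:  # only ranges longer than one byte can be split
            m = random.randrange(s + 1, e)
            rs = rs[:i] + [(s, m), (m, e)] + rs[i + 1:]
    return rs

def anneal(requests, steps=2000, t0=10.0):
    """Minimize cost() over merge/split moves with a linear cooling schedule."""
    cur = list(requests)
    cur_cost = cost(cur)
    best, best_cost = cur, cur_cost
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-9
        cand = neighbor(cur)
        delta = cost(cand) - cur_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or random.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
    return best, best_cost
```

For example, eight single-byte reads of the same byte (the '8 BOOLEANs in one 
byte' situation) collapse toward a single request under this cost model.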

I just wanted to report back to the list, as many people joined the discussion 
and had interesting ideas. And I still think that there is an overlap in what's 
being done, but the 'common core' is smaller than I initially assumed (e.g., 
some query optimizers AFAIR use approaches like simulated annealing).

Best
Julian

On 20.08.19 at 15:38, "Julian Feinauer" <[email protected]> wrote:

    Hi Stamatis,
    
    thanks for your response.
    I think my brain just needs a bit more time to get really deep into those 
advanced planning topics (it's sometimes slow to adapt...).
    But I will look through it.
    We have a meetup this week and will discuss the matter and how to set up 
everything to enable some optimization at first (introducing cost estimates and 
such), and then I will again have a deeper look and perhaps prepare a test case 
or a runnable test.
    Then it's probably easiest to reason about.
    
    Julian
    
    On 20.08.19 at 15:19, "Stamatis Zampetakis" <[email protected]> wrote:
    
        Hi Julian F,
        
        I admit that I didn't really get your example, but talking about
        'batch request optimization' and 'collapsing "overlapping" but not
        equal requests' I get the impression that the problem is optimizing
        sets of queries which may have common sub-expressions; this problem
        is usually referred to as multi-query optimization and is indeed
        relevant to the Spool operator mentioned by Julian H.
        
        If that's the case, then the most relevant work that I can think of
        is [1], which solves the problem by slightly modifying the search
        strategy of the Volcano planner.
        
        Best,
        Stamatis
        
        [1] Roy, Prasan, et al. "Efficient and extensible algorithms for multi
        query optimization." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000. (
        https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/mqo-sigmod00.pdf)
        
        
        On Tue, Aug 20, 2019 at 12:49 PM Julian Feinauer <
        [email protected]> wrote:
        
        > Hi Julian,
        >
        > thanks for the reply.
        > I have to think about that, I think.
        >
        > But as I understand the Spool operator, it is there to factor out
        > multiple computations of the same expression.
        > In our situation we aim more at collapsing "overlapping" but not
        > equal requests.
        >
        > Consider 8 bits which physically form a byte.
        > If I read 8 BOOLEANs, I have 8 different requests, each of which
        > masks one bit and returns it (padded) as a byte. So that's 8 requests
        > and 8 bytes of data transferred (plus masking on the PLC).
        > If I optimized this to read the byte in one request and do the
        > masking afterwards, I would have one request and only 1 byte
        > transferred (plus no masking on the PLC, which keeps the pressure
        > there low).
        >
        > This could be modelled by introducing respective "RelNodes" and
        > planner rules, I think, but I do not fully understand how Spool fits
        > in here?
        >
        > Julian
        >
        > On 19.08.19 at 20:42, "Julian Hyde" <[email protected]> wrote:
        >
        >     One tricky aspect is to optimize a *batch* of requests.
        >
        >     The trick is to tie together the batch so that it is costed as
        > one request. We don’t have an operator specifically for that, but
        > you could for instance use UNION ALL. E.g. given Q1 and Q2, you
        > could generate a plan for
        >
        >       select count(*) from Q1 union all select count(*) from Q2
        >
        >     If the plan for the batch is a DAG (i.e. sharing work between
        > the components of the batch by creating something akin to “temporary
        > tables”) then you are in the territory for which we created the
        > Spool operator (see discussion in
        > https://issues.apache.org/jira/browse/CALCITE-481).
        >
        >     Julian
        >
        >
        >     > On Aug 19, 2019, at 6:34 AM, Julian Feinauer <
        > [email protected]> wrote:
        >     >
        >     > Hi Danny,
        >     >
        >     > thanks for the quick reply.
        >     > Cost calculation we can of course provide (but it could be a
        > bit different, as we have not only CPU and memory but also network
        > and the like).
        >     >
        >     > And also something like the RelNodes could be provided. In our
        > case this would be "Requests", which are at first "Logical" and are
        > then transformed to "Physical" Requests. For example, the API allows
        > you to request many fields per single request, but some PLCs only
        > allow one field per request. So this would be one task of this layer.
        >     >
        >     > Julian
        >     >
        >     > On 19.08.19 at 14:44, "Danny Chan" <[email protected]> wrote:
        >     >
        >     >    Cool idea ! Julian Feinauer ~
        >     >
        >     >    I think the Volcano model can be used as the base of the
        > cost algorithm, as long as you define all the metadata that you care
        > about. Another thing is that you should have a struct like RelNode
        > and a method like #computeSelfCost.
        >     >
        >     >    Best,
        >     >    Danny Chan
        >     >    On Aug 19, 2019 at 5:20 PM +0800, Julian Feinauer <
        > [email protected]> wrote:
        >     >> Hi folks,
        >     >>
        >     >> I’m here again with another PLC4X related question (
        > https://plc4x.apache.org).
        >     >> As we have more and more use cases, we encounter situations
        > where we send LOTS of requests to PLCs, which one could sometimes
        > optimize.
        >     >> This has multiple upstream reasons (like multiple different
        > services sending, or wanting two logically different addresses which
        > could be physically equal).
        >     >>
        >     >> So, we are considering adding some kind of optimizer which
        > takes a batch of requests and tries to arrange them in an “optimal”
        > way with regard to some cost function.
        >     >> The cost functions would of course be given by each driver,
        > but the optimizer could / should be rather general (possibly with
        > pluggable rules).
        >     >>
        >     >> As Calcite's planner already includes all of that, I ask
        > myself whether it could be possible (and make sense) to use it in
        > PLC4X.
        >     >> Generally speaking, this raises the question of whether the
        > Volcano approach is suitable for such problems.
        >     >> The other alternative would be to start with some kind of
        > heuristic-based planning or with other optimization algorithms
        > (genetic algorithms, cross entropy, …).
        >     >>
        >     >> Any thoughts or feedback are welcome!
        >     >>
        >     >> Julian
        >     >
        >     >
        >
        >
        >
        >
        
    
    
