Hi Sasha,

I like the function call-style variant. Quick question about the parser: do you think we could accept newlines too? That way it would look even more like a JSON-like/declarative format, and it could mitigate the nesting issue a bit (which would also make plans easier to read). For instance:

sink(
  project(
    filter(
      source(
        …)
      …)
    …)
  …)

Percy

On Tue, Oct 18, 2022 at 5:54 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote:

> Hi everyone,
> We recently had some discussions about parsing expressions. I currently
> have a PR [1] up for that taking into account the feedback. Next I wanted
> to tackle something for ExecPlans, as manually specifying one using code is
> currently cumbersome. I’m currently deciding between 2 variants:
>
> - Function call-style: This would be a similar syntax to the expressions,
> where we would have something along the lines of
> `sink(project(filter(source(…)…)…)…)`. The problem with this syntax is that
> it involves tons of nesting, which although an improvement over handwriting
> the C++ code, is still cumbersome to write. On the other hand, this syntax
> is pretty intuitive and meshes well with the expression syntax. A minor
> modification could be to make the last argument rather than the first be
> the input to a node, which would at least keep a node’s parameters
> together.
>
> - List style: This syntax completely eliminates nesting and would probably
> be easier to write but has a steeper learning curve. Essentially, since we
> know how many inputs each type of node takes, we can implicitly reconstruct
> a tree from a list of node names (formally, we are converting from/to a
> pre-order traversal of the query tree). For example, it would look
> something like:
>
> ```
> sink
> project <list of names/expressions>
> filter <expression>
> source
> ```
>
> The key is that we know that a source takes no inputs, and so source nodes
> are leaf nodes. To take an example with a join, it could be something like
>
> ```
> order_by_sink <sort key>
> hash_join <join arguments>
> filter <expression>
> source
> filter <expression>
> source
> ```
>
> Since we know that a join always takes two arguments, we interpret the
> first (filter source) slice as the first argument and the second as the
> second argument.
> It should be noted that the current C++ code already
> resembles this kind of syntax, it just has much more clutter.
>
> Thanks!
> Sasha Krassovsky
>
> [1] https://github.com/apache/arrow/pull/14287
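
For reference, here is a rough Python sketch of how I understand the list-style variant quoted above: because each node type has a known number of inputs, a pre-order list can be folded back into a tree with a small recursive reader. Everything in it is an assumption made for illustration. The `ARITY` table, the `Node` class, and the `parse_list_style` and `to_call_style` helpers are hypothetical names, not Arrow APIs, and the node arguments are kept as opaque strings instead of being parsed. Blank lines and indentation are ignored, so extra newlines cost nothing.

```python
from dataclasses import dataclass, field

# Hypothetical arity table: how many inputs each node type takes.
# These counts are assumptions for illustration, not read from Arrow.
ARITY = {
    "source": 0,
    "filter": 1,
    "project": 1,
    "sink": 1,
    "order_by_sink": 1,
    "hash_join": 2,
}


@dataclass
class Node:
    name: str
    args: str = ""                      # raw argument text, left unparsed
    inputs: list = field(default_factory=list)


def parse_list_style(text):
    """Rebuild a plan tree from a pre-order list of nodes, one per line."""
    # Indentation and blank lines carry no meaning in this sketch.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    pos = 0

    def build():
        nonlocal pos
        if pos >= len(lines):
            raise ValueError("plan ended before all inputs were supplied")
        name, _, args = lines[pos].partition(" ")
        pos += 1
        if name not in ARITY:
            raise ValueError(f"unknown node type: {name}")
        node = Node(name, args.strip())
        # The next ARITY[name] subtrees in the list are this node's inputs.
        for _ in range(ARITY[name]):
            node.inputs.append(build())
        return node

    root = build()
    if pos != len(lines):
        raise ValueError("extra nodes after the plan was complete")
    return root


def to_call_style(node):
    """Render the tree in function call-style, inputs first, arguments last."""
    parts = [to_call_style(child) for child in node.inputs]
    if node.args:
        parts.append(node.args)
    return f"{node.name}({', '.join(parts)})"


plan = """
order_by_sink <sort key>
  hash_join <join arguments>
    filter <expression>
      source
    filter <expression>
      source
"""
print(to_call_style(parse_list_style(plan)))
```

Run on the join example, this prints the equivalent nested form, with the two `filter`/`source` slices becoming the first and second inputs of `hash_join`. A parser for the function call-style variant could treat whitespace the same way, which is all the newline support I was asking about would need.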