Hi everyone,
We recently had some discussions about parsing expressions, and I have a 
PR [1] up for that which takes the feedback into account. Next I want to tackle 
a similar text syntax for ExecPlans, since specifying one by hand in code is 
cumbersome. I’m deciding between two variants:

- Function call-style: This would be a similar syntax to the expressions, where 
we would have something along the lines of 
`sink(project(filter(source(…)…)…)…)`. The problem with this syntax is that it 
involves a lot of nesting, which, although an improvement over handwriting the 
C++ code, is still cumbersome to write. On the other hand, this syntax is 
pretty intuitive and meshes well with the expression syntax. A minor 
modification could be to make a node’s input the last argument rather than the 
first, which would at least keep a node’s parameters together. 

- List style: This syntax completely eliminates nesting and would probably be 
easier to write but has a steeper learning curve. Essentially, since we know 
how many inputs each type of node takes, we can implicitly reconstruct a tree 
from a list of node names (formally, we are converting from/to a pre-order 
traversal of the query tree). For example, it would look something like:

```
sink
project <list of names/expressions>
filter <expression>
source
```

The key is that we know that a source takes no inputs, and so source nodes are 
leaf nodes. An example with a join could look something like:

```
order_by_sink <sort key>
hash_join <join arguments>
filter <expression>
source
filter <expression>
source
```

Since we know that a join always takes two inputs, we interpret the first 
(filter source) slice as the first input and the second as the second 
input. Note that the current C++ code already resembles this kind of syntax; 
it just has much more clutter.
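
As a rough sketch of how the reconstruction could work (this is not code from 
the PR; the arity table and the names `ARITY`, `Node`, `parse_plan`, and 
`to_call_style` are made up for illustration), a recursive pre-order parser 
only needs to know how many inputs each node type takes. The Python below 
rebuilds the tree from the list style and prints it back in the function 
call-style with the inputs last:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical arity table: number of inputs per node type (an assumption,
# not the real set of ExecPlan node types).
ARITY = {
    "source": 0,
    "filter": 1,
    "project": 1,
    "sink": 1,
    "order_by_sink": 1,
    "hash_join": 2,
}


@dataclass
class Node:
    name: str
    args: str = ""
    inputs: List["Node"] = field(default_factory=list)


def parse_plan(text: str) -> Node:
    """Rebuild the tree from a pre-order list of nodes, one node per line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if pos >= len(lines):
            raise ValueError("plan ended while a node still expects an input")
        # First word is the node type; the rest of the line is its parameters.
        name, _, args = lines[pos].partition(" ")
        pos += 1
        if name not in ARITY:
            raise ValueError(f"unknown node type: {name}")
        node = Node(name, args.strip())
        # Each input is the next complete subtree in the list.
        for _ in range(ARITY[name]):
            node.inputs.append(parse_node())
        return node

    root = parse_node()
    if pos != len(lines):
        raise ValueError("trailing nodes after the plan was complete")
    return root


def to_call_style(node: Node) -> str:
    """Render the tree in function call-style, parameters first, inputs last."""
    parts = [node.args] if node.args else []
    parts += [to_call_style(child) for child in node.inputs]
    return f"{node.name}({', '.join(parts)})"


plan = """
order_by_sink <sort key>
hash_join <join arguments>
filter <expression>
source
filter <expression>
source
"""
root = parse_plan(plan)
print(to_call_style(root))
```

For the join example this prints the nested form 
`order_by_sink(<sort key>, hash_join(<join arguments>, filter(<expression>, source()), filter(<expression>, source())))`, 
so the two variants carry the same information and differ only in how much 
nesting the author has to write.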

Thanks!
Sasha Krassovsky

[1] https://github.com/apache/arrow/pull/14287
