Hi everyone,

We recently had some discussions about parsing expressions. I currently have a PR [1] up for that, which takes the feedback into account. Next, I want to tackle something similar for ExecPlans, since manually specifying one in code is currently cumbersome. I’m deciding between two variants:
- Function call-style: This would be a similar syntax to the expressions, where we would have something along the lines of `sink(project(filter(source(…)…)…)…)`. The problem with this syntax is that it involves tons of nesting, which, although an improvement over handwriting the C++ code, is still cumbersome to write. On the other hand, this syntax is pretty intuitive and meshes well with the expression syntax. A minor modification could be to make the last argument, rather than the first, be the input to a node, which would at least keep a node’s parameters together.

- List style: This syntax completely eliminates nesting and would probably be easier to write, but it has a steeper learning curve. Essentially, since we know how many inputs each type of node takes, we can implicitly reconstruct a tree from a list of node names (formally, we are converting from/to a pre-order traversal of the query tree). For example, it would look something like:

```
sink
project <list of names/expressions>
filter <expression>
source
```

The key is that we know that a source takes no inputs, and so source nodes are leaf nodes. To take an example with a join, it could be something like:

```
order_by_sink <sort key>
hash_join <join arguments>
filter <expression>
source
filter <expression>
source
```

Since we know that a join always takes two inputs, we interpret the first (filter, source) slice as the first input and the second slice as the second input. It should be noted that the current C++ code already resembles this kind of syntax; it just has much more clutter.

Thanks!
Sasha Krassovsky

[1] https://github.com/apache/arrow/pull/14287
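
P.S. To make the list-style reconstruction concrete, here is a rough sketch of how a pre-order list could be turned back into a tree using only the number of inputs per node type. It is written in Python purely for brevity and is not tied to the Arrow C++ code. The `ARITY` table, the `Node` class, and the `parse_plan` / `to_call_style` helpers are hypothetical names made up for illustration, and the sketch assumes one node per line, with everything after the node name treated as an opaque argument string.

```python
from dataclasses import dataclass, field

# Hypothetical arity table: how many inputs each node type takes.
# These names mirror the examples in this email, not a real Arrow registry.
ARITY = {
    "source": 0,
    "filter": 1,
    "project": 1,
    "sink": 1,
    "order_by_sink": 1,
    "hash_join": 2,
}


@dataclass
class Node:
    name: str
    args: str = ""
    inputs: list = field(default_factory=list)


def parse_plan(text: str) -> Node:
    """Rebuild a plan tree from a pre-order list with one node per line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if pos >= len(lines):
            raise ValueError("plan ended before all inputs were supplied")
        # The first word is the node type; the rest is kept as an opaque string.
        name, _, args = lines[pos].partition(" ")
        pos += 1
        if name not in ARITY:
            raise ValueError(f"unknown node type: {name}")
        node = Node(name, args.strip())
        # Consume exactly as many sub-plans as this node type has inputs.
        for _ in range(ARITY[name]):
            node.inputs.append(parse_node())
        return node

    root = parse_node()
    if pos != len(lines):
        raise ValueError("trailing nodes after the plan was complete")
    return root


def to_call_style(node: Node) -> str:
    """Render the tree in the nested function-call style (inputs first)."""
    parts = [to_call_style(child) for child in node.inputs]
    if node.args:
        parts.append(node.args)
    return f"{node.name}({', '.join(parts)})"
```

Feeding the join example through `parse_plan` gives a root `order_by_sink` whose single input is a `hash_join` with two `filter` inputs, and `to_call_style` converts that tree back into the nested form, which illustrates that the two variants carry the same information. A list that ends early or has leftover nodes is rejected with a `ValueError`.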