Hi Sasha,

I like the function-call-style variant. Quick question about the parser:
do you think it could also accept newlines? That way it would be even
more similar to a JSON-like, declarative approach, and it would mitigate
the nesting issue a bit (and be easier to read as well). For instance:

sink(
  project(
    filter(
      source(
        …)
    …)
  …)
…)
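
As a rough illustration, a recursive-descent parser only needs its tokenizer to treat line breaks and indentation as ordinary whitespace for the one-line and multi-line layouts to parse identically. Below is a minimal Python sketch. The grammar (a node is `name(arg, ...)` and a leaf argument is a bare identifier) and the placeholder names `t`, `pred`, `cols` and `out` are hypothetical. Real node parameters such as expressions would need a richer grammar.

```python
import re

# Hypothetical grammar for this sketch only:
#   node := NAME [ "(" [ node ( "," node )* ] ")" ]
# Real node parameters (expressions, sort keys, ...) would need more rules.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|[(),])")
PUNCT = ("(", ")", ",")


def tokenize(text):
    """Split text into identifiers and punctuation.

    The leading whitespace pattern also matches newlines and indentation,
    so a plan spread over several lines yields the same tokens as a one-liner.
    """
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_node(tokens, pos):
    """Parse one node starting at tokens[pos]; return (node, next_pos).

    A call becomes a (name, [args]) tuple; a bare identifier stays a string.
    """
    if pos >= len(tokens) or tokens[pos] in PUNCT:
        raise SyntaxError(f"expected a name at token {pos}")
    name = tokens[pos]
    pos += 1
    if pos >= len(tokens) or tokens[pos] != "(":
        return name, pos
    pos += 1  # consume "("
    args = []
    if pos < len(tokens) and tokens[pos] == ")":
        return (name, args), pos + 1
    while True:
        arg, pos = parse_node(tokens, pos)
        args.append(arg)
        if pos >= len(tokens):
            raise SyntaxError("unclosed parenthesis")
        if tokens[pos] == ")":
            return (name, args), pos + 1
        if tokens[pos] != ",":
            raise SyntaxError(f"expected ',' or ')' at token {pos}")
        pos += 1  # consume ","


def parse(text):
    tokens = tokenize(text)
    node, pos = parse_node(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return node


one_line = "sink(project(filter(source(t), pred), cols), out)"
multi_line = """
sink(
  project(
    filter(
      source(t),
      pred),
    cols),
  out)
"""
assert parse(one_line) == parse(multi_line)
```

Because layout is discarded during tokenization, the indentation is purely cosmetic, and a pretty-printer could re-emit any plan in the indented form.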

Percy


On Tue, Oct 18, 2022 at 5:54 PM Sasha Krassovsky <krassovskysa...@gmail.com>
wrote:

> Hi everyone,
> We recently had some discussions about parsing expressions, and I have a
> PR [1] up that takes the feedback into account. Next, I want to tackle
> ExecPlans, since specifying one manually in code is currently cumbersome.
> I’m deciding between two variants:
>
> - Function-call style: This would use a syntax similar to the expressions,
> something along the lines of `sink(project(filter(source(…)…)…)…)`. The
> problem with this syntax is that it involves a lot of nesting, which,
> although an improvement over handwriting the C++ code, is still cumbersome
> to write. On the other hand, it is fairly intuitive and meshes well with
> the expression syntax. A minor modification would be to make a node's
> input its last argument rather than its first, which would at least keep
> a node's name and parameters together.
>
> - List style: This syntax eliminates nesting entirely and would probably
> be easier to write, but it has a steeper learning curve. Because we know
> how many inputs each type of node takes, we can implicitly reconstruct
> the tree from a list of node names (formally, we are converting to and
> from a pre-order traversal of the query tree). For example, it would look
> something like:
>
> ```
> sink
> project <list of names/expressions>
> filter <expression>
> source
> ```
>
> The key is that a source takes no inputs, so source nodes are always leaf
> nodes. An example with a join could look something like:
>
> ```
> order_by_sink <sort key>
> hash_join <join arguments>
> filter <expression>
> source
> filter <expression>
> source
> ```
>
> Since a join always takes two inputs, we interpret the first
> `filter`/`source` slice as its first input and the second slice as its
> second input. Note that the current C++ code already resembles this kind
> of syntax; it just has much more clutter.
>
> Thanks!
> Sasha Krassovsky
>
> [1] https://github.com/apache/arrow/pull/14287
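
The list-style proposal relies on each node kind having a fixed number of inputs, so a pre-order list determines the tree uniquely. The following is a minimal Python sketch of both directions (list to tree and tree to list). It assumes a hypothetical arity table (`ARITY`) and treats everything after the node name on a line as opaque, unparsed parameters. It is an illustration, not Arrow's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical arity table: how many *inputs* each node kind consumes.
# The node kinds and their arities here are assumptions for this sketch.
ARITY = {
    "source": 0,
    "filter": 1,
    "project": 1,
    "sink": 1,
    "order_by_sink": 1,
    "hash_join": 2,
}


@dataclass
class Node:
    kind: str
    params: str = ""  # everything after the node name, left unparsed
    inputs: list = field(default_factory=list)


def parse_plan(text):
    """Rebuild the query tree from a pre-order list with one node per line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    pos = 0

    def build():
        nonlocal pos
        if pos >= len(lines):
            raise SyntaxError("plan ended before every node got its inputs")
        kind, _, params = lines[pos].partition(" ")
        if kind not in ARITY:
            raise SyntaxError(f"unknown node kind {kind!r}")
        pos += 1
        node = Node(kind, params.strip())
        # Pre-order: each input follows its parent as a complete subtree,
        # so consuming exactly ARITY[kind] subtrees recovers the structure.
        for _ in range(ARITY[kind]):
            node.inputs.append(build())
        return node

    root = build()
    if pos != len(lines):
        raise SyntaxError("extra nodes after the root's subtree")
    return root


def to_list(node):
    """The reverse direction: emit a tree as its pre-order list."""
    lines = [f"{node.kind} {node.params}".rstrip()]
    for child in node.inputs:
        lines.extend(to_list(child))
    return lines


JOIN_PLAN = """
order_by_sink <sort key>
hash_join <join arguments>
filter <expression>
source
filter <expression>
source
"""

tree = parse_plan(JOIN_PLAN)
join = tree.inputs[0]
assert [child.kind for child in join.inputs] == ["filter", "filter"]
assert to_list(tree) == JOIN_PLAN.strip().splitlines()
```

A malformed list fails loudly. For instance, a `hash_join` followed by a single `source` runs out of lines while looking for its second input.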
