Indentation works well when you omit the other arguments (e.g. ...) but once you mix in the arguments for the nodes (especially if those arguments have their own indentation / structure) then it ends up becoming unreadable I think. I prefer the idea of each node having it's own block, with no indentation, and using indentation purely for argument structure. For example (using YAML), consider the query `SELECT n_nationkey, n_name, r_name FROM nation INNER JOIN region ON n_regionkey = r_regionkey`. Note, we don't have a serialization for datasets so I'm using substrait serialization for reads.
``` project: expressions: - "!0" - "!1" - "!2" names: - "n_nationkey" - "n_name" - "r_name" join: left_keys: - "!2" right_keys: - "!4" type: JOIN_TYPE_INNER read: base_schema: names: - "r_regionkey" - "r_name" - "r_comment" struct: types: - i32? - string? - string? named_table: names: - "region" read: base_schema: names: - "n_nationkey" - "n_name" - "n_regionkey" - "n_comment" struct: types: - i32? - string? - i32? - string? named_table: names: - "nation" ``` I feel the above is pretty reasonable once you get past the learning curve of prefix processing to build the tree. It's not clear that node-level indentation adds much. ``` project: expressions: - "!0" - "!1" - "!2" names: - "n_nationkey" - "n_name" - "r_name" join: left_keys: - "!2" right_keys: - "!4" type: JOIN_TYPE_INNER read: base_schema: names: - "r_regionkey" - "r_name" - "r_comment" struct: types: - i32? - string? - string? named_table: names: - "region" read: base_schema: names: - "n_nationkey" - "n_name" - "n_regionkey" - "n_comment" struct: types: - i32? - string? - i32? - string? named_table: names: - "nation" ``` And then I think adding parentheses doesn't make sense. I suppose you could change from YAML to something like pythons or JS's formats for array and dict literals but I think it would be quite messy. On Thu, Nov 3, 2022 at 11:07 AM Percy Camilo Triveño Aucahuasi <percy.camilo...@gmail.com> wrote: > > Thanks Sasha! > > A nice advantage about parentheses is that most editors can track and > highlight the sections between them. > Also, those parentheses can be optional when we detect new lines (in the > case some users don't want to deal with many parentheses); in that case, we > would just need to ask indentation. > > Percy > > > On Thu, Nov 3, 2022 at 12:47 PM Sasha Krassovsky <krassovskysa...@gmail.com> > wrote: > > > Hi Percy, > > Thanks for the input! New lines would be no problem at all, they’d just be > > treated the same as any other whitespace. One thing to point out about the > > function call style when written that way is that it looks a lot like the > > list style, it’s just that there are more parentheses to keep track of, > > though it does make it more obvious what delineates a subtree. > > > > Sasha > > > > > > > 3 нояб. 2022 г., в 10:35, Percy Camilo Triveño Aucahuasi < > > percy.camilo...@gmail.com> написал(а): > > > > > > Hi Sasha, > > > > > > I like the function call-style variant. Quick question about the parser: > > > Do you think we can parse with new lines too? that way it would be even > > > more similar to a json-like/declarative approach and could mitigate a bit > > > the nesting issue (which would make it easier to read as well) for > > instance: > > > > > > sink( > > > project( > > > filter( > > > source( > > > …) > > > …) > > > …) > > > …) > > > > > > Percy > > > > > > > > >> On Tue, Oct 18, 2022 at 5:54 PM Sasha Krassovsky < > > krassovskysa...@gmail.com> > > >> wrote: > > >> > > >> Hi everyone, > > >> We recently had some discussions about parsing expressions. I currently > > >> have a PR [1] up for that taking into account the feedback. Next I > > wanted > > >> to tackle something for ExecPlans, as manually specifying one using > > code is > > >> currently cumbersome. I’m currently deciding between 2 variants: > > >> > > >> - Function call-style: This would be a similar syntax to the > > expressions, > > >> where we would have something along the lines of > > >> `sink(project(filter(source(…)…)…)…)`. The problem with this syntax is > > that > > >> it involves tons of nesting, which although an improvement over > > handwriting > > >> the C++ code, is still cumbersome to write. On the other hand, this > > syntax > > >> is pretty intuitive and meshes well with the expression syntax. A minor > > >> modification could be to make the last argument rather than the first be > > >> the input to a node, which would at least keep a node’s parameters > > >> together. > > >> > > >> - List style: This syntax completely eliminates nesting and would > > probably > > >> be easier to write but has a steeper learning curve. Essentially, since > > we > > >> know how many inputs each type of node takes, we can implicitly > > reconstruct > > >> a tree from a list of node names (formally, we are converting from/to a > > >> pre-order traversal of the query tree). For example, it would look > > >> something like: > > >> > > >> ``` > > >> sink > > >> project <list of names/expressions> > > >> filter <expression> > > >> source > > >> ``` > > >> > > >> The key is that we know that a source takes no inputs, and so source > > nodes > > >> are leaf nodes. To take an example with a join, it could be something > > like > > >> > > >> ``` > > >> order_by_sink <sort key> > > >> hash_join <join arguments> > > >> filter <expression> > > >> source > > >> filter <expression> > > >> source > > >> ``` > > >> > > >> Since we know that a join always takes two arguments, we interpret the > > >> first (filter source) slice as the first argument and the second as the > > >> second argument. It should be noted that the current C++ code already > > >> resembles this kind of syntax, it just has much more clutter. > > >> > > >> Thanks! > > >> Sasha Krassovsky > > >> > > >> [1] https://github.com/apache/arrow/pull/14287 > >