Hi Julian,

Thanks for the feedback. I was going to reuse the syntax we use for literals in the expression parser PR (of course still subject to change); it’s of the form $datatype:value. So short/long/float would be distinguished by writing $int16:0, $int64:0, and $float32:0.
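As a rough illustration of the $datatype:value form, here is a minimal Python sketch. It is not the parser from the PR: the names `TypedLiteral` and `parse_literal`, and the set of accepted type names, are assumptions made up for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical type names for this sketch; the real parser's list may differ.
_INT_TYPES = {"int8", "int16", "int32", "int64",
              "uint8", "uint16", "uint32", "uint64"}
_FLOAT_TYPES = {"float32", "float64"}

# "$" followed by a type name, a colon, and the raw value text.
_LITERAL_RE = re.compile(r"^\$(?P<dtype>[A-Za-z_][A-Za-z0-9_]*):(?P<value>.+)$")


@dataclass(frozen=True)
class TypedLiteral:
    dtype: str     # the type name written before the colon
    value: object  # the parsed Python value


def parse_literal(text: str) -> TypedLiteral:
    """Parse a literal such as "$int16:0" into a (dtype, value) pair."""
    match = _LITERAL_RE.match(text)
    if match is None:
        raise ValueError(f"not a $datatype:value literal: {text!r}")
    dtype, raw = match.group("dtype"), match.group("value")
    if dtype in _INT_TYPES:
        return TypedLiteral(dtype, int(raw))
    if dtype in _FLOAT_TYPES:
        return TypedLiteral(dtype, float(raw))
    raise ValueError(f"unsupported datatype in this sketch: {dtype!r}")
```

Because the type name travels with the value, `parse_literal("$int16:0")` and `parse_literal("$int64:0")` yield distinct literals even though both carry the value 0.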
I feel it strikes a good balance of being very precise but not overly verbose. (I see Substrait’s JSON does it like { "literal" : { "<datatype>" : <value> } }, which conveys the same information but is much more verbose.)

Sasha

> On Nov 3, 2022, at 16:26, Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
> When people design a language to represent a data structure, they often do a
> poor job with literals (i.e. the constant values for each data type). And
> that causes problems with operator overloading. I recommend that you give
> each data type its own literal format, so you can distinguish, say, a short 0
> from an unsigned long 0 or a 64 bit floating point 0. Same goes for composite
> literals (e.g. a constant of type array or struct or array-of-struct) and
> floating point numbers.
>
>> On Nov 3, 2022, at 11:06 AM, Percy Camilo Triveño Aucahuasi
>> <percy.camilo...@gmail.com> wrote:
>>
>> Thanks Sasha!
>>
>> A nice advantage of parentheses is that most editors can track and
>> highlight the sections between them.
>> Also, those parentheses can be optional when we detect new lines (in the
>> case some users don't want to deal with many parentheses); in that case, we
>> would just need to require indentation.
>>
>> Percy
>>
>>> On Thu, Nov 3, 2022 at 12:47 PM Sasha Krassovsky
>>> <krassovskysa...@gmail.com> wrote:
>>>
>>> Hi Percy,
>>> Thanks for the input! New lines would be no problem at all, they’d just be
>>> treated the same as any other whitespace. One thing to point out about the
>>> function call style when written that way is that it looks a lot like the
>>> list style, it’s just that there are more parentheses to keep track of,
>>> though it does make it more obvious what delineates a subtree.
>>>
>>> Sasha
>>>
>>>> On Nov 3, 2022, at 10:35, Percy Camilo Triveño Aucahuasi
>>>> <percy.camilo...@gmail.com> wrote:
>>>>
>>>> Hi Sasha,
>>>>
>>>> I like the function call-style variant.
>>>> Quick question about the parser:
>>>> Do you think we can parse with new lines too? That way it would be even
>>>> more similar to a JSON-like/declarative approach and could mitigate the
>>>> nesting issue a bit (which would make it easier to read as well), for
>>>> instance:
>>>>
>>>> sink(
>>>>   project(
>>>>     filter(
>>>>       source(
>>>>         …)
>>>>       …)
>>>>     …)
>>>>   …)
>>>>
>>>> Percy
>>>>
>>>>> On Tue, Oct 18, 2022 at 5:54 PM Sasha Krassovsky
>>>>> <krassovskysa...@gmail.com> wrote:
>>>>>
>>>>> Hi everyone,
>>>>> We recently had some discussions about parsing expressions. I currently
>>>>> have a PR [1] up for that which takes the feedback into account. Next I
>>>>> wanted to tackle something for ExecPlans, as manually specifying one using
>>>>> code is currently cumbersome. I’m currently deciding between 2 variants:
>>>>>
>>>>> - Function call-style: This would be a similar syntax to the expressions,
>>>>> where we would have something along the lines of
>>>>> `sink(project(filter(source(…)…)…)…)`. The problem with this syntax is that
>>>>> it involves tons of nesting, which, although an improvement over
>>>>> handwriting the C++ code, is still cumbersome to write. On the other hand,
>>>>> this syntax is pretty intuitive and meshes well with the expression syntax.
>>>>> A minor modification could be to make the last argument rather than the
>>>>> first be the input to a node, which would at least keep a node’s parameters
>>>>> together.
>>>>>
>>>>> - List style: This syntax completely eliminates nesting and would probably
>>>>> be easier to write but has a steeper learning curve. Essentially, since we
>>>>> know how many inputs each type of node takes, we can implicitly reconstruct
>>>>> a tree from a list of node names (formally, we are converting from/to a
>>>>> pre-order traversal of the query tree).
>>>>> For example, it would look something like:
>>>>>
>>>>> ```
>>>>> sink
>>>>> project <list of names/expressions>
>>>>> filter <expression>
>>>>> source
>>>>> ```
>>>>>
>>>>> The key is that we know that a source takes no inputs, and so source nodes
>>>>> are leaf nodes. To take an example with a join, it could be something like
>>>>>
>>>>> ```
>>>>> order_by_sink <sort key>
>>>>> hash_join <join arguments>
>>>>> filter <expression>
>>>>> source
>>>>> filter <expression>
>>>>> source
>>>>> ```
>>>>>
>>>>> Since we know that a join always takes two arguments, we interpret the
>>>>> first (filter source) slice as the first argument and the second as the
>>>>> second argument. It should be noted that the current C++ code already
>>>>> resembles this kind of syntax, it just has much more clutter.
>>>>>
>>>>> Thanks!
>>>>> Sasha Krassovsky
>>>>>
>>>>> [1] https://github.com/apache/arrow/pull/14287
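
The list-style idea in the original proposal (rebuilding a tree from a pre-order list of node names, given how many inputs each node type takes) could be sketched roughly as below. This is an illustration in Python, not the proposed implementation: the `ARITY` table, the `Node` class, and the helpers `build_tree` and `to_call_style` are hypothetical names, and per-node parameters such as filter expressions are left out.

```python
from dataclasses import dataclass, field

# Hypothetical arity table: number of inputs each node type takes.
ARITY = {
    "source": 0,
    "filter": 1,
    "project": 1,
    "sink": 1,
    "order_by_sink": 1,
    "hash_join": 2,
}


@dataclass
class Node:
    name: str
    inputs: list = field(default_factory=list)


def build_tree(names):
    """Rebuild a plan tree from a pre-order list of node names."""
    it = iter(names)

    def build():
        try:
            name = next(it)
        except StopIteration:
            raise ValueError("plan ended before all inputs were supplied") from None
        node = Node(name)
        # Unknown node names raise KeyError here; each input is itself a
        # subtree that consumes the next slice of the list.
        for _ in range(ARITY[name]):
            node.inputs.append(build())
        return node

    root = build()
    if next(it, None) is not None:
        raise ValueError("extra nodes after the plan was complete")
    return root


def to_call_style(node):
    """Render the tree in the nested function call style."""
    args = ", ".join(to_call_style(child) for child in node.inputs)
    return f"{node.name}({args})"
```

For the join example, `build_tree(["order_by_sink", "hash_join", "filter", "source", "filter", "source"])` gives a `hash_join` node with two `filter` inputs, each wrapping a `source`. That matches reading the first (filter source) slice as the first input and the second slice as the second.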