Julian Hyde created CALCITE-7028:
------------------------------------

             Summary: Implement SQL pipe syntax 
                 Key: CALCITE-7028
                 URL: https://issues.apache.org/jira/browse/CALCITE-7028
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde


Implement SQL pipe syntax, which was introduced as part of GoogleSQL, and 
described in a paper "[SQL Has Problems. We Can Fix Them: Pipe Syntax In 
SQL|https://www.vldb.org/pvldb/vol17/p4051-shute.pdf]"; by Jeff Shute and 
others. Other vendors, including 
[Firebolt|https://docs.firebolt.io/sql_reference/commands/queries/pipe.html] 
and 
[Databricks|https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-qry-pipeline],
 have announced support.

The paper gives as an example a query in conventional syntax
{code:java}
SELECT c_count, COUNT(*) AS custdist
FROM
  ( SELECT c_custkey, COUNT(o_orderkey) c_count
    FROM customer
    LEFT OUTER JOIN orders ON c_custkey = o_custkey
      AND o_comment NOT LIKE '%unusual%packages%'
    GROUP BY c_custkey
  ) AS c_orders
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;
{code}
and its equivalent with pipe syntax:
{code:java}
FROM customer
|> LEFT OUTER JOIN orders ON c_custkey = o_custkey
             AND o_comment NOT LIKE '%unusual%packages%'
|> AGGREGATE COUNT(o_orderkey) c_count
   GROUP BY c_custkey
|> AGGREGATE COUNT(*) AS custdist
   GROUP BY c_count
|> ORDER BY custdist DESC, c_count DESC;
{code}
There are several arguments for supporting pipe syntax in Calcite. First, it 
allows Calcite to better emulate the GoogleSQL (BigQuery), Databricks and 
Firebolt dialects. Second, it is an attractive and intuitive syntax that allows 
users to get queries working with fewer mistakes.

Pipes should be fairly easy to implement. The designers made sure that it was 
basically syntactic sugar: not adding new relational operators, keeping the 
existing expression syntax and type system, and (as far as I can tell) keeping 
the namespace scoping rules (for example in which clauses a table alias remains 
visible).

The changes in Calcite would be:
 * Additions to the parser. Allow a sequence of various clauses (e.g. {{{}|> 
WHERE condition{}}}) to follow a query, and give each of them an AST node. 
Allow a query to begin with {{{}FROM{}}}.
 * Add validation rules to the validator to deduce the type of all expressions 
and to raise errors when a query is not semantically valid. This should be 
fairly straightforward because the
 * In {{{}SqlToRelConverter{}}}, generate the same {{RelNode}} relational 
algebra as the corresponding old-style SQL syntax.
 * I believe that each of the pipe steps has a corresponding method in 
{{{}RelBuilder{}}}. In most but not all cases, {{SqlToRelConverter}} is 
currently calling that method. In those cases, this task must either change 
{{SqlToRelConverter}} to use the {{RelBuilder}} method or justify why not.

Use the ZetaSQL tests as a guideline for what needs to be supported. (Please 
research and add a link to this case.)

Incremental rollout would be desirable. The first iteration could, for example, 
support just leading {{FROM}} and the {{|> WHERE}} clause. But it must include 
tests for parsing and validation and tests that execute queries.

At first, the syntax would be enabled only in particular dialects, and disabled 
in the default Calcite dialect. But I think the changes should be made to the 
core parser (not the babel parser). We may enable pipe syntax by default, later.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to