Julian Hyde created CALCITE-7028: ------------------------------------ Summary: Implement SQL pipe syntax Key: CALCITE-7028 URL: https://issues.apache.org/jira/browse/CALCITE-7028 Project: Calcite Issue Type: Bug Reporter: Julian Hyde
Implement SQL pipe syntax, which was introduced as part of GoogleSQL, and described in a paper "[SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL|https://www.vldb.org/pvldb/vol17/p4051-shute.pdf]" by Jeff Shute and others. Other vendors, including [Firebolt|https://docs.firebolt.io/sql_reference/commands/queries/pipe.html] and [Databricks|https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-qry-pipeline], have announced support. The paper gives as an example a query in conventional syntax {code:java} SELECT c_count, COUNT(*) AS custdist FROM ( SELECT c_custkey, COUNT(o_orderkey) c_count FROM customer LEFT OUTER JOIN orders ON c_custkey = o_custkey AND o_comment NOT LIKE '%unusual%packages%' GROUP BY c_custkey ) AS c_orders GROUP BY c_count ORDER BY custdist DESC, c_count DESC; {code} and its equivalent with pipe syntax: {code:java} FROM customer |> LEFT OUTER JOIN orders ON c_custkey = o_custkey AND o_comment NOT LIKE '%unusual%packages%' |> AGGREGATE COUNT(o_orderkey) c_count GROUP BY c_custkey |> AGGREGATE COUNT(*) AS custdist GROUP BY c_count |> ORDER BY custdist DESC, c_count DESC; {code} There are several arguments for supporting pipe syntax in Calcite. First, it allows Calcite to better emulate the GoogleSQL (BigQuery), Databricks and Firebolt dialects. Second, it is an attractive and intuitive syntax that allows users to get queries working with fewer mistakes. Pipes should be fairly easy to implement. The designers made sure that it was basically syntactic sugar: not adding new relational operators, keeping the existing expression syntax and type system, and (as far as I can tell) keeping the namespace scoping rules (for example in which clauses a table alias remains visible). The changes in Calcite would be: * Additions to the parser. Allow a sequence of various clauses (e.g. {{{}|> WHERE condition{}}}) to follow a query, and give each of them an AST node. Allow a query to begin with {{{}FROM{}}}. * Add validation rules to the validator to deduce the type of all expressions and to raise errors when a query is not semantically valid. This should be fairly straightforward because the * In {{{}SqlToRelConverter{}}}, generate the same {{RelNode}} relational algebra as the corresponding old-style SQL syntax. * I believe that each of the pipe steps has a corresponding method in {{{}RelBuilder{}}}. In most but not all cases, {{SqlToRelConverter}} is currently calling that method. In those cases, this task must either change {{SqlToRelConverter}} to use the {{RelBuilder}} method or justify why not. Use the ZetaSQL tests as a guideline for what needs to be supported. (Please research and add a link to this case.) Incremental rollout would be desirable. The first iteration could, for example, support just leading {{FROM}} and the {{|> WHERE}} clause. But it must include tests for parsing and validation and tests that execute queries. At first, the syntax would be enabled only in particular dialects, and disabled in the default Calcite dialect. But I think the changes should be made to the core parser (not the babel parser). We may enable pipe syntax by default, later. -- This message was sent by Atlassian Jira (v8.20.10#820010)