[
https://issues.apache.org/jira/browse/CALCITE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Caican Cai reassigned CALCITE-7028:
-----------------------------------
Assignee: Caican Cai
> Implement SQL pipe syntax
> --------------------------
>
> Key: CALCITE-7028
> URL: https://issues.apache.org/jira/browse/CALCITE-7028
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Caican Cai
> Priority: Major
>
> Implement SQL pipe syntax, which was introduced as part of GoogleSQL, and
> described in a paper "[SQL Has Problems. We Can Fix Them: Pipe Syntax In
> SQL|https://www.vldb.org/pvldb/vol17/p4051-shute.pdf]" by Jeff Shute and
> others. Other vendors, including
> [Firebolt|https://docs.firebolt.io/sql_reference/commands/queries/pipe.html]
> and
> [Databricks|https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-qry-pipeline],
> have announced support.
> The paper gives as an example a query in conventional syntax
> {code:java}
> SELECT c_count, COUNT(*) AS custdist
> FROM
> ( SELECT c_custkey, COUNT(o_orderkey) c_count
> FROM customer
> LEFT OUTER JOIN orders ON c_custkey = o_custkey
> AND o_comment NOT LIKE '%unusual%packages%'
> GROUP BY c_custkey
> ) AS c_orders
> GROUP BY c_count
> ORDER BY custdist DESC, c_count DESC;
> {code}
> and its equivalent with pipe syntax:
> {code:java}
> FROM customer
> |> LEFT OUTER JOIN orders ON c_custkey = o_custkey
> AND o_comment NOT LIKE '%unusual%packages%'
> |> AGGREGATE COUNT(o_orderkey) c_count
> GROUP BY c_custkey
> |> AGGREGATE COUNT(*) AS custdist
> GROUP BY c_count
> |> ORDER BY custdist DESC, c_count DESC;
> {code}
> There are several arguments for supporting pipe syntax in Calcite. First, it
> allows Calcite to better emulate the GoogleSQL (BigQuery), Databricks and
> Firebolt dialects. Second, it is an attractive and intuitive syntax that
> allows users to get queries working with fewer mistakes.
> Pipes should be fairly easy to implement. The designers made sure that it was
> basically syntactic sugar: not adding new relational operators, keeping the
> existing expression syntax and type system, and (as far as I can tell)
> keeping the namespace scoping rules (for example in which clauses a table
> alias remains visible).
> The changes in Calcite would be:
> * Additions to the parser. Allow a sequence of various clauses (e.g. {{{}|>
> WHERE condition{}}}) to follow a query, and give each of them an AST node.
> Allow a query to begin with {{{}FROM{}}}.
> * Add validation rules to the validator to deduce the type of all
> expressions and to raise errors when a query is not semantically valid. This
> should be fairly straightforward because the
> * In {{{}SqlToRelConverter{}}}, generate the same {{RelNode}} relational
> algebra as the corresponding old-style SQL syntax.
> * I believe that each of the pipe steps has a corresponding method in
> {{{}RelBuilder{}}}. In most but not all cases, {{SqlToRelConverter}} is
> currently calling that method. In those cases, this task must either change
> {{SqlToRelConverter}} to use the {{RelBuilder}} method or justify why not.
> Use the ZetaSQL tests as a guideline for what needs to be supported. (Please
> research and add a link to this case.)
> Incremental rollout would be desirable. The first iteration could, for
> example, support just leading {{FROM}} and the {{|> WHERE}} clause. But it
> must include tests for parsing and validation and tests that execute queries.
> At first, the syntax would be enabled only in particular dialects, and
> disabled in the default Calcite dialect. But I think the changes should be
> made to the core parser (not the babel parser). We may enable pipe syntax by
> default, later.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)