[jira] [Comment Edited] (CALCITE-2280) "Super-liberal" parser that accepts all SQL dialects

Erica Ehrhardt (JIRA) Sat, 28 Apr 2018 02:06:58 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457464#comment-16457464
 ]


Erica Ehrhardt edited comment on CALCITE-2280 at 4/28/18 9:05 AM:
------------------------------------------------------------------

We at Thumbtack basically want a SQL Army Knife. Our current or future use 
cases include:
 * Building a BigQuery Standard SQL AST from OData queries in 
[Becquerel|https://github.com/thumbtack/becquerel], our gateway between 
Salesforce and our big data systems. In future we'd also like cleaner support 
for Becquerel's other backends (ES and JDBC, mostly PG).
 * Tracing data lineage through multiple layers of views and materialized 
queries: we're doing this with regexes and Python now, and it's fragile, but 
useful for automatically generating dependency graphs for scheduling queries.
 * Programmatic auditing and rewriting of queries and DDL: what will eventually 
break if we delete this column or change its type? Can we compensate for it 
automatically?
 * Translating between the multiple dialects in common use here: PostgreSQL, 
Spark SQL, and BigQuery SQL.
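
For illustration only, the regex-based lineage tracing mentioned in the second 
bullet might look roughly like the Python sketch below. It is not Thumbtack's 
actual code: the view names, the TABLE_REF pattern, and the helper functions 
are all hypothetical. It shows both the useful part (a dependency graph that 
yields a scheduling order) and the fragile part (the pattern has no notion of 
SQL syntax).

```python
import re

# Hypothetical view definitions, invented for this example.
VIEWS = {
    "daily_orders": (
        "SELECT o.id, c.region FROM raw.orders o "
        "JOIN raw.customers c ON o.customer_id = c.id"
    ),
    "regional_summary": (
        "SELECT region, COUNT(*) FROM daily_orders GROUP BY region"
    ),
}

# Naive assumption: whatever identifier follows FROM or JOIN is a table.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def direct_dependencies(sql):
    """Return the set of names that appear right after FROM or JOIN."""
    return set(TABLE_REF.findall(sql))


def dependency_graph(views):
    """Map each view name to the names its defining query reads from."""
    return {name: direct_dependencies(sql) for name, sql in views.items()}


def schedule_order(graph):
    """Depth-first topological order, dependencies before dependents.

    Cycles are not detected; this sketch assumes the graph is acyclic.
    """
    ordered, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in sorted(graph.get(node, ())):
            visit(dep)
        ordered.append(node)

    for name in sorted(graph):
        visit(name)
    return ordered


# Where the regex approach breaks: EXTRACT(... FROM column) is not a table
# reference, but the pattern cannot tell the difference.
FRAGILE_SQL = "SELECT EXTRACT(YEAR FROM created_at) FROM events"
```

Calling schedule_order(dependency_graph(VIEWS)) places raw.customers and 
raw.orders before daily_orders, and daily_orders before regional_summary. 
Calling direct_dependencies(FRAGILE_SQL) reports created_at as a table 
alongside events, which is the kind of error a real parser would avoid.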

I've been talking to a few colleagues about these and other possible Calcite 
applications. We can probably provide sample queries in all three of the 
dialects we use. What kinds of sample are most useful? Also, I'd be willing to 
help with a BQ parser, since I'm already experimenting with one.


was (Author: eee):
We at Thumbtack basically want a SQL Army Knife. Our current or future use 
cases include:
 * Building a BigQuery Standard SQL AST from OData queries in 
[Becquerel|https://github.com/thumbtack/becquerel], our gateway between 
Salesforce and our big data systems. Also future cleaner support for 
Becquerel's other backends (ES and JDBC, mostly PG).
 * Tracing data lineage through multiple layers of views and materialized 
queries: we're doing this with regexes and Python now, and it's fragile, but 
useful for automatically generating dependency graphs for scheduling queries.
 * Programmatic auditing and rewriting of queries and DDL: what will eventually 
break if we delete this column or change its type? Can we compensate for it 
automatically?
 * Translating between the multiple dialects in common use here: PostgreSQL, 
Spark SQL, and BigQuery SQL.

I've been talking to a few colleagues about these and other possible Calcite 
applications. We can probably provide sample queries in all three of the 
dialects we use. What kinds of sample are most useful?

> "Super-liberal" parser that accepts all SQL dialects
> ----------------------------------------------------
>
>                 Key: CALCITE-2280
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2280
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>            Priority: Major
>
> Create a parser that accepts all SQL dialects.
> It would accept common dialects such as Oracle, MySQL, PostgreSQL, BigQuery. 
> If you have preferred dialects, please let us know in the comments section. 
> (If you're willing to work on a particular dialect, even better!)
> We would do this in a new module, inheriting and extending the parser in the 
> same way that the DDL parser in the "server" module does.
> This would be a messy and difficult project, because we would have to comply 
> with the rules of each parser (and its set of built-in functions) rather than 
> writing the rules as we would like them to be. That's why I would keep it out 
> of the core parser. But it would also have large benefits.
> This would be new territory for Calcite: as a tool for manipulating/understanding 
> SQL, not (necessarily) for relational algebra or execution.
> Some possible uses:
> * analyze query lineage (what tables and columns are used in a query);
> * translate from one SQL dialect to another (using the JDBC adapter to 
> generate SQL in the target dialect);
> * a "deep" compatibility mode (much more comprehensive than the current 
> compatibility mode) where Calcite could pretend to be, say, Oracle;
> * SQL parser as a service: a REST call gives a SQL query, and returns a JSON 
> or XML document with the parse tree.
> If you can think of interesting uses, please discuss in the comments.
> There are similarities with Uber's 
> [QueryParser|https://eng.uber.com/queryparser/] tool. Maybe we can 
> collaborate, or make use of their test cases.
> We will need a lot of sample queries. If you are able to contribute sample 
> queries for particular dialects, please discuss in the comments section. It 
> would be good if the sample queries are based on a familiar schema (e.g. 
> scott or foodmart) but we can be flexible about this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
