goldmedal commented on code in PR #97: URL: https://github.com/apache/datafusion-site/pull/97#discussion_r2236086680
########## content/blog/2025-07-27-extending-sql-with-satafusion.md: ########## @@ -0,0 +1,248 @@ +--- +layout: post +title: Implementing your own SQL dialect and SQL statements with DataFusion +date: 2025-07-26 +author: Aditya Singh Rathore +categories: [tutorial] +--- + +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + +Have you ever wished you could extend SQL with custom statements tailored to your specific use case? Maybe you're working with a parquet-files-on-S3 storage approach and need an `ATTACH DATABASE` statement, or perhaps you want to implement catalog management features similar to [DuckDB] or [SQLite]. With [Apache DataFusion], you can do exactly that – and it's more straightforward than you might think. + +[DuckDB]: https://duckdb.org/ +[SQLite]: https://www.sqlite.org/ +[Apache DataFusion]: https://datafusion.apache.org/ + +## The Challenge: Beyond Standard SQL + +Imagine you're building a data platform that uses a parquet-files-on-S3 storage pattern. Your application needs to dynamically discover and attach databases, similar to how [DuckDB] handles multiple databases with its `ATTACH` statement. While [DataFusion] supports `CREATE EXTERNAL TABLE`, you need something more flexible – perhaps a statement like: + +```sql +CREATE EXTERNAL CATALOG my_catalog +STORED AS PARQUET +LOCATION 's3://my-bucket/data/' +OPTIONS ( + 'aws.region' = 'us-west-2', + 'catalog.type' = 'hive_metastore' +); +``` + +Standard SQL doesn't have this capability, but DataFusion's extensible architecture makes it possible to add custom SQL statements like this. + +## Understanding the SQL Processing Pipeline + +Before diving into custom implementations, let's understand how DataFusion processes SQL queries. The journey from SQL text to execution follows this path: + +```text ++-------+ +--------+ +-----+ +-------------+ +--------------+ +----------+ +| Query | ---> | Parser | ---> | AST | ---> |Logical Plan | ---> |Physical Plan | ---> |Execution | ++-------+ +--------+ +-----+ +-------------+ +--------------+ +----------+ + +``` + +1. **SQL Text:** The raw SQL string you write +2. **Parser:** DataFusion's `DFParser` converts SQL text into an Abstract Syntax Tree (AST) +3. **AST:** A structured representation of the SQL statement using DataFusion's `Statement` enum Review Comment: ```suggestion 2. **AST:** DataFusion's `DFParser` converts SQL text into an Abstract Syntax Tree (AST), a structured representation of the SQL statement using DataFusion's `Statement` enum ``` As I mentioned above, we can merge the point 2 and 3 to explain how DataFusion parse a SQL text to AST. ########## content/blog/2025-07-27-extending-sql-with-satafusion.md: ########## @@ -0,0 +1,248 @@ +--- +layout: post +title: Implementing your own SQL dialect and SQL statements with DataFusion +date: 2025-07-26 +author: Aditya Singh Rathore +categories: [tutorial] +--- + +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + +Have you ever wished you could extend SQL with custom statements tailored to your specific use case? Maybe you're working with a parquet-files-on-S3 storage approach and need an `ATTACH DATABASE` statement, or perhaps you want to implement catalog management features similar to [DuckDB] or [SQLite]. With [Apache DataFusion], you can do exactly that – and it's more straightforward than you might think. + +[DuckDB]: https://duckdb.org/ +[SQLite]: https://www.sqlite.org/ +[Apache DataFusion]: https://datafusion.apache.org/ + +## The Challenge: Beyond Standard SQL + +Imagine you're building a data platform that uses a parquet-files-on-S3 storage pattern. Your application needs to dynamically discover and attach databases, similar to how [DuckDB] handles multiple databases with its `ATTACH` statement. While [DataFusion] supports `CREATE EXTERNAL TABLE`, you need something more flexible – perhaps a statement like: + +```sql +CREATE EXTERNAL CATALOG my_catalog +STORED AS PARQUET +LOCATION 's3://my-bucket/data/' +OPTIONS ( + 'aws.region' = 'us-west-2', + 'catalog.type' = 'hive_metastore' +); +``` + +Standard SQL doesn't have this capability, but DataFusion's extensible architecture makes it possible to add custom SQL statements like this. + +## Understanding the SQL Processing Pipeline + +Before diving into custom implementations, let's understand how DataFusion processes SQL queries. The journey from SQL text to execution follows this path: + +```text ++-------+ +--------+ +-----+ +-------------+ +--------------+ +----------+ +| Query | ---> | Parser | ---> | AST | ---> |Logical Plan | ---> |Physical Plan | ---> |Execution | ++-------+ +--------+ +-----+ +-------------+ +--------------+ +----------+ + Review Comment: ```suggestion +----------+ +-----+ +--------------+ +---------------+ +-----------+ | SQL Text |---> | AST | ---> | Logical Plan | ---> | Physical Plan | ---> | Execution | +----------+ +-----+ +--------------+ +---------------+ +-----------+ ``` I think this pipeline present each status of a SQL text change but the parser isn't a status. I prefer to remove `Parser` from them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org