Adez017 commented on code in PR #97:
URL: https://github.com/apache/datafusion-site/pull/97#discussion_r2236167131
##########
content/blog/2025-07-27-extending-sql-with-satafusion.md:
##########
@@ -0,0 +1,248 @@
+---
+layout: post
+title: Implementing your own SQL dialect and SQL statements with DataFusion
+date: 2025-07-26
+author: Aditya Singh Rathore
+categories: [tutorial]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Have you ever wished you could extend SQL with custom statements tailored to
your specific use case? Maybe you're working with a parquet-files-on-S3 storage
approach and need an `ATTACH DATABASE` statement, or perhaps you want to
implement catalog management features similar to [DuckDB] or [SQLite]. With
[Apache DataFusion], you can do exactly that – and it's more straightforward
than you might think.
+
+[DuckDB]: https://duckdb.org/
+[SQLite]: https://www.sqlite.org/
+[Apache DataFusion]: https://datafusion.apache.org/
+
+## The Challenge: Beyond Standard SQL
+
+Imagine you're building a data platform that uses a parquet-files-on-S3
storage pattern. Your application needs to dynamically discover and attach
databases, similar to how [DuckDB] handles multiple databases with its `ATTACH`
statement. While [DataFusion][Apache DataFusion] supports `CREATE EXTERNAL TABLE`, you need
something more flexible – perhaps a statement like:
+
+```sql
+CREATE EXTERNAL CATALOG my_catalog
+STORED AS PARQUET
+LOCATION 's3://my-bucket/data/'
+OPTIONS (
+ 'aws.region' = 'us-west-2',
+ 'catalog.type' = 'hive_metastore'
+);
+```
+
+Standard SQL doesn't have this capability, but DataFusion's extensible
architecture makes it possible to add custom SQL statements like this.
+
+## Understanding the SQL Processing Pipeline
+
+Before diving into custom implementations, let's understand how DataFusion
processes SQL queries. The journey from SQL text to execution follows this path:
+
+```text
++-------+      +--------+      +-----+      +-------------+      +--------------+      +----------+
+| Query | ---> | Parser | ---> | AST | ---> |Logical Plan | ---> |Physical Plan | ---> |Execution |
++-------+      +--------+      +-----+      +-------------+      +--------------+      +----------+
+
Review Comment:
thanks for the suggestion @goldmedal
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]