This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git


The following commit(s) were added to refs/heads/main by this push:
     new ca0a5a2  Add WITH ORDER example to blog post (#59)
ca0a5a2 is described below

commit ca0a5a2fc9d9677f45da4c6c6a16df50d07ba270
Author: Mustafa Akur <[email protected]>
AuthorDate: Wed Mar 19 11:14:14 2025 -0700

    Add WITH ORDER example to blog post (#59)
    
    * Add WITH ORDER example
    
    * Apply suggestions from code review
    
    Co-authored-by: Andrew Lamb <[email protected]>
    
    * format tweaks
    
    * Add a new example for ordering optimization
    
    * Apply suggestions from code review
    
    Co-authored-by: Bruce Ritchie <[email protected]>
    
    * Address @Omega359's comments
    
    ---------
    
    Co-authored-by: Andrew Lamb <[email protected]>
    Co-authored-by: Bruce Ritchie <[email protected]>
---
 content/blog/2025-03-11-ordering-analysis.md       |  49 ++++++++++++++++++++-
 .../images/ordering_analysis/query_window_plan.png | Bin 0 -> 189377 bytes
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/content/blog/2025-03-11-ordering-analysis.md 
b/content/blog/2025-03-11-ordering-analysis.md
index e21408e..1754af3 100644
--- a/content/blog/2025-03-11-ordering-analysis.md
+++ b/content/blog/2025-03-11-ordering-analysis.md
@@ -133,7 +133,7 @@ Let's start by creating an example table that we will refer 
throughout the post.
 <br>
 
 <blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;">
-<strong>How can a table have multiple orderings?:</strong> At first glance it 
may seem counterintuitive for a table to have more than one valid ordering. 
However, during query execution such scenarios can arise.
+<strong>How can a table have multiple orderings?</strong> At first glance it 
may seem counterintuitive for a table to have more than one valid ordering. 
However, during query execution such scenarios can arise.
 
 For example consider the following query:
 
@@ -291,6 +291,53 @@ Following third and fourth constraints for the simplified 
table, the succinct va
 `[time_bin ASC]`,  
 `[time ASC]`  
 
+<blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;">
+<p><strong>How can DataFusion find orderings?</strong></p> 
+DataFusion's <code>CREATE EXTERNAL TABLE</code> has a <code>WITH ORDER</code> 
clause (see <a 
href="https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table";>docs</a>)
 to specify the known orderings of the table during table creation. For example 
the following query:<br>
+<pre><code>
+CREATE EXTERNAL TABLE source (
+    amount INT NOT NULL,
+    price DOUBLE NOT NULL,
+    time TIMESTAMP NOT NULL,
+    ...
+)
+STORED AS CSV
+WITH ORDER (time ASC)
+WITH ORDER (amount ASC, price ASC)
+LOCATION '/path/to/FILE_NAME.csv'
+OPTIONS ('has_header' 'true');
+</code></pre>
+communicates that <code>source</code> table has the orderings: <code>[time 
ASC]</code> and <code>[amount ASC, price ASC]</code>.<br>
+When orderings are communicated from the source, DataFusion tracks the 
orderings through each operator while optimizing the plan.<br>
+<ul>
+<li>add new orderings (such as when "date_bin" function is applied to the 
"time" column)</li>
+<li>Remove orderings, if operation doesn't preserve the ordering of the data 
at its input</li>
+<li>Update equivalent groups</li>
+<li>Update constant expressions</li>
+</ul>
+
+Figure 1 shows an example how DataFusion generates an efficient plan for the 
query:
+<pre><code>
+SELECT 
+  row_number() OVER (ORDER BY time) as rn,
+  time
+FROM events
+ORDER BY rn, time
+</code></pre>
+using the orderings of the query intermediates.<br>
+<br>
+<figure>
+<img
+src="/blog/images/ordering_analysis/query_window_plan.png"
+width="80%"
+class="img-responsive"
+alt="Window Query Datafusion Optimization"
+/>
+<figcaption><strong>Figure 1:</strong> DataFusion analyzes orderings of the 
sources and query intermediates to generate efficient plans</figcaption>
+</figure>
+
+</blockquote>
+
 ### Table Properties  
 
 In summary, for the example table, the following properties correctly describe 
the sort properties:
diff --git a/content/images/ordering_analysis/query_window_plan.png 
b/content/images/ordering_analysis/query_window_plan.png
new file mode 100644
index 0000000..ca30d22
Binary files /dev/null and 
b/content/images/ordering_analysis/query_window_plan.png differ


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to