This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 4f92110  Commit build products
4f92110 is described below

commit 4f92110d2ffe36d027e2d9360b072a9f92d4d2d8
Author: Build Pelican (action) <[email protected]>
AuthorDate: Wed Mar 19 18:14:36 2025 +0000

    Commit build products
---
 output/2025/03/11/ordering-analysis/index.html     |  42 ++++++++++++++++++++-
 output/feeds/all-en.atom.xml                       |  42 ++++++++++++++++++++-
 output/feeds/blog.atom.xml                         |  42 ++++++++++++++++++++-
 output/feeds/mustafa-akur-andrew-lamb.atom.xml     |  42 ++++++++++++++++++++-
 .../images/ordering_analysis/query_window_plan.png | Bin 0 -> 189377 bytes
 5 files changed, 164 insertions(+), 4 deletions(-)

diff --git a/output/2025/03/11/ordering-analysis/index.html b/output/2025/03/11/ordering-analysis/index.html
index f9760fa..c829a2c 100644
--- a/output/2025/03/11/ordering-analysis/index.html
+++ b/output/2025/03/11/ordering-analysis/index.html
@@ -146,7 +146,7 @@ Being logically streamable does not guarantee that a query will execute in a str
 </table>
 <p><br/></p>
 <blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;">
-<strong>How can a table have multiple orderings?:</strong> At first glance it 
may seem counterintuitive for a table to have more than one valid ordering. 
However, during query execution such scenarios can arise.
+<strong>How can a table have multiple orderings?</strong> At first glance it 
may seem counterintuitive for a table to have more than one valid ordering. 
However, during query execution such scenarios can arise.
 
 For example consider the following query:
 
@@ -293,6 +293,46 @@ Following third and fourth constraints for the simplified table, the succinct va
 <code>[amount ASC, price ASC]</code>, <br/>
 <code>[time_bin ASC]</code>,<br/>
 <code>[time ASC]</code> </p>
+<blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;">
+<p><strong>How can DataFusion find orderings?</strong></p> 
+DataFusion's <code>CREATE EXTERNAL TABLE</code> has a <code>WITH ORDER</code> 
clause (see <a 
href="https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table">docs</a>)
 to specify the known orderings of the table during table creation. For example 
the following query:<br/>
+<pre><code>
+CREATE EXTERNAL TABLE source (
+    amount INT NOT NULL,
+    price DOUBLE NOT NULL,
+    time TIMESTAMP NOT NULL,
+    ...
+)
+STORED AS CSV
+WITH ORDER (time ASC)
+WITH ORDER (amount ASC, price ASC)
+LOCATION '/path/to/FILE_NAME.csv'
+OPTIONS ('has_header' 'true');
+</code></pre>
+communicates that <code>source</code> table has the orderings: <code>[time 
ASC]</code> and <code>[amount ASC, price ASC]</code>.<br/>
+When orderings are communicated from the source, DataFusion tracks the 
orderings through each operator while optimizing the plan.<br/>
+<ul>
+<li>add new orderings (such as when "date_bin" function is applied to the 
"time" column)</li>
+<li>Remove orderings, if operation doesn't preserve the ordering of the data 
at its input</li>
+<li>Update equivalent groups</li>
+<li>Update constant expressions</li>
+</ul>
+
+Figure 1 shows an example how DataFusion generates an efficient plan for the 
query:
+<pre><code>
+SELECT 
+  row_number() OVER (ORDER BY time) as rn,
+  time
+FROM events
+ORDER BY rn, time
+</code></pre>
+using the orderings of the query intermediates.<br/>
+<br/>
+<figure>
+<img alt="Window Query Datafusion Optimization" class="img-responsive" 
src="/blog/images/ordering_analysis/query_window_plan.png" width="80%"/>
+<figcaption><strong>Figure 1:</strong> DataFusion analyzes orderings of the 
sources and query intermediates to generate efficient plans</figcaption>
+</figure>
+</blockquote>
 <h3>Table Properties</h3>
 <p>In summary, for the example table, the following properties correctly 
describe the sort properties:</p>
 <ul>
diff --git a/output/feeds/all-en.atom.xml b/output/feeds/all-en.atom.xml
index 1a4efff..1c61f3b 100644
--- a/output/feeds/all-en.atom.xml
+++ b/output/feeds/all-en.atom.xml
@@ -124,7 +124,7 @@ Being logically streamable does not guarantee that a query will execute in a str
 &lt;/table&gt;
 &lt;p&gt;&lt;br/&gt;&lt;/p&gt;
 &lt;blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;"&gt;
-&lt;strong&gt;How can a table have multiple orderings?:&lt;/strong&gt; At 
first glance it may seem counterintuitive for a table to have more than one 
valid ordering. However, during query execution such scenarios can arise.
+&lt;strong&gt;How can a table have multiple orderings?&lt;/strong&gt; At first 
glance it may seem counterintuitive for a table to have more than one valid 
ordering. However, during query execution such scenarios can arise.
 
 For example consider the following query:
 
@@ -271,6 +271,46 @@ Following third and fourth constraints for the simplified table, the succinct va
 &lt;code&gt;[amount ASC, price ASC]&lt;/code&gt;, &lt;br/&gt;
 &lt;code&gt;[time_bin ASC]&lt;/code&gt;,&lt;br/&gt;
 &lt;code&gt;[time ASC]&lt;/code&gt; &lt;/p&gt;
+&lt;blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;"&gt;
+&lt;p&gt;&lt;strong&gt;How can DataFusion find 
orderings?&lt;/strong&gt;&lt;/p&gt; 
+DataFusion's &lt;code&gt;CREATE EXTERNAL TABLE&lt;/code&gt; has a 
&lt;code&gt;WITH ORDER&lt;/code&gt; clause (see &lt;a 
href="https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table"&gt;docs&lt;/a&gt;)
 to specify the known orderings of the table during table creation. For example 
the following query:&lt;br/&gt;
+&lt;pre&gt;&lt;code&gt;
+CREATE EXTERNAL TABLE source (
+    amount INT NOT NULL,
+    price DOUBLE NOT NULL,
+    time TIMESTAMP NOT NULL,
+    ...
+)
+STORED AS CSV
+WITH ORDER (time ASC)
+WITH ORDER (amount ASC, price ASC)
+LOCATION '/path/to/FILE_NAME.csv'
+OPTIONS ('has_header' 'true');
+&lt;/code&gt;&lt;/pre&gt;
+communicates that &lt;code&gt;source&lt;/code&gt; table has the orderings: 
&lt;code&gt;[time ASC]&lt;/code&gt; and &lt;code&gt;[amount ASC, price 
ASC]&lt;/code&gt;.&lt;br/&gt;
+When orderings are communicated from the source, DataFusion tracks the 
orderings through each operator while optimizing the plan.&lt;br/&gt;
+&lt;ul&gt;
+&lt;li&gt;add new orderings (such as when "date_bin" function is applied to 
the "time" column)&lt;/li&gt;
+&lt;li&gt;Remove orderings, if operation doesn't preserve the ordering of the 
data at its input&lt;/li&gt;
+&lt;li&gt;Update equivalent groups&lt;/li&gt;
+&lt;li&gt;Update constant expressions&lt;/li&gt;
+&lt;/ul&gt;
+
+Figure 1 shows an example how DataFusion generates an efficient plan for the 
query:
+&lt;pre&gt;&lt;code&gt;
+SELECT 
+  row_number() OVER (ORDER BY time) as rn,
+  time
+FROM events
+ORDER BY rn, time
+&lt;/code&gt;&lt;/pre&gt;
+using the orderings of the query intermediates.&lt;br/&gt;
+&lt;br/&gt;
+&lt;figure&gt;
+&lt;img alt="Window Query Datafusion Optimization" class="img-responsive" 
src="/blog/images/ordering_analysis/query_window_plan.png" width="80%"/&gt;
+&lt;figcaption&gt;&lt;strong&gt;Figure 1:&lt;/strong&gt; DataFusion analyzes 
orderings of the sources and query intermediates to generate efficient 
plans&lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;/blockquote&gt;
 &lt;h3&gt;Table Properties&lt;/h3&gt;
 &lt;p&gt;In summary, for the example table, the following properties correctly 
describe the sort properties:&lt;/p&gt;
 &lt;ul&gt;
diff --git a/output/feeds/blog.atom.xml b/output/feeds/blog.atom.xml
index a9119e1..67f1633 100644
--- a/output/feeds/blog.atom.xml
+++ b/output/feeds/blog.atom.xml
@@ -124,7 +124,7 @@ Being logically streamable does not guarantee that a query will execute in a str
 &lt;/table&gt;
 &lt;p&gt;&lt;br/&gt;&lt;/p&gt;
 &lt;blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;"&gt;
-&lt;strong&gt;How can a table have multiple orderings?:&lt;/strong&gt; At 
first glance it may seem counterintuitive for a table to have more than one 
valid ordering. However, during query execution such scenarios can arise.
+&lt;strong&gt;How can a table have multiple orderings?&lt;/strong&gt; At first 
glance it may seem counterintuitive for a table to have more than one valid 
ordering. However, during query execution such scenarios can arise.
 
 For example consider the following query:
 
@@ -271,6 +271,46 @@ Following third and fourth constraints for the simplified table, the succinct va
 &lt;code&gt;[amount ASC, price ASC]&lt;/code&gt;, &lt;br/&gt;
 &lt;code&gt;[time_bin ASC]&lt;/code&gt;,&lt;br/&gt;
 &lt;code&gt;[time ASC]&lt;/code&gt; &lt;/p&gt;
+&lt;blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;"&gt;
+&lt;p&gt;&lt;strong&gt;How can DataFusion find 
orderings?&lt;/strong&gt;&lt;/p&gt; 
+DataFusion's &lt;code&gt;CREATE EXTERNAL TABLE&lt;/code&gt; has a 
&lt;code&gt;WITH ORDER&lt;/code&gt; clause (see &lt;a 
href="https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table"&gt;docs&lt;/a&gt;)
 to specify the known orderings of the table during table creation. For example 
the following query:&lt;br/&gt;
+&lt;pre&gt;&lt;code&gt;
+CREATE EXTERNAL TABLE source (
+    amount INT NOT NULL,
+    price DOUBLE NOT NULL,
+    time TIMESTAMP NOT NULL,
+    ...
+)
+STORED AS CSV
+WITH ORDER (time ASC)
+WITH ORDER (amount ASC, price ASC)
+LOCATION '/path/to/FILE_NAME.csv'
+OPTIONS ('has_header' 'true');
+&lt;/code&gt;&lt;/pre&gt;
+communicates that &lt;code&gt;source&lt;/code&gt; table has the orderings: 
&lt;code&gt;[time ASC]&lt;/code&gt; and &lt;code&gt;[amount ASC, price 
ASC]&lt;/code&gt;.&lt;br/&gt;
+When orderings are communicated from the source, DataFusion tracks the 
orderings through each operator while optimizing the plan.&lt;br/&gt;
+&lt;ul&gt;
+&lt;li&gt;add new orderings (such as when "date_bin" function is applied to 
the "time" column)&lt;/li&gt;
+&lt;li&gt;Remove orderings, if operation doesn't preserve the ordering of the 
data at its input&lt;/li&gt;
+&lt;li&gt;Update equivalent groups&lt;/li&gt;
+&lt;li&gt;Update constant expressions&lt;/li&gt;
+&lt;/ul&gt;
+
+Figure 1 shows an example how DataFusion generates an efficient plan for the 
query:
+&lt;pre&gt;&lt;code&gt;
+SELECT 
+  row_number() OVER (ORDER BY time) as rn,
+  time
+FROM events
+ORDER BY rn, time
+&lt;/code&gt;&lt;/pre&gt;
+using the orderings of the query intermediates.&lt;br/&gt;
+&lt;br/&gt;
+&lt;figure&gt;
+&lt;img alt="Window Query Datafusion Optimization" class="img-responsive" 
src="/blog/images/ordering_analysis/query_window_plan.png" width="80%"/&gt;
+&lt;figcaption&gt;&lt;strong&gt;Figure 1:&lt;/strong&gt; DataFusion analyzes 
orderings of the sources and query intermediates to generate efficient 
plans&lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;/blockquote&gt;
 &lt;h3&gt;Table Properties&lt;/h3&gt;
 &lt;p&gt;In summary, for the example table, the following properties correctly 
describe the sort properties:&lt;/p&gt;
 &lt;ul&gt;
diff --git a/output/feeds/mustafa-akur-andrew-lamb.atom.xml b/output/feeds/mustafa-akur-andrew-lamb.atom.xml
index da0b115..d9ee35a 100644
--- a/output/feeds/mustafa-akur-andrew-lamb.atom.xml
+++ b/output/feeds/mustafa-akur-andrew-lamb.atom.xml
@@ -124,7 +124,7 @@ Being logically streamable does not guarantee that a query will execute in a str
 &lt;/table&gt;
 &lt;p&gt;&lt;br/&gt;&lt;/p&gt;
 &lt;blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;"&gt;
-&lt;strong&gt;How can a table have multiple orderings?:&lt;/strong&gt; At 
first glance it may seem counterintuitive for a table to have more than one 
valid ordering. However, during query execution such scenarios can arise.
+&lt;strong&gt;How can a table have multiple orderings?&lt;/strong&gt; At first 
glance it may seem counterintuitive for a table to have more than one valid 
ordering. However, during query execution such scenarios can arise.
 
 For example consider the following query:
 
@@ -271,6 +271,46 @@ Following third and fourth constraints for the simplified table, the succinct va
 &lt;code&gt;[amount ASC, price ASC]&lt;/code&gt;, &lt;br/&gt;
 &lt;code&gt;[time_bin ASC]&lt;/code&gt;,&lt;br/&gt;
 &lt;code&gt;[time ASC]&lt;/code&gt; &lt;/p&gt;
+&lt;blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;"&gt;
+&lt;p&gt;&lt;strong&gt;How can DataFusion find 
orderings?&lt;/strong&gt;&lt;/p&gt; 
+DataFusion's &lt;code&gt;CREATE EXTERNAL TABLE&lt;/code&gt; has a 
&lt;code&gt;WITH ORDER&lt;/code&gt; clause (see &lt;a 
href="https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table"&gt;docs&lt;/a&gt;)
 to specify the known orderings of the table during table creation. For example 
the following query:&lt;br/&gt;
+&lt;pre&gt;&lt;code&gt;
+CREATE EXTERNAL TABLE source (
+    amount INT NOT NULL,
+    price DOUBLE NOT NULL,
+    time TIMESTAMP NOT NULL,
+    ...
+)
+STORED AS CSV
+WITH ORDER (time ASC)
+WITH ORDER (amount ASC, price ASC)
+LOCATION '/path/to/FILE_NAME.csv'
+OPTIONS ('has_header' 'true');
+&lt;/code&gt;&lt;/pre&gt;
+communicates that &lt;code&gt;source&lt;/code&gt; table has the orderings: 
&lt;code&gt;[time ASC]&lt;/code&gt; and &lt;code&gt;[amount ASC, price 
ASC]&lt;/code&gt;.&lt;br/&gt;
+When orderings are communicated from the source, DataFusion tracks the 
orderings through each operator while optimizing the plan.&lt;br/&gt;
+&lt;ul&gt;
+&lt;li&gt;add new orderings (such as when "date_bin" function is applied to 
the "time" column)&lt;/li&gt;
+&lt;li&gt;Remove orderings, if operation doesn't preserve the ordering of the 
data at its input&lt;/li&gt;
+&lt;li&gt;Update equivalent groups&lt;/li&gt;
+&lt;li&gt;Update constant expressions&lt;/li&gt;
+&lt;/ul&gt;
+
+Figure 1 shows an example how DataFusion generates an efficient plan for the 
query:
+&lt;pre&gt;&lt;code&gt;
+SELECT 
+  row_number() OVER (ORDER BY time) as rn,
+  time
+FROM events
+ORDER BY rn, time
+&lt;/code&gt;&lt;/pre&gt;
+using the orderings of the query intermediates.&lt;br/&gt;
+&lt;br/&gt;
+&lt;figure&gt;
+&lt;img alt="Window Query Datafusion Optimization" class="img-responsive" 
src="/blog/images/ordering_analysis/query_window_plan.png" width="80%"/&gt;
+&lt;figcaption&gt;&lt;strong&gt;Figure 1:&lt;/strong&gt; DataFusion analyzes 
orderings of the sources and query intermediates to generate efficient 
plans&lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;/blockquote&gt;
 &lt;h3&gt;Table Properties&lt;/h3&gt;
 &lt;p&gt;In summary, for the example table, the following properties correctly 
describe the sort properties:&lt;/p&gt;
 &lt;ul&gt;
diff --git a/output/images/ordering_analysis/query_window_plan.png b/output/images/ordering_analysis/query_window_plan.png
new file mode 100644
index 0000000..ca30d22
Binary files /dev/null and b/output/images/ordering_analysis/query_window_plan.png differ

