Omega359 commented on code in PR #59:
URL: https://github.com/apache/datafusion-site/pull/59#discussion_r2002097220


##########
content/blog/2025-03-11-ordering-analysis.md:
##########
@@ -291,6 +291,53 @@ Following third and fourth constraints for the simplified 
table, the succinct va
 `[time_bin ASC]`,  
 `[time ASC]`  
 
+<blockquote style="border-left: 4px solid #007bff; padding: 10px; 
background-color: #f8f9fa;">
+<p><strong>How can DataFusion find orderings?</strong></p> 
+DataFusion's <code>CREATE EXTERNAL TABLE</code> has a <code>WITH ORDER</code> 
clause (see <a 
href="https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table">docs</a>)
 to specify the known orderings of the table during table creation. For 
example, following query:<br>
+<pre><code>
+CREATE EXTERNAL TABLE source (
+    amount INT NOT NULL,
+    price DOUBLE NOT NULL,
+    time TIMESTAMP NOT NULL,
+    ...
+)
+STORED AS CSV
+WITH ORDER (time ASC)
+WITH ORDER (amount ASC, price ASC)
+LOCATION '/path/to/FILE_NAME.csv'
+OPTIONS ('has_header' 'true');
+</code></pre>
+communicates that <code>source</code> table has the orderings: <code>[time 
ASC]</code> and <code>[amount ASC, price ASC]</code>.<br>
+When orderings are communicated from the source, DataFusion tracks the 
orderings through each operator while optimizing the plan.<br>
+<ul>
+<li>add new orderings (such as when "date_bin" function is applied to the 
"time" column)</li>
+<li>Remove orderings, if operation doesn't preserve the ordering of the data 
at its input</li>
+<li>Update equivalent groups</li>
+<li>Update constant expressions</li>
+</ul>
+
+Figure 1, shows an example how DataFusion generates an efficient plan for the 
query:

Review Comment:
   ```suggestion
   Figure 1 shows an example how DataFusion generates an efficient plan for the 
query:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

