Omega359 commented on code in PR #59:
URL: https://github.com/apache/datafusion-site/pull/59#discussion_r2002097220
##########
content/blog/2025-03-11-ordering-analysis.md:
##########
@@ -291,6 +291,53 @@ Following third and fourth constraints for the simplified table, the succinct va
 `[time_bin ASC]`, `[time ASC]`
+<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<p><strong>How can DataFusion find orderings?</strong></p>
+DataFusion's <code>CREATE EXTERNAL TABLE</code> has a <code>WITH ORDER</code> clause (see <a href="https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table">docs</a>) to specify the known orderings of the table during table creation. For example, following query:<br>
+<pre><code>
+CREATE EXTERNAL TABLE source (
+ amount INT NOT NULL,
+ price DOUBLE NOT NULL,
+ time TIMESTAMP NOT NULL,
+ ...
+)
+STORED AS CSV
+WITH ORDER (time ASC)
+WITH ORDER (amount ASC, price ASC)
+LOCATION '/path/to/FILE_NAME.csv'
+OPTIONS ('has_header' 'true');
+</code></pre>
+communicates that <code>source</code> table has the orderings: <code>[time ASC]</code> and <code>[amount ASC, price ASC]</code>.<br>
+When orderings are communicated from the source, DataFusion tracks the orderings through each operator while optimizing the plan.<br>
+<ul>
+<li>add new orderings (such as when "date_bin" function is applied to the "time" column)</li>
+<li>Remove orderings, if operation doesn't preserve the ordering of the data at its input</li>
+<li>Update equivalent groups</li>
+<li>Update constant expressions</li>
+</ul>
+
+Figure 1, shows an example how DataFusion generates an efficient plan for the query:

Review Comment:
```suggestion
Figure 1 shows an example how DataFusion generates an efficient plan for the query:
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org