This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new aff8f6f786 Publish built docs triggered by 
33a2531f470df26b652fbee394a6dc74da55ff89
aff8f6f786 is described below

commit aff8f6f786f55c1b4bb4b714422720aeedbc7406
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed May 28 16:24:50 2025 +0000

    Publish built docs triggered by 33a2531f470df26b652fbee394a6dc74da55ff89
---
 _sources/user-guide/sql/ddl.md.txt | 33 +++++++++++++-
 searchindex.js                     |  2 +-
 user-guide/sql/ddl.html            | 90 ++++++++++++++++++++++++++++++++++++--
 3 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/_sources/user-guide/sql/ddl.md.txt 
b/_sources/user-guide/sql/ddl.md.txt
index fc18154bec..ff8fa9bac0 100644
--- a/_sources/user-guide/sql/ddl.md.txt
+++ b/_sources/user-guide/sql/ddl.md.txt
@@ -82,6 +82,8 @@ For a comprehensive list of format-specific options that can 
be specified in the
 a path to a file or directory of partitioned files locally or on an
 object store.
 
+### Example: Parquet
+
 Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` 
SQL statement such as the following. It is not necessary to
 provide schema information for Parquet files.
 
@@ -91,6 +93,23 @@ STORED AS PARQUET
 LOCATION '/mnt/nyctaxi/tripdata.parquet';
 ```
 
+:::{note}
+Statistics
+: By default, when a table is created, DataFusion will _NOT_ read the files
+to gather statistics, which can be expensive but can accelerate subsequent
+queries substantially. If you want to gather statistics
+when creating a table, set the `datafusion.execution.collect_statistics`
+configuration option to `true` before creating the table. For example:
+
+```sql
+SET datafusion.execution.collect_statistics = true;
+```
+
+See the [config settings docs](../configs.md) for more details.
+:::
+
+### Example: Comma Separated Value (CSV)
+
 CSV data sources can also be registered by executing a `CREATE EXTERNAL TABLE` 
SQL statement. The schema will be inferred based on
 scanning a subset of the file.
 
@@ -101,6 +120,8 @@ LOCATION '/path/to/aggregate_simple.csv'
 OPTIONS ('has_header' 'true');
 ```
 
+### Example: Compression
+
 It is also possible to use compressed files, such as `.csv.gz`:
 
 ```sql
@@ -111,6 +132,8 @@ LOCATION '/path/to/aggregate_simple.csv.gz'
 OPTIONS ('has_header' 'true');
 ```
 
+### Example: Specifying Schema
+
 It is also possible to specify the schema manually.
 
 ```sql
@@ -134,6 +157,8 @@ LOCATION '/path/to/aggregate_test_100.csv'
 OPTIONS ('has_header' 'true');
 ```
 
+### Example: Partitioned Tables
+
 It is also possible to specify a directory that contains a partitioned
 table (multiple files with the same schema)
 
@@ -144,7 +169,9 @@ LOCATION '/path/to/directory/of/files'
 OPTIONS ('has_header' 'true');
 ```
 
-With `CREATE UNBOUNDED EXTERNAL TABLE` SQL statement. We can create unbounded 
data sources such as following:
+### Example: Unbounded Data Sources
+
+We can create unbounded data sources using the `CREATE UNBOUNDED EXTERNAL 
TABLE` SQL statement.
 
 ```sql
 CREATE UNBOUNDED EXTERNAL TABLE taxi
@@ -154,6 +181,8 @@ LOCATION '/mnt/nyctaxi/tripdata.parquet';
 
 Note that this statement actually reads data from a fixed-size file, so a 
better example would involve reading from a FIFO file. Nevertheless, once 
Datafusion sees the `UNBOUNDED` keyword in a data source, it tries to execute 
queries that refer to this unbounded source in streaming fashion. If this is 
not possible according to query specifications, plan generation fails stating 
it is not possible to execute given query in streaming fashion. Note that 
queries that can run with unbounded s [...]
 
+### Example: `WITH ORDER` Clause
+
 When creating an output from a data source that is already ordered by
 an expression, you can pre-specify the order of the data using the
 `WITH ORDER` clause. This applies even if the expression used for
@@ -190,7 +219,7 @@ WITH ORDER (sort_expression1 [ASC | DESC] [NULLS { FIRST | 
LAST }]
          [, sort_expression2 [ASC | DESC] [NULLS { FIRST | LAST }] ...])
 ```
 
-### Cautions when using the WITH ORDER Clause
+#### Cautions when using the WITH ORDER Clause
 
 - It's important to understand that using the `WITH ORDER` clause in the 
`CREATE EXTERNAL TABLE` statement only specifies the order in which the data 
should be read from the external file. If the data in the file is not already 
sorted according to the specified order, then the results may not be correct.
 
diff --git a/searchindex.js b/searchindex.js
index 93f64d138c..ff8d40649b 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles":{"!=":[[54,"op-neq"]],"!~":[[54,"op-re-not-match"]],"!~*":[[54,"op-re-not-match-i"]],"!~~":[[54,"id19"]],"!~~*":[[54,"id20"]],"#":[[54,"op-bit-xor"]],"%":[[54,"op-modulo"]],"&":[[54,"op-bit-and"]],"(relation,
 name) tuples in logical fields and logical columns are 
unique":[[12,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[54,"op-multiply"]],"+":[[54,"op-plus"]],"-":[[54,"op-minus"]],"/":[[54,"op-divide"]],"2022
 Q2":[[10,"q2"]] [...]
\ No newline at end of file
+Search.setIndex({"alltitles":{"!=":[[54,"op-neq"]],"!~":[[54,"op-re-not-match"]],"!~*":[[54,"op-re-not-match-i"]],"!~~":[[54,"id19"]],"!~~*":[[54,"id20"]],"#":[[54,"op-bit-xor"]],"%":[[54,"op-modulo"]],"&":[[54,"op-bit-and"]],"(relation,
 name) tuples in logical fields and logical columns are 
unique":[[12,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[54,"op-multiply"]],"+":[[54,"op-plus"]],"-":[[54,"op-minus"]],"/":[[54,"op-divide"]],"2022
 Q2":[[10,"q2"]] [...]
\ No newline at end of file
diff --git a/user-guide/sql/ddl.html b/user-guide/sql/ddl.html
index 097aa39e5f..ff98f46987 100644
--- a/user-guide/sql/ddl.html
+++ b/user-guide/sql/ddl.html
@@ -543,10 +543,56 @@
   </a>
   <ul class="nav section-nav flex-column">
    <li class="toc-h3 nav-item toc-entry">
-    <a class="reference internal nav-link" 
href="#cautions-when-using-the-with-order-clause">
-     Cautions when using the WITH ORDER Clause
+    <a class="reference internal nav-link" href="#example-parquet">
+     Example: Parquet
     </a>
    </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" 
href="#example-comma-separated-value-csv">
+     Example: Comma Separated Value (CSV)
+    </a>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#example-compression">
+     Example: Compression
+    </a>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#example-specifying-schema">
+     Example: Specifying Schema
+    </a>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#example-partitioned-tables">
+     Example: Partitioned Tables
+    </a>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" 
href="#example-unbounded-data-sources">
+     Example: Unbounded Data Sources
+    </a>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#example-with-order-clause">
+     Example:
+     <code class="docutils literal notranslate">
+      <span class="pre">
+       WITH
+      </span>
+      <span class="pre">
+       ORDER
+      </span>
+     </code>
+     Clause
+    </a>
+    <ul class="nav section-nav flex-column">
+     <li class="toc-h4 nav-item toc-entry">
+      <a class="reference internal nav-link" 
href="#cautions-when-using-the-with-order-clause">
+       Cautions when using the WITH ORDER Clause
+      </a>
+     </li>
+    </ul>
+   </li>
   </ul>
  </li>
  <li class="toc-h2 nav-item toc-entry">
@@ -669,6 +715,8 @@ file system or remote object store as a named table which 
can be queried.</p>
 <p><code class="docutils literal notranslate"><span 
class="pre">LOCATION</span> <span class="pre">&lt;literal&gt;</span></code> 
specifies the location to find the data. It can be
 a path to a file or directory of partitioned files locally or on an
 object store.</p>
+<section id="example-parquet">
+<h3>Example: Parquet<a class="headerlink" href="#example-parquet" title="Link 
to this heading">¶</a></h3>
 <p>Parquet data sources can be registered by executing a <code class="docutils 
literal notranslate"><span class="pre">CREATE</span> <span 
class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement 
such as the following. It is not necessary to
 provide schema information for Parquet files.</p>
 <div class="highlight-sql notranslate"><div 
class="highlight"><pre><span></span><span class="k">CREATE</span><span 
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span 
class="k">TABLE</span><span class="w"> </span><span class="n">taxi</span>
@@ -676,6 +724,24 @@ provide schema information for Parquet files.</p>
 <span class="k">LOCATION</span><span class="w"> </span><span 
class="s1">&#39;/mnt/nyctaxi/tripdata.parquet&#39;</span><span 
class="p">;</span>
 </pre></div>
 </div>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<dl class="simple myst">
+<dt>Statistics</dt><dd><p>By default, when a table is created, DataFusion will 
<em>NOT</em> read the files
+to gather statistics, which can be expensive but can accelerate subsequent
+queries substantially. If you want to gather statistics
+when creating a table, set the <code class="docutils literal 
notranslate"><span 
class="pre">datafusion.execution.collect_statistics</span></code>
+configuration option to <code class="docutils literal notranslate"><span 
class="pre">true</span></code> before creating the table. For example:</p>
+</dd>
+</dl>
+<div class="highlight-sql notranslate"><div 
class="highlight"><pre><span></span><span class="k">SET</span><span class="w"> 
</span><span class="n">datafusion</span><span class="p">.</span><span 
class="n">execution</span><span class="p">.</span><span 
class="n">collect_statistics</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="k">true</span><span 
class="p">;</span>
+</pre></div>
+</div>
+<p>See the <a class="reference internal" href="../configs.html"><span 
class="std std-doc">config settings docs</span></a> for more details.</p>
+</div>
+</section>
+<section id="example-comma-separated-value-csv">
+<h3>Example: Comma Separated Value (CSV)<a class="headerlink" 
href="#example-comma-separated-value-csv" title="Link to this 
heading">¶</a></h3>
 <p>CSV data sources can also be registered by executing a <code 
class="docutils literal notranslate"><span class="pre">CREATE</span> <span 
class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL 
statement. The schema will be inferred based on
 scanning a subset of the file.</p>
 <div class="highlight-sql notranslate"><div 
class="highlight"><pre><span></span><span class="k">CREATE</span><span 
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span 
class="k">TABLE</span><span class="w"> </span><span class="n">test</span>
@@ -684,6 +750,9 @@ scanning a subset of the file.</p>
 <span class="k">OPTIONS</span><span class="w"> </span><span 
class="p">(</span><span class="s1">&#39;has_header&#39;</span><span class="w"> 
</span><span class="s1">&#39;true&#39;</span><span class="p">);</span>
 </pre></div>
 </div>
+</section>
+<section id="example-compression">
+<h3>Example: Compression<a class="headerlink" href="#example-compression" 
title="Link to this heading">¶</a></h3>
 <p>It is also possible to use compressed files, such as <code class="docutils 
literal notranslate"><span class="pre">.csv.gz</span></code>:</p>
 <div class="highlight-sql notranslate"><div 
class="highlight"><pre><span></span><span class="k">CREATE</span><span 
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span 
class="k">TABLE</span><span class="w"> </span><span class="n">test</span>
 <span class="n">STORED</span><span class="w"> </span><span 
class="k">AS</span><span class="w"> </span><span class="n">CSV</span>
@@ -692,6 +761,9 @@ scanning a subset of the file.</p>
 <span class="k">OPTIONS</span><span class="w"> </span><span 
class="p">(</span><span class="s1">&#39;has_header&#39;</span><span class="w"> 
</span><span class="s1">&#39;true&#39;</span><span class="p">);</span>
 </pre></div>
 </div>
+</section>
+<section id="example-specifying-schema">
+<h3>Example: Specifying Schema<a class="headerlink" 
href="#example-specifying-schema" title="Link to this heading">¶</a></h3>
 <p>It is also possible to specify the schema manually.</p>
 <div class="highlight-sql notranslate"><div 
class="highlight"><pre><span></span><span class="k">CREATE</span><span 
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span 
class="k">TABLE</span><span class="w"> </span><span class="n">test</span><span 
class="w"> </span><span class="p">(</span>
 <span class="w">    </span><span class="n">c1</span><span class="w">  
</span><span class="nb">VARCHAR</span><span class="w"> </span><span 
class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span 
class="p">,</span>
@@ -713,6 +785,9 @@ scanning a subset of the file.</p>
 <span class="k">OPTIONS</span><span class="w"> </span><span 
class="p">(</span><span class="s1">&#39;has_header&#39;</span><span class="w"> 
</span><span class="s1">&#39;true&#39;</span><span class="p">);</span>
 </pre></div>
 </div>
+</section>
+<section id="example-partitioned-tables">
+<h3>Example: Partitioned Tables<a class="headerlink" 
href="#example-partitioned-tables" title="Link to this heading">¶</a></h3>
 <p>It is also possible to specify a directory that contains a partitioned
 table (multiple files with the same schema)</p>
 <div class="highlight-sql notranslate"><div 
class="highlight"><pre><span></span><span class="k">CREATE</span><span 
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span 
class="k">TABLE</span><span class="w"> </span><span class="n">test</span>
@@ -721,13 +796,19 @@ table (multiple files with the same schema)</p>
 <span class="k">OPTIONS</span><span class="w"> </span><span 
class="p">(</span><span class="s1">&#39;has_header&#39;</span><span class="w"> 
</span><span class="s1">&#39;true&#39;</span><span class="p">);</span>
 </pre></div>
 </div>
-<p>With <code class="docutils literal notranslate"><span 
class="pre">CREATE</span> <span class="pre">UNBOUNDED</span> <span 
class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL 
statement. We can create unbounded data sources such as following:</p>
+</section>
+<section id="example-unbounded-data-sources">
+<h3>Example: Unbounded Data Sources<a class="headerlink" 
href="#example-unbounded-data-sources" title="Link to this heading">¶</a></h3>
+<p>We can create unbounded data sources using the <code class="docutils 
literal notranslate"><span class="pre">CREATE</span> <span 
class="pre">UNBOUNDED</span> <span class="pre">EXTERNAL</span> <span 
class="pre">TABLE</span></code> SQL statement.</p>
 <div class="highlight-sql notranslate"><div 
class="highlight"><pre><span></span><span class="k">CREATE</span><span 
class="w"> </span><span class="n">UNBOUNDED</span><span class="w"> </span><span 
class="k">EXTERNAL</span><span class="w"> </span><span 
class="k">TABLE</span><span class="w"> </span><span class="n">taxi</span>
 <span class="n">STORED</span><span class="w"> </span><span 
class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
 <span class="k">LOCATION</span><span class="w"> </span><span 
class="s1">&#39;/mnt/nyctaxi/tripdata.parquet&#39;</span><span 
class="p">;</span>
 </pre></div>
 </div>
 <p>Note that this statement actually reads data from a fixed-size file, so a 
better example would involve reading from a FIFO file. Nevertheless, once 
Datafusion sees the <code class="docutils literal notranslate"><span 
class="pre">UNBOUNDED</span></code> keyword in a data source, it tries to 
execute queries that refer to this unbounded source in streaming fashion. If 
this is not possible according to query specifications, plan generation fails 
stating it is not possible to execute given [...]
+</section>
+<section id="example-with-order-clause">
+<h3>Example: <code class="docutils literal notranslate"><span 
class="pre">WITH</span> <span class="pre">ORDER</span></code> Clause<a 
class="headerlink" href="#example-with-order-clause" title="Link to this 
heading">¶</a></h3>
 <p>When creating an output from a data source that is already ordered by
 an expression, you can pre-specify the order of the data using the
 <code class="docutils literal notranslate"><span class="pre">WITH</span> <span 
class="pre">ORDER</span></code> clause. This applies even if the expression 
used for
@@ -760,7 +841,7 @@ sorting is complex, allowing for greater flexibility.</p>
 </pre></div>
 </div>
 <section id="cautions-when-using-the-with-order-clause">
-<h3>Cautions when using the WITH ORDER Clause<a class="headerlink" 
href="#cautions-when-using-the-with-order-clause" title="Link to this 
heading">¶</a></h3>
+<h4>Cautions when using the WITH ORDER Clause<a class="headerlink" 
href="#cautions-when-using-the-with-order-clause" title="Link to this 
heading">¶</a></h4>
 <ul class="simple">
 <li><p>It’s important to understand that using the <code class="docutils 
literal notranslate"><span class="pre">WITH</span> <span 
class="pre">ORDER</span></code> clause in the <code class="docutils literal 
notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> 
<span class="pre">TABLE</span></code> statement only specifies the order in 
which the data should be read from the external file. If the data in the file 
is not already sorted according to the specified orde [...]
 <li><p>It’s also important to note that the <code class="docutils literal 
notranslate"><span class="pre">WITH</span> <span 
class="pre">ORDER</span></code> clause does not affect the ordering of the data 
in the original external file.</p></li>
@@ -779,6 +860,7 @@ sorting is complex, allowing for greater flexibility.</p>
 </div>
 </section>
 </section>
+</section>
 <section id="create-table">
 <h2>CREATE TABLE<a class="headerlink" href="#create-table" title="Link to this 
heading">¶</a></h2>
 <p>An in-memory table can be created with a query or values list.</p>


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org

Reply via email to