This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new cf8437a5a6 Publish built docs triggered by 2d7ae09262f7a1338c30192b33efbe1b2d1d9829
cf8437a5a6 is described below

commit cf8437a5a6d88e03c74626daa5535c6f01d65d0a
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu Jun 19 18:10:22 2025 +0000

    Publish built docs triggered by 2d7ae09262f7a1338c30192b33efbe1b2d1d9829
---
 _sources/library-user-guide/upgrading.md.txt | 18 ++++++++++++++--
 _sources/user-guide/configs.md.txt           |  2 +-
 _sources/user-guide/sql/ddl.md.txt           |  8 +++----
 library-user-guide/upgrading.html            | 32 ++++++++++++++++++++++++++--
 searchindex.js                               |  2 +-
 user-guide/configs.html                      |  4 ++--
 user-guide/sql/ddl.html                      |  8 +++----
 7 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/_sources/library-user-guide/upgrading.md.txt b/_sources/library-user-guide/upgrading.md.txt
index b502850b59..613c2be43d 100644
--- a/_sources/library-user-guide/upgrading.md.txt
+++ b/_sources/library-user-guide/upgrading.md.txt
@@ -21,6 +21,21 @@
 
 ## DataFusion `49.0.0`
 
+### `datafusion.execution.collect_statistics` now defaults to `true`
+
+The default value of the `datafusion.execution.collect_statistics` configuration
+setting is now true. This change impacts users that use that value directly and relied
+on its default value being `false`.
+
+This change also restores the default behavior of `ListingTable` to its previous. If you use it directly
+you can maintain the current behavior by overriding the default value in your code.
+
+```rust
+ListingOptions::new(Arc::new(ParquetFormat::default()))
+    .with_collect_stat(false)
+    // other options
+```
+
 ### Metadata is now represented by `FieldMetadata`
 
 Metadata from the Arrow `Field` is now stored using the `FieldMetadata`
@@ -127,7 +142,7 @@ match expr {
 
 [details on #16207]: https://github.com/apache/datafusion/pull/16207#issuecomment-2922659103
 
-### The `VARCHAR` SQL type is now represented as `Utf8View` in Arrow.
+### The `VARCHAR` SQL type is now represented as `Utf8View` in Arrow
 
 The mapping of the SQL `VARCHAR` type has been changed from `Utf8` to `Utf8View`
 which improves performance for many string operations. You can read more about
@@ -277,7 +292,6 @@ Additionally `ObjectStore::list` and `ObjectStore::list_with_offset` have been c
 
 [#6619]: https://github.com/apache/arrow-rs/pull/6619
 [#7371]: https://github.com/apache/arrow-rs/pull/7371
-[#7328]: https://github.com/apache/arrow-rs/pull/6961
 
 This requires converting from `usize` to `u64` occasionally as well as changes to `ObjectStore` implementations such as
 
diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 1b8233a541..b55e63293f 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -47,7 +47,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.catalog.newlines_in_values                                   | false                     | Specifies whether newlines in (quoted) CSV values are supported. This is the default value for `format.newlines_in_values` for `CREATE EXTERNAL TABLE` if not specified explicitly in the statement. Parsing newlines in quoted values may be affected by execution behaviour such as parallel file scanning. Setting this to `true` ensures that newlines in values are parsed successfully, which  [...]
 | datafusion.execution.batch_size                                         | 8192                      | Default batch size while creating new batches, it's especially useful for buffer-in-memory batches since creating tiny batches would result in too much metadata memory consumption [...]
 | datafusion.execution.coalesce_batches                                   | true                      | When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the configuration setting [...]
-| datafusion.execution.collect_statistics                                 | false                     | Should DataFusion collect statistics when first creating a table. Has no effect after the table is created. Applies to the default `ListingTableProvider` in DataFusion. Defaults to false. [...]
+| datafusion.execution.collect_statistics                                 | true                      | Should DataFusion collect statistics when first creating a table. Has no effect after the table is created. Applies to the default `ListingTableProvider` in DataFusion. Defaults to true. [...]
 | datafusion.execution.target_partitions                                  | 0                         | Number of partitions for query execution. Increasing partitions can increase concurrency. Defaults to the number of CPU cores on the system [...]
 | datafusion.execution.time_zone                                          | +00:00                    | The default time zone Some functions, e.g. `EXTRACT(HOUR from SOME_TIME)`, shift the underlying datetime according to this time zone, and then extract the hour [...]
 | datafusion.execution.parquet.enable_page_index                          | true                      | (reading) If true, reads the Parquet data page level metadata (the Page Index), if present, to reduce the I/O and number of rows decoded. [...]
diff --git a/_sources/user-guide/sql/ddl.md.txt b/_sources/user-guide/sql/ddl.md.txt
index ff8fa9bac0..1d971594ad 100644
--- a/_sources/user-guide/sql/ddl.md.txt
+++ b/_sources/user-guide/sql/ddl.md.txt
@@ -95,14 +95,14 @@ LOCATION '/mnt/nyctaxi/tripdata.parquet';
 
 :::{note}
 Statistics
-: By default, when a table is created, DataFusion will _NOT_ read the files
+: By default, when a table is created, DataFusion will read the files
 to gather statistics, which can be expensive but can accelerate subsequent
-queries substantially. If you want to gather statistics
+queries substantially. If you don't want to gather statistics
 when creating a table, set the `datafusion.execution.collect_statistics`
-configuration option to `true` before creating the table. For example:
+configuration option to `false` before creating the table. For example:
 
 ```sql
-SET datafusion.execution.collect_statistics = true;
+SET datafusion.execution.collect_statistics = false;
 ```
 
 See the [config settings docs](../configs.md) for more details.
diff --git a/library-user-guide/upgrading.html b/library-user-guide/upgrading.html
index 734eec41b4..baa1e14504 100644
--- a/library-user-guide/upgrading.html
+++ b/library-user-guide/upgrading.html
@@ -559,6 +559,21 @@
    </code>
   </a>
   <ul class="nav section-nav flex-column">
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#datafusion-execution-collect-statistics-now-defaults-to-true">
+     <code class="docutils literal notranslate">
+      <span class="pre">
+       datafusion.execution.collect_statistics
+      </span>
+     </code>
+     now defaults to
+     <code class="docutils literal notranslate">
+      <span class="pre">
+       true
+      </span>
+     </code>
+    </a>
+   </li>
    <li class="toc-h3 nav-item toc-entry">
     <a class="reference internal nav-link" href="#metadata-is-now-represented-by-fieldmetadata">
      Metadata is now represented by
@@ -621,7 +636,7 @@
        Utf8View
       </span>
      </code>
-     in Arrow.
+     in Arrow
     </a>
    </li>
    <li class="toc-h3 nav-item toc-entry">
@@ -950,6 +965,19 @@
 <h1>Upgrade Guides<a class="headerlink" href="#upgrade-guides" title="Link to this heading">¶</a></h1>
 <section id="datafusion-49-0-0">
 <h2>DataFusion <code class="docutils literal notranslate"><span class="pre">49.0.0</span></code><a class="headerlink" href="#datafusion-49-0-0" title="Link to this heading">¶</a></h2>
+<section id="datafusion-execution-collect-statistics-now-defaults-to-true">
+<h3><code class="docutils literal notranslate"><span class="pre">datafusion.execution.collect_statistics</span></code> now defaults to <code class="docutils literal notranslate"><span class="pre">true</span></code><a class="headerlink" href="#datafusion-execution-collect-statistics-now-defaults-to-true" title="Link to this heading">¶</a></h3>
+<p>The default value of the <code class="docutils literal notranslate"><span class="pre">datafusion.execution.collect_statistics</span></code> configuration
+setting is now true. This change impacts users that use that value directly and relied
+on its default value being <code class="docutils literal notranslate"><span class="pre">false</span></code>.</p>
+<p>This change also restores the default behavior of <code class="docutils literal notranslate"><span class="pre">ListingTable</span></code> to its previous. If you use it directly
+you can maintain the current behavior by overriding the default value in your code.</p>
+<div class="highlight-rust notranslate"><div class="highlight"><pre><span></span><span class="n">ListingOptions</span><span class="p">::</span><span class="n">new</span><span class="p">(</span><span class="n">Arc</span><span class="p">::</span><span class="n">new</span><span class="p">(</span><span class="n">ParquetFormat</span><span class="p">::</span><span class="n">default</span><span class="p">()))</span>
+<span class="w">    </span><span class="p">.</span><span class="n">with_collect_stat</span><span class="p">(</span><span class="kc">false</span><span class="p">)</span>
+<span class="w">    </span><span class="c1">// other options</span>
+</pre></div>
+</div>
+</section>
 <section id="metadata-is-now-represented-by-fieldmetadata">
 <h3>Metadata is now represented by <code class="docutils literal notranslate"><span class="pre">FieldMetadata</span></code><a class="headerlink" href="#metadata-is-now-represented-by-fieldmetadata" title="Link to this heading">¶</a></h3>
 <p>Metadata from the Arrow <code class="docutils literal notranslate"><span class="pre">Field</span></code> is now stored using the <code class="docutils literal notranslate"><span class="pre">FieldMetadata</span></code>
@@ -1037,7 +1065,7 @@ on <code class="docutils literal notranslate"><span class="pre">Expr::WindowFunc
 </div>
 </section>
 <section id="the-varchar-sql-type-is-now-represented-as-utf8view-in-arrow">
-<h3>The <code class="docutils literal notranslate"><span class="pre">VARCHAR</span></code> SQL type is now represented as <code class="docutils literal notranslate"><span class="pre">Utf8View</span></code> in Arrow.<a class="headerlink" href="#the-varchar-sql-type-is-now-represented-as-utf8view-in-arrow" title="Link to this heading">¶</a></h3>
+<h3>The <code class="docutils literal notranslate"><span class="pre">VARCHAR</span></code> SQL type is now represented as <code class="docutils literal notranslate"><span class="pre">Utf8View</span></code> in Arrow<a class="headerlink" href="#the-varchar-sql-type-is-now-represented-as-utf8view-in-arrow" title="Link to this heading">¶</a></h3>
 <p>The mapping of the SQL <code class="docutils literal notranslate"><span class="pre">VARCHAR</span></code> type has been changed from <code class="docutils literal notranslate"><span class="pre">Utf8</span></code> to <code class="docutils literal notranslate"><span class="pre">Utf8View</span></code>
 which improves performance for many string operations. You can read more about
 <code class="docutils literal notranslate"><span class="pre">Utf8View</span></code> in the <a class="reference external" href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/">DataFusion
 blog post on German-style strings</a></p>
diff --git a/searchindex.js b/searchindex.js
index 089f216068..bf091943de 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles":{"!=":[[57,"op-neq"]],"!~":[[57,"op-re-not-match"]],"!~*":[[57,"op-re-not-match-i"]],"!~~":[[57,"id19"]],"!~~*":[[57,"id20"]],"#":[[57,"op-bit-xor"]],"%":[[57,"op-modulo"]],"&":[[57,"op-bit-and"]],"(relation, name) tuples in logical fields and logical columns are unique":[[12,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[57,"op-multiply"]],"+":[[57,"op-plus"]],"-":[[57,"op-minus"]],"/":[[57,"op-divide"]],"<":[[57,"op-lt"]],"< [...]
\ No newline at end of file
+Search.setIndex({"alltitles":{"!=":[[57,"op-neq"]],"!~":[[57,"op-re-not-match"]],"!~*":[[57,"op-re-not-match-i"]],"!~~":[[57,"id19"]],"!~~*":[[57,"id20"]],"#":[[57,"op-bit-xor"]],"%":[[57,"op-modulo"]],"&":[[57,"op-bit-and"]],"(relation, name) tuples in logical fields and logical columns are unique":[[12,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[57,"op-multiply"]],"+":[[57,"op-plus"]],"-":[[57,"op-minus"]],"/":[[57,"op-divide"]],"<":[[57,"op-lt"]],"< [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index df9a7adab7..5e14371193 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -654,8 +654,8 @@ Environment variables are read during <code class="docutils literal notranslate"
 <td><p>When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the configuration setting</p></td>
 </tr>
 <tr class="row-even"><td><p>datafusion.execution.collect_statistics</p></td>
-<td><p>false</p></td>
-<td><p>Should DataFusion collect statistics when first creating a table. Has no effect after the table is created. Applies to the default <code class="docutils literal notranslate"><span class="pre">ListingTableProvider</span></code> in DataFusion. Defaults to false.</p></td>
+<td><p>true</p></td>
+<td><p>Should DataFusion collect statistics when first creating a table. Has no effect after the table is created. Applies to the default <code class="docutils literal notranslate"><span class="pre">ListingTableProvider</span></code> in DataFusion. Defaults to true.</p></td>
 </tr>
 <tr class="row-odd"><td><p>datafusion.execution.target_partitions</p></td>
 <td><p>0</p></td>
diff --git a/user-guide/sql/ddl.html b/user-guide/sql/ddl.html
index d22a883372..a87ad924d2 100644
--- a/user-guide/sql/ddl.html
+++ b/user-guide/sql/ddl.html
@@ -749,14 +749,14 @@ provide schema information for Parquet files.</p>
 <div class="admonition note">
 <p class="admonition-title">Note</p>
 <dl class="simple myst">
-<dt>Statistics</dt><dd><p>By default, when a table is created, DataFusion will <em>NOT</em> read the files
+<dt>Statistics</dt><dd><p>By default, when a table is created, DataFusion will read the files
 to gather statistics, which can be expensive but can accelerate subsequent
-queries substantially. If you want to gather statistics
+queries substantially. If you don’t want to gather statistics
 when creating a table, set the <code class="docutils literal notranslate"><span class="pre">datafusion.execution.collect_statistics</span></code>
-configuration option to <code class="docutils literal notranslate"><span class="pre">true</span></code> before creating the table. For example:</p>
+configuration option to <code class="docutils literal notranslate"><span class="pre">false</span></code> before creating the table. For example:</p>
 </dd>
 </dl>
-<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">SET</span><span class="w"> </span><span class="n">datafusion</span><span class="p">.</span><span class="n">execution</span><span class="p">.</span><span class="n">collect_statistics</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">true</span><span class="p">;</span>
+<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">SET</span><span class="w"> </span><span class="n">datafusion</span><span class="p">.</span><span class="n">execution</span><span class="p">.</span><span class="n">collect_statistics</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">false</span><span class="p">;</span>
 </pre></div>
 </div>
 <p>See the <a class="reference internal" href="../configs.html"><span class="std std-doc">config settings docs</span></a> for more details.</p>

