This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push: new 61105987b9 Publish built docs triggered by 26106a2aa8cd49802a0178bc2a9ac16b93524456 61105987b9 is described below commit 61105987b9698eac5437ad82d408f10494f29390 Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com> AuthorDate: Tue Sep 9 12:52:40 2025 +0000 Publish built docs triggered by 26106a2aa8cd49802a0178bc2a9ac16b93524456 --- _sources/library-user-guide/upgrading.md.txt | 11 +++ _sources/user-guide/cli/datasources.md.txt | 24 ++++++ _sources/user-guide/configs.md.txt | 1 + _sources/user-guide/sql/ddl.md.txt | 29 +++++++ library-user-guide/upgrading.html | 14 ++++ searchindex.js | 2 +- user-guide/cli/datasources.html | 20 +++++ user-guide/configs.html | 116 ++++++++++++++------------- user-guide/sql/ddl.html | 25 ++++++ 9 files changed, 185 insertions(+), 57 deletions(-) diff --git a/_sources/library-user-guide/upgrading.md.txt b/_sources/library-user-guide/upgrading.md.txt index b9fbc6e1be..6c8eb0ec1e 100644 --- a/_sources/library-user-guide/upgrading.md.txt +++ b/_sources/library-user-guide/upgrading.md.txt @@ -24,6 +24,17 @@ **Note:** DataFusion `50.0.0` has not been released yet. The information provided in this section pertains to features and changes that have already been merged to the main branch and are awaiting release in this version. You can see the current [status of the `50.0.0 `release here](https://github.com/apache/datafusion/issues/16799) +### ListingTable automatically detects Hive Partitioned tables + +DataFusion 50.0.0 automatically infers Hive partitions when using the `ListingTableFactory` and `CREATE EXTERNAL TABLE`. Previously, +when creating a `ListingTable`, datasets that use Hive partitioning (e.g. +`/table_root/column1=value1/column2=value2/data.parquet`) would not have the Hive columns reflected in +the table's schema or data. The previous behavior can be +restored by setting the `datafusion.execution.listing_table_factory_infer_partitions` configuration option to `false`. +See [issue #17049] for more details. + +[issue #17049]: https://github.com/apache/datafusion/issues/17049 + ### `MSRV` updated to 1.86.0 The Minimum Supported Rust Version (MSRV) has been updated to [`1.86.0`]. diff --git a/_sources/user-guide/cli/datasources.md.txt b/_sources/user-guide/cli/datasources.md.txt index c15b8a5e46..6b1a4887a8 100644 --- a/_sources/user-guide/cli/datasources.md.txt +++ b/_sources/user-guide/cli/datasources.md.txt @@ -162,6 +162,30 @@ STORED AS PARQUET LOCATION 'gs://bucket/my_table/'; ``` +When specifying a directory path that has a Hive compliant partition structure, by default, DataFusion CLI will +automatically parse and incorporate the Hive columns and their values into the table's schema and data. 
Given the +following remote object paths: + +```console +gs://bucket/my_table/a=1/b=100/file1.parquet +gs://bucket/my_table/a=2/b=200/file2.parquet +``` + +`my_table` can be queried and filtered on the Hive columns: + +```sql +CREATE EXTERNAL TABLE my_table +STORED AS PARQUET +LOCATION 'gs://bucket/my_table/'; + +SELECT count(*) FROM my_table WHERE b=200; ++----------+ +| count(*) | ++----------+ +| 1 | ++----------+ +``` + # Formats ## Parquet diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt index 3d4730958f..f00c4c2acc 100644 --- a/_sources/user-guide/configs.md.txt +++ b/_sources/user-guide/configs.md.txt @@ -118,6 +118,7 @@ The following configuration settings are available: | datafusion.execution.soft_max_rows_per_output_file | 50000000 | Target number of rows in output files when writing multiple. This is a soft max, so it can be exceeded slightly. There also will be one file smaller than the limit if the total number of rows written is not roughly divisible by the soft max [...] | datafusion.execution.max_buffered_batches_per_output_file | 2 | This is the maximum number of RecordBatches buffered for each output file being worked. Higher values can potentially give faster write performance at the cost of higher peak memory consumption [...] | datafusion.execution.listing_table_ignore_subdirectory | true | Should sub directories be ignored when scanning directories for data files. Defaults to true (ignores subdirectories), consistent with Hive. Note that this setting does not affect reading partitioned tables (e.g. `/table/year=2021/month=01/data.parquet`). [...] +| datafusion.execution.listing_table_factory_infer_partitions | true | Should a `ListingTable` created through the `ListingTableFactory` infer table partitions from Hive compliant directories. Defaults to true (partition columns are inferred and will be represented in the table schema). [...] | datafusion.execution.enable_recursive_ctes | true | Should DataFusion support recursive CTEs [...] | datafusion.execution.split_file_groups_by_statistics | false | Attempt to eliminate sorts by packing & sorting files with non-overlapping statistics into the same file groups. Currently experimental [...] | datafusion.execution.keep_partition_by_columns | false | Should DataFusion keep the columns used for partition_by in the output RecordBatches [...] diff --git a/_sources/user-guide/sql/ddl.md.txt b/_sources/user-guide/sql/ddl.md.txt index b5028cca49..bd41f691bf 100644 --- a/_sources/user-guide/sql/ddl.md.txt +++ b/_sources/user-guide/sql/ddl.md.txt @@ -169,6 +169,35 @@ LOCATION '/path/to/directory/of/files' OPTIONS ('has_header' 'true'); ``` +Tables that are partitioned using a Hive compliant partitioning scheme will have their columns and values automatically +detected and incorporated into the table's schema and data. Given the following example directory structure: + +```console +hive_partitioned/ +├── a=1 +│ └── b=200 +│ └── file1.parquet +└── a=2 + └── b=100 + └── file2.parquet +``` + +Users can specify the top level `hive_partitioned` directory as an `EXTERNAL TABLE` and leverage the Hive partitions to query +and filter data. 
+ +```sql +CREATE EXTERNAL TABLE hive_partitioned +STORED AS PARQUET +LOCATION '/path/to/hive_partitioned/'; + +SELECT count(*) FROM hive_partitioned WHERE b=100; ++------------------+ +| count(*) | ++------------------+ +| 1 | ++------------------+ +``` + ### Example: Unbounded Data Sources We can create unbounded data sources using the `CREATE UNBOUNDED EXTERNAL TABLE` SQL statement. diff --git a/library-user-guide/upgrading.html b/library-user-guide/upgrading.html index abd7dfd257..642e219edb 100644 --- a/library-user-guide/upgrading.html +++ b/library-user-guide/upgrading.html @@ -559,6 +559,11 @@ </code> </a> <ul class="nav section-nav flex-column"> + <li class="toc-h3 nav-item toc-entry"> + <a class="reference internal nav-link" href="#listingtable-automatically-detects-hive-partitioned-tables"> + ListingTable automatically detects Hive Partitioned tables + </a> + </li> <li class="toc-h3 nav-item toc-entry"> <a class="reference internal nav-link" href="#msrv-updated-to-1-86-0"> <code class="docutils literal notranslate"> @@ -1299,6 +1304,15 @@ <h2>DataFusion <code class="docutils literal notranslate"><span class="pre">50.0.0</span></code><a class="headerlink" href="#datafusion-50-0-0" title="Link to this heading">¶</a></h2> <p><strong>Note:</strong> DataFusion <code class="docutils literal notranslate"><span class="pre">50.0.0</span></code> has not been released yet. The information provided in this section pertains to features and changes that have already been merged to the main branch and are awaiting release in this version. You can see the current <a class="reference external" href="https://github.com/apache/datafusion/issues/16799">status of the <code class="docutils literal notranslate"><span class="pre">50.0.0</span> </code>release here</a></p> +<section id="listingtable-automatically-detects-hive-partitioned-tables"> +<h3>ListingTable automatically detects Hive Partitioned tables<a class="headerlink" href="#listingtable-automatically-detects-hive-partitioned-tables" title="Link to this heading">¶</a></h3> +<p>DataFusion 50.0.0 automatically infers Hive partitions when using the <code class="docutils literal notranslate"><span class="pre">ListingTableFactory</span></code> and <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code>. Previously, +when creating a <code class="docutils literal notranslate"><span class="pre">ListingTable</span></code>, datasets that use Hive partitioning (e.g. +<code class="docutils literal notranslate"><span class="pre">/table_root/column1=value1/column2=value2/data.parquet</span></code>) would not have the Hive columns reflected in +the table’s schema or data. The previous behavior can be +restored by setting the <code class="docutils literal notranslate"><span class="pre">datafusion.execution.listing_table_factory_infer_partitions</span></code> configuration option to <code class="docutils literal notranslate"><span class="pre">false</span></code>. 
+See <a class="reference external" href="https://github.com/apache/datafusion/issues/17049">issue #17049</a> for more details.</p> +</section> <section id="msrv-updated-to-1-86-0"> <h3><code class="docutils literal notranslate"><span class="pre">MSRV</span></code> updated to 1.86.0<a class="headerlink" href="#msrv-updated-to-1-86-0" title="Link to this heading">¶</a></h3> <p>The Minimum Supported Rust Version (MSRV) has been updated to <a class="reference external" href="https://releases.rs/docs/1.86.0/"><code class="docutils literal notranslate"><span class="pre">1.86.0</span></code></a>. diff --git a/searchindex.js b/searchindex.js index 4601dbfba5..e0ac9f0117 100644 --- a/searchindex.js +++ b/searchindex.js @@ -1 +1 @@ -Search.setIndex({"alltitles":{"!=":[[57,"op-neq"]],"!~":[[57,"op-re-not-match"]],"!~*":[[57,"op-re-not-match-i"]],"!~~":[[57,"id19"]],"!~~*":[[57,"id20"]],"#":[[57,"op-bit-xor"]],"%":[[57,"op-modulo"]],"&":[[57,"op-bit-and"]],"(relation, name) tuples in logical fields and logical columns are unique":[[12,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[57,"op-multiply"]],"+":[[57,"op-plus"]],"-":[[57,"op-minus"]],"/":[[57,"op-divide"]],"<":[[57,"op-lt"]],"< [...] \ No newline at end of file +Search.setIndex({"alltitles":{"!=":[[57,"op-neq"]],"!~":[[57,"op-re-not-match"]],"!~*":[[57,"op-re-not-match-i"]],"!~~":[[57,"id19"]],"!~~*":[[57,"id20"]],"#":[[57,"op-bit-xor"]],"%":[[57,"op-modulo"]],"&":[[57,"op-bit-and"]],"(relation, name) tuples in logical fields and logical columns are unique":[[12,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[57,"op-multiply"]],"+":[[57,"op-plus"]],"-":[[57,"op-minus"]],"/":[[57,"op-divide"]],"<":[[57,"op-lt"]],"< [...] \ No newline at end of file diff --git a/user-guide/cli/datasources.html b/user-guide/cli/datasources.html index 2587da596d..c74abfd37f 100644 --- a/user-guide/cli/datasources.html +++ b/user-guide/cli/datasources.html @@ -797,6 +797,26 @@ path, and it will automatically read all compatible files within that directory. <span class="k">LOCATION</span><span class="w"> </span><span class="s1">'gs://bucket/my_table/'</span><span class="p">;</span> </pre></div> </div> +<p>When specifying a directory path that has a Hive compliant partition structure, by default, DataFusion CLI will +automatically parse and incorporate the Hive columns and their values into the table’s schema and data. 
Given the +following remote object paths:</p> +<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">gs://bucket/my_table/a=1/b=100/file1.parquet</span> +<span class="go">gs://bucket/my_table/a=2/b=200/file2.parquet</span> +</pre></div> +</div> +<p><code class="docutils literal notranslate"><span class="pre">my_table</span></code> can be queried and filtered on the Hive columns:</p> +<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">my_table</span> +<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span> +<span class="k">LOCATION</span><span class="w"> </span><span class="s1">'gs://bucket/my_table/'</span><span class="p">;</span> + +<span class="k">SELECT</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">my_table</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">b</span><span class="o">=</span><span class="mi">200</span><span class="p">;</span> +<span class="o">+</span><span class="c1">----------+</span> +<span class="o">|</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="o">|</span> +<span class="o">+</span><span class="c1">----------+</span> +<span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span> +<span class="o">+</span><span class="c1">----------+</span> +</pre></div> +</div> </section> <section id="formats"> <h1>Formats<a class="headerlink" href="#formats" title="Link to this heading">¶</a></h1> diff --git a/user-guide/configs.html b/user-guide/configs.html index d9b6cd4d05..97dd0b04c5 100644 --- a/user-guide/configs.html +++ b/user-guide/configs.html @@ -902,227 +902,231 @@ example, to configure <code class="docutils literal notranslate"><span class="pr <td><p>true</p></td> <td><p>Should sub directories be ignored when scanning directories for data files. Defaults to true (ignores subdirectories), consistent with Hive. Note that this setting does not affect reading partitioned tables (e.g. <code class="docutils literal notranslate"><span class="pre">/table/year=2021/month=01/data.parquet</span></code>).</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.execution.enable_recursive_ctes</p></td> +<tr class="row-odd"><td><p>datafusion.execution.listing_table_factory_infer_partitions</p></td> +<td><p>true</p></td> +<td><p>Should a <code class="docutils literal notranslate"><span class="pre">ListingTable</span></code> created through the <code class="docutils literal notranslate"><span class="pre">ListingTableFactory</span></code> infer table partitions from Hive compliant directories. 
Defaults to true (partition columns are inferred and will be represented in the table schema).</p></td> +</tr> +<tr class="row-even"><td><p>datafusion.execution.enable_recursive_ctes</p></td> <td><p>true</p></td> <td><p>Should DataFusion support recursive CTEs</p></td> </tr> -<tr class="row-even"><td><p>datafusion.execution.split_file_groups_by_statistics</p></td> +<tr class="row-odd"><td><p>datafusion.execution.split_file_groups_by_statistics</p></td> <td><p>false</p></td> <td><p>Attempt to eliminate sorts by packing & sorting files with non-overlapping statistics into the same file groups. Currently experimental</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.execution.keep_partition_by_columns</p></td> +<tr class="row-even"><td><p>datafusion.execution.keep_partition_by_columns</p></td> <td><p>false</p></td> <td><p>Should DataFusion keep the columns used for partition_by in the output RecordBatches</p></td> </tr> -<tr class="row-even"><td><p>datafusion.execution.skip_partial_aggregation_probe_ratio_threshold</p></td> +<tr class="row-odd"><td><p>datafusion.execution.skip_partial_aggregation_probe_ratio_threshold</p></td> <td><p>0.8</p></td> <td><p>Aggregation ratio (number of distinct groups / number of input rows) threshold for skipping partial aggregation. If the value is greater then partial aggregation will skip aggregation for further input</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.execution.skip_partial_aggregation_probe_rows_threshold</p></td> +<tr class="row-even"><td><p>datafusion.execution.skip_partial_aggregation_probe_rows_threshold</p></td> <td><p>100000</p></td> <td><p>Number of input rows partial aggregation partition should process, before aggregation ratio check and trying to switch to skipping aggregation mode</p></td> </tr> -<tr class="row-even"><td><p>datafusion.execution.use_row_number_estimates_to_optimize_partitioning</p></td> +<tr class="row-odd"><td><p>datafusion.execution.use_row_number_estimates_to_optimize_partitioning</p></td> <td><p>false</p></td> <td><p>Should DataFusion use row number estimates at the input to decide whether increasing parallelism is beneficial or not. By default, only exact row numbers (not estimates) are used for this decision. Setting this flag to <code class="docutils literal notranslate"><span class="pre">true</span></code> will likely produce better plans. if the source of statistics is accurate. We plan to make this the default in the future.</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.execution.enforce_batch_size_in_joins</p></td> +<tr class="row-even"><td><p>datafusion.execution.enforce_batch_size_in_joins</p></td> <td><p>false</p></td> <td><p>Should DataFusion enforce batch size in joins or not. By default, DataFusion will not enforce batch size in joins. Enforcing batch size in joins can reduce memory usage when joining large tables with a highly-selective join filter, but is also slightly slower.</p></td> </tr> -<tr class="row-even"><td><p>datafusion.execution.objectstore_writer_buffer_size</p></td> +<tr class="row-odd"><td><p>datafusion.execution.objectstore_writer_buffer_size</p></td> <td><p>10485760</p></td> <td><p>Size (bytes) of data buffer DataFusion uses when writing output files. This affects the size of the data chunks that are uploaded to remote object stores (e.g. AWS S3). 
If very large (>= 100 GiB) output files are being written, it may be necessary to increase this size to avoid errors from the remote end point.</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td> <td><p>true</p></td> <td><p>When set to true, the optimizer will push a limit operation into grouped aggregations which have no aggregate expressions, as a soft limit, emitting groups once the limit is reached, before all rows in the group are read.</p></td> </tr> -<tr class="row-even"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td> <td><p>true</p></td> <td><p>When set to true, the physical plan optimizer will try to add round robin repartitioning to increase parallelism to leverage more CPU cores</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td> <td><p>true</p></td> <td><p>When set to true, the optimizer will attempt to perform limit operations during aggregations, if possible</p></td> </tr> -<tr class="row-even"><td><p>datafusion.optimizer.enable_window_limits</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.enable_window_limits</p></td> <td><p>true</p></td> <td><p>When set to true, the optimizer will attempt to push limit operations past window functions, if possible</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.enable_dynamic_filter_pushdown</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.enable_dynamic_filter_pushdown</p></td> <td><p>true</p></td> <td><p>When set to true attempts to push down dynamic filters generated by operators into the file scan phase. For example, for a query such as <code class="docutils literal notranslate"><span class="pre">SELECT</span> <span class="pre">*</span> <span class="pre">FROM</span> <span class="pre">t</span> <span class="pre">ORDER</span> <span class="pre">BY</span> <span class="pre">timestamp</span> <span class="pre">DESC</span> <span class="pre">LIMIT</span> <span class="pre">10</span></code> [...] </tr> -<tr class="row-even"><td><p>datafusion.optimizer.filter_null_join_keys</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.filter_null_join_keys</p></td> <td><p>false</p></td> <td><p>When set to true, the optimizer will insert filters before a join between a nullable and non-nullable column to filter out nulls on the nullable side. 
This filter can add additional overhead when the file format does not fully support predicate push down.</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.repartition_aggregations</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.repartition_aggregations</p></td> <td><p>true</p></td> <td><p>Should DataFusion repartition data using the aggregate keys to execute aggregates in parallel using the provided <code class="docutils literal notranslate"><span class="pre">target_partitions</span></code> level</p></td> </tr> -<tr class="row-even"><td><p>datafusion.optimizer.repartition_file_min_size</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.repartition_file_min_size</p></td> <td><p>10485760</p></td> <td><p>Minimum total files size in bytes to perform file scan repartitioning.</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.repartition_joins</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.repartition_joins</p></td> <td><p>true</p></td> <td><p>Should DataFusion repartition data using the join keys to execute joins in parallel using the provided <code class="docutils literal notranslate"><span class="pre">target_partitions</span></code> level</p></td> </tr> -<tr class="row-even"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td> <td><p>true</p></td> <td><p>Should DataFusion allow symmetric hash joins for unbounded data sources even when its inputs do not have any ordering or filtering If the flag is not enabled, the SymmetricHashJoin operator will be unable to prune its internal buffers, resulting in certain join types - such as Full, Left, LeftAnti, LeftSemi, Right, RightAnti, and RightSemi - being produced only at the end of the execution. This is not typical in stream processing. Additionally, without proper design for long runne [...] </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.repartition_file_scans</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.repartition_file_scans</p></td> <td><p>true</p></td> <td><p>When set to <code class="docutils literal notranslate"><span class="pre">true</span></code>, datasource partitions will be repartitioned to achieve maximum parallelism. This applies to both in-memory partitions and FileSource’s file groups (1 group is 1 partition). For FileSources, only Parquet and CSV formats are currently supported. If set to <code class="docutils literal notranslate"><span class="pre">true</span></code> for a FileSource, all files will be repartitioned evenly ( [...] </tr> -<tr class="row-even"><td><p>datafusion.optimizer.repartition_windows</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.repartition_windows</p></td> <td><p>true</p></td> <td><p>Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided <code class="docutils literal notranslate"><span class="pre">target_partitions</span></code> level</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.repartition_sorts</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.repartition_sorts</p></td> <td><p>true</p></td> <td><p>Should DataFusion execute sorts in a per-partition fashion and merge afterwards instead of coalescing first and sorting globally. 
With this flag is enabled, plans in the form below <code class="docutils literal notranslate"><span class="pre">text</span> <span class="pre">"SortExec:</span> <span class="pre">[a@0</span> <span class="pre">ASC]",</span> <span class="pre">"</span> <span class="pre">CoalescePartitionsExec",</span> <span class="pre">"</span> [...] </tr> -<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_sort</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_sort</p></td> <td><p>false</p></td> <td><p>When true, DataFusion will opportunistically remove sorts when the data is already sorted, (i.e. setting <code class="docutils literal notranslate"><span class="pre">preserve_order</span></code> to true on <code class="docutils literal notranslate"><span class="pre">RepartitionExec</span></code> and using <code class="docutils literal notranslate"><span class="pre">SortPreservingMergeExec</span></code>) When false, DataFusion will maximize plan parallelism using <code class="docut [...] </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.skip_failed_rules</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.skip_failed_rules</p></td> <td><p>false</p></td> <td><p>When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail</p></td> </tr> -<tr class="row-even"><td><p>datafusion.optimizer.max_passes</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.max_passes</p></td> <td><p>3</p></td> <td><p>Number of times that the optimizer will attempt to optimize the plan</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td> <td><p>true</p></td> <td><p>When set to true, the physical plan optimizer will run a top down process to reorder the join keys</p></td> </tr> -<tr class="row-even"><td><p>datafusion.optimizer.prefer_hash_join</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.prefer_hash_join</p></td> <td><p>true</p></td> <td><p>When set to true, the physical plan optimizer will prefer HashJoin over SortMergeJoin. HashJoin can work more efficiently than SortMergeJoin but consumes more memory</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td> <td><p>1048576</p></td> <td><p>The maximum estimated size in bytes for one input side of a HashJoin will be collected into a single partition</p></td> </tr> -<tr class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td> <td><p>131072</p></td> <td><p>The maximum estimated size in rows for one input side of a HashJoin will be collected into a single partition</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.default_filter_selectivity</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.default_filter_selectivity</p></td> <td><p>20</p></td> <td><p>The default filter selectivity used by Filter Statistics when an exact selectivity cannot be determined. 
Valid values are between 0 (no selectivity) and 100 (all rows are selected).</p></td> </tr> -<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_union</p></td> +<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_union</p></td> <td><p>false</p></td> <td><p>When set to true, the optimizer will not attempt to convert Union to Interleave</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.optimizer.expand_views_at_output</p></td> +<tr class="row-even"><td><p>datafusion.optimizer.expand_views_at_output</p></td> <td><p>false</p></td> <td><p>When set to true, if the returned type is a view type then the output will be coerced to a non-view. Coerces <code class="docutils literal notranslate"><span class="pre">Utf8View</span></code> to <code class="docutils literal notranslate"><span class="pre">LargeUtf8</span></code>, and <code class="docutils literal notranslate"><span class="pre">BinaryView</span></code> to <code class="docutils literal notranslate"><span class="pre">LargeBinary</span></code>.</p></td> </tr> -<tr class="row-even"><td><p>datafusion.explain.logical_plan_only</p></td> +<tr class="row-odd"><td><p>datafusion.explain.logical_plan_only</p></td> <td><p>false</p></td> <td><p>When set to true, the explain statement will only print logical plans</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.explain.physical_plan_only</p></td> +<tr class="row-even"><td><p>datafusion.explain.physical_plan_only</p></td> <td><p>false</p></td> <td><p>When set to true, the explain statement will only print physical plans</p></td> </tr> -<tr class="row-even"><td><p>datafusion.explain.show_statistics</p></td> +<tr class="row-odd"><td><p>datafusion.explain.show_statistics</p></td> <td><p>false</p></td> <td><p>When set to true, the explain statement will print operator statistics for physical plans</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.explain.show_sizes</p></td> +<tr class="row-even"><td><p>datafusion.explain.show_sizes</p></td> <td><p>true</p></td> <td><p>When set to true, the explain statement will print the partition sizes</p></td> </tr> -<tr class="row-even"><td><p>datafusion.explain.show_schema</p></td> +<tr class="row-odd"><td><p>datafusion.explain.show_schema</p></td> <td><p>false</p></td> <td><p>When set to true, the explain statement will print schema information</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.explain.format</p></td> +<tr class="row-even"><td><p>datafusion.explain.format</p></td> <td><p>indent</p></td> <td><p>Display format of explain. Default is “indent”. When set to “tree”, it will print the plan in a tree-rendered format.</p></td> </tr> -<tr class="row-even"><td><p>datafusion.explain.tree_maximum_render_width</p></td> +<tr class="row-odd"><td><p>datafusion.explain.tree_maximum_render_width</p></td> <td><p>240</p></td> <td><p>(format=tree only) Maximum total width of the rendered tree. 
When set to 0, the tree will have no width limit.</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td> +<tr class="row-even"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td> <td><p>false</p></td> <td><p>When set to true, SQL parser will parse float as decimal type</p></td> </tr> -<tr class="row-even"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td> +<tr class="row-odd"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td> <td><p>true</p></td> <td><p>When set to true, SQL parser will normalize ident (convert ident to lowercase when not quoted)</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.sql_parser.enable_options_value_normalization</p></td> +<tr class="row-even"><td><p>datafusion.sql_parser.enable_options_value_normalization</p></td> <td><p>false</p></td> <td><p>When set to true, SQL parser will normalize options value (convert value to lowercase). Note that this option is ignored and will be removed in the future. All case-insensitive values are normalized automatically.</p></td> </tr> -<tr class="row-even"><td><p>datafusion.sql_parser.dialect</p></td> +<tr class="row-odd"><td><p>datafusion.sql_parser.dialect</p></td> <td><p>generic</p></td> <td><p>Configure the SQL dialect used by DataFusion’s parser; supported values include: Generic, MySQL, PostgreSQL, Hive, SQLite, Snowflake, Redshift, MsSQL, ClickHouse, BigQuery, Ansi, DuckDB and Databricks.</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.sql_parser.support_varchar_with_length</p></td> +<tr class="row-even"><td><p>datafusion.sql_parser.support_varchar_with_length</p></td> <td><p>true</p></td> <td><p>If true, permit lengths for <code class="docutils literal notranslate"><span class="pre">VARCHAR</span></code> such as <code class="docutils literal notranslate"><span class="pre">VARCHAR(20)</span></code>, but ignore the length. If false, error if a <code class="docutils literal notranslate"><span class="pre">VARCHAR</span></code> with a length is specified. The Arrow type system does not have a notion of maximum string length and thus DataFusion can not enforce such limits.</p></td> </tr> -<tr class="row-even"><td><p>datafusion.sql_parser.map_string_types_to_utf8view</p></td> +<tr class="row-odd"><td><p>datafusion.sql_parser.map_string_types_to_utf8view</p></td> <td><p>true</p></td> <td><p>If true, string types (VARCHAR, CHAR, Text, and String) are mapped to <code class="docutils literal notranslate"><span class="pre">Utf8View</span></code> during SQL planning. If false, they are mapped to <code class="docutils literal notranslate"><span class="pre">Utf8</span></code>. Default is true.</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.sql_parser.collect_spans</p></td> +<tr class="row-even"><td><p>datafusion.sql_parser.collect_spans</p></td> <td><p>false</p></td> <td><p>When set to true, the source locations relative to the original SQL query (i.e. 
<a class="reference external" href="https://docs.rs/sqlparser/latest/sqlparser/tokenizer/struct.Span.html"><code class="docutils literal notranslate"><span class="pre">Span</span></code></a>) will be collected and recorded in the logical plan nodes.</p></td> </tr> -<tr class="row-even"><td><p>datafusion.sql_parser.recursion_limit</p></td> +<tr class="row-odd"><td><p>datafusion.sql_parser.recursion_limit</p></td> <td><p>50</p></td> <td><p>Specifies the recursion depth limit when parsing complex SQL Queries</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.sql_parser.default_null_ordering</p></td> +<tr class="row-even"><td><p>datafusion.sql_parser.default_null_ordering</p></td> <td><p>nulls_max</p></td> <td><p>Specifies the default null ordering for query results. There are 4 options: - <code class="docutils literal notranslate"><span class="pre">nulls_max</span></code>: Nulls appear last in ascending order. - <code class="docutils literal notranslate"><span class="pre">nulls_min</span></code>: Nulls appear first in ascending order. - <code class="docutils literal notranslate"><span class="pre">nulls_first</span></code>: Nulls always be first in any order. - <code class="docutils litera [...] </tr> -<tr class="row-even"><td><p>datafusion.format.safe</p></td> +<tr class="row-odd"><td><p>datafusion.format.safe</p></td> <td><p>true</p></td> <td><p>If set to <code class="docutils literal notranslate"><span class="pre">true</span></code> any formatting errors will be written to the output instead of being converted into a [<code class="docutils literal notranslate"><span class="pre">std::fmt::Error</span></code>]</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.format.null</p></td> +<tr class="row-even"><td><p>datafusion.format.null</p></td> <td><p></p></td> <td><p>Format string for nulls</p></td> </tr> -<tr class="row-even"><td><p>datafusion.format.date_format</p></td> +<tr class="row-odd"><td><p>datafusion.format.date_format</p></td> <td><p>%Y-%m-%d</p></td> <td><p>Date format for date arrays</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.format.datetime_format</p></td> +<tr class="row-even"><td><p>datafusion.format.datetime_format</p></td> <td><p>%Y-%m-%dT%H:%M:%S%.f</p></td> <td><p>Format for DateTime arrays</p></td> </tr> -<tr class="row-even"><td><p>datafusion.format.timestamp_format</p></td> +<tr class="row-odd"><td><p>datafusion.format.timestamp_format</p></td> <td><p>%Y-%m-%dT%H:%M:%S%.f</p></td> <td><p>Timestamp format for timestamp arrays</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.format.timestamp_tz_format</p></td> +<tr class="row-even"><td><p>datafusion.format.timestamp_tz_format</p></td> <td><p>NULL</p></td> <td><p>Timestamp format for timestamp with timezone arrays. When <code class="docutils literal notranslate"><span class="pre">None</span></code>, ISO 8601 format is used.</p></td> </tr> -<tr class="row-even"><td><p>datafusion.format.time_format</p></td> +<tr class="row-odd"><td><p>datafusion.format.time_format</p></td> <td><p>%H:%M:%S%.f</p></td> <td><p>Time format for time arrays</p></td> </tr> -<tr class="row-odd"><td><p>datafusion.format.duration_format</p></td> +<tr class="row-even"><td><p>datafusion.format.duration_format</p></td> <td><p>pretty</p></td> <td><p>Duration format. 
Can be either <code class="docutils literal notranslate"><span class="pre">"pretty"</span></code> or <code class="docutils literal notranslate"><span class="pre">"ISO8601"</span></code></p></td> </tr> -<tr class="row-even"><td><p>datafusion.format.types_info</p></td> +<tr class="row-odd"><td><p>datafusion.format.types_info</p></td> <td><p>false</p></td> <td><p>Show types in visual representation batches</p></td> </tr> diff --git a/user-guide/sql/ddl.html b/user-guide/sql/ddl.html index 0f6a32df3a..98c3880001 100644 --- a/user-guide/sql/ddl.html +++ b/user-guide/sql/ddl.html @@ -840,6 +840,31 @@ table (multiple files with the same schema)</p> <span class="k">OPTIONS</span><span class="w"> </span><span class="p">(</span><span class="s1">'has_header'</span><span class="w"> </span><span class="s1">'true'</span><span class="p">);</span> </pre></div> </div> +<p>Tables that are partitioned using a Hive compliant partitioning scheme will have their columns and values automatically +detected and incorporated into the table’s schema and data. Given the following example directory structure:</p> +<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">hive_partitioned/</span> +<span class="go">├── a=1</span> +<span class="go">│ └── b=200</span> +<span class="go">│ └── file1.parquet</span> +<span class="go">└── a=2</span> +<span class="go"> └── b=100</span> +<span class="go"> └── file2.parquet</span> +</pre></div> +</div> +<p>Users can specify the top level <code class="docutils literal notranslate"><span class="pre">hive_partitioned</span></code> directory as an <code class="docutils literal notranslate"><span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> and leverage the Hive partitions to query +and filter data.</p> +<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">hive_partitioned</span> +<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span> +<span class="k">LOCATION</span><span class="w"> </span><span class="s1">'/path/to/hive_partitioned/'</span><span class="p">;</span> + +<span class="k">SELECT</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">hive_partitioned</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">b</span><span class="o">=</span><span class="mi">100</span><span class="p">;</span> +<span class="o">+</span><span class="c1">------------------+</span> +<span class="o">|</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="o">|</span> +<span class="o">+</span><span class="c1">------------------+</span> +<span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span> +<span class="o">+</span><span class="c1">------------------+</span> +</pre></div> +</div> </section> <section id="example-unbounded-data-sources"> <h3>Example: Unbounded Data Sources<a class="headerlink" href="#example-unbounded-data-sources" title="Link to this heading">¶</a></h3> 