This is an automated email from the ASF dual-hosted git repository.
github-actions[bot] pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 4b11a7e6fa Publish built docs triggered by 84bc8761ac3a126e41658b6cd0ec6bd8cc34cda8
4b11a7e6fa is described below
commit 4b11a7e6fac441b86a492d30c269534bf0d62337
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Fri Jun 5 10:03:52 2026 +0000
Publish built docs triggered by 84bc8761ac3a126e41658b6cd0ec6bd8cc34cda8
---
_sources/user-guide/configs.md.txt | 3 +-
_sources/user-guide/sql/format_options.md.txt | 1 +
searchindex.js | 2 +-
user-guide/configs.html | 218 +++++++++++++-------------
user-guide/sql/format_options.html | 50 +++---
5 files changed, 143 insertions(+), 131 deletions(-)
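
The diff below documents a new `max_row_group_bytes` option that flushes a row group when either it or `max_row_group_size` is reached, whichever comes first. The following is a minimal sketch of that rule, not DataFusion's writer: `plan_row_groups` and its `(rows, bytes)` batch tuples are hypothetical names introduced here for illustration, and it assumes flushes happen only on batch boundaries with known byte sizes, which the real writer need not do.

```python
from typing import Iterable, List, Optional, Tuple


def plan_row_groups(
    batches: Iterable[Tuple[int, int]],
    max_row_group_size: int = 1_048_576,
    max_row_group_bytes: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Group (rows, bytes) batches into row groups.

    Hypothetical helper: flushes when the buffered row count or the
    buffered byte size reaches its limit, whichever happens first.
    """
    groups: List[Tuple[int, int]] = []
    rows = size = 0
    for batch_rows, batch_bytes in batches:
        rows += batch_rows
        size += batch_bytes
        hit_rows = rows >= max_row_group_size
        # None mirrors the documented default: only the row limit applies.
        hit_bytes = max_row_group_bytes is not None and size >= max_row_group_bytes
        if hit_rows or hit_bytes:
            groups.append((rows, size))
            rows = size = 0
    if rows:
        # Flush whatever remains as a final, smaller row group.
        groups.append((rows, size))
    return groups


batches = [(1000, 400)] * 5
row_limit_only = plan_row_groups(batches, max_row_group_size=3000)
both_limits = plan_row_groups(batches, max_row_group_size=3000, max_row_group_bytes=800)
```

With only the row limit, the five batches form two row groups; adding the byte limit makes it trigger first and yields three smaller groups. Per the documentation in this commit, the byte limit is currently only honored when `allow_single_file_parallelism` is `false`.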
diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 88fbeb3de0..442b72ea9b 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -101,7 +101,8 @@ The following configuration settings are available:
| datafusion.execution.parquet.dictionary_enabled | true | (writing) Sets if dictionary encoding is enabled. If NULL, uses default parquet writer setting [...]
| datafusion.execution.parquet.dictionary_page_size_limit | 1048576 | (writing) Sets best effort maximum dictionary page size, in bytes [...]
| datafusion.execution.parquet.statistics_enabled | page | (writing) Sets if statistics are enabled for any column Valid values are: "none", "chunk", and "page" These values are not case sensitive. If NULL, uses default parquet writer setting [...]
-| datafusion.execution.parquet.max_row_group_size | 1048576 | (writing) Target maximum number of rows in each row group (defaults to 1M rows). Writing larger row groups requires more memory to write, but can get better compression and be faster to read. [...]
+| datafusion.execution.parquet.max_row_group_size | 1048576 | (writing) Target maximum number of rows in each row group (defaults to 1M rows). Writing larger row groups requires more memory to write, but can get better compression and be faster to read. When `max_row_group_bytes` is also set, the writer flushes a row group when either limit is reached, whichever comes first. [...]
+| datafusion.execution.parquet.max_row_group_bytes | NULL | (writing) Target maximum size of each row group in bytes. When set, the writer flushes whenever either this limit or `max_row_group_size` is reached, whichever comes first. Useful for bounding writer memory on wide schemas where a row-count limit can map to very different byte sizes. Matches the behavior of `parquet.block.size` in parquet-mr. If `None` (the default), only the row-count [...]
| datafusion.execution.parquet.created_by | datafusion version 53.1.0 | (writing) Sets "created by" property [...]
| datafusion.execution.parquet.column_index_truncate_length | 64 | (writing) Sets column index truncate length [...]
| datafusion.execution.parquet.statistics_truncate_length | 64 | (writing) Sets statistics truncate length. If NULL, uses default parquet writer setting [...]
diff --git a/_sources/user-guide/sql/format_options.md.txt b/_sources/user-guide/sql/format_options.md.txt
index 46d251c18e..ca79858dae 100644
--- a/_sources/user-guide/sql/format_options.md.txt
+++ b/_sources/user-guide/sql/format_options.md.txt
@@ -142,6 +142,7 @@ The following options are available when reading or writing Parquet files. If an
| BLOOM_FILTER_FPP | Yes | Sets bloom filter false positive probability (global or per column). | `'bloom_filter_fpp'` or `'bloom_filter_fpp::col'` | None |
| BLOOM_FILTER_NDV | Yes | Sets bloom filter number of distinct values (global or per column). | `'bloom_filter_ndv'` or `'bloom_filter_ndv::col'` | None |
| MAX_ROW_GROUP_SIZE | No | Sets the maximum number of rows per row group. Larger groups require more memory but can improve compression and scan efficiency. | `'max_row_group_size'` | 1048576 |
+| MAX_ROW_GROUP_BYTES | No | Sets the maximum size of each row group in bytes. When both this and `MAX_ROW_GROUP_SIZE` are set, the row group flushes whenever either limit is reached. Mirrors `parquet.block.size` from parquet-mr. Currently only honored when `allow_single_file_parallelism` is `false`; by default the parallel file writer ignores it. | `'max_row_group_bytes'` | None |
| ENABLE_PAGE_INDEX | No | If true, reads the Parquet data page level metadata (the Page Index), if present, to reduce I/O and decoding. | `'enable_page_index'` | true |
| PRUNING | No | If true, enables row group pruning based on min/max statistics. | `'pruning'` | true |
| SKIP_METADATA | No | If true, skips optional embedded metadata in the file schema. | `'skip_metadata'` | true |
diff --git a/searchindex.js b/searchindex.js
index 0ecd29960a..40581e1cbb 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles":{"!=":[[74,"op-neq"]],"!~":[[74,"op-re-not-match"]],"!~*":[[74,"op-re-not-match-i"]],"!~~":[[74,"id19"]],"!~~*":[[74,"id20"]],"#":[[74,"op-bit-xor"]],"%":[[74,"op-modulo"]],"&":[[74,"op-bit-and"]],"(relation,
name) tuples in logical fields and logical columns are
unique":[[15,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[74,"op-multiply"]],"+":[[74,"op-plus"]],"-":[[74,"op-minus"]],"/":[[74,"op-divide"]],"1.
Array Literal Con [...]
\ No newline at end of file
+Search.setIndex({"alltitles":{"!=":[[74,"op-neq"]],"!~":[[74,"op-re-not-match"]],"!~*":[[74,"op-re-not-match-i"]],"!~~":[[74,"id19"]],"!~~*":[[74,"id20"]],"#":[[74,"op-bit-xor"]],"%":[[74,"op-modulo"]],"&":[[74,"op-bit-and"]],"(relation,
name) tuples in logical fields and logical columns are
unique":[[15,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[74,"op-multiply"]],"+":[[74,"op-plus"]],"-":[[74,"op-minus"]],"/":[[74,"op-divide"]],"1.
Array Literal Con [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 12549436d3..9220139b48 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -631,429 +631,433 @@ example, to configure <code class="docutils literal notranslate"><span class="pr
</tr>
<tr
class="row-even"><td><p>datafusion.execution.parquet.max_row_group_size</p></td>
<td><p>1048576</p></td>
-<td><p>(writing) Target maximum number of rows in each row group (defaults to
1M rows). Writing larger row groups requires more memory to write, but can get
better compression and be faster to read.</p></td>
+<td><p>(writing) Target maximum number of rows in each row group (defaults to
1M rows). Writing larger row groups requires more memory to write, but can get
better compression and be faster to read. When <code class="docutils literal
notranslate"><span class="pre">max_row_group_bytes</span></code> is also set,
the writer flushes a row group when either limit is reached, whichever comes
first.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.execution.parquet.created_by</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.max_row_group_bytes</p></td>
+<td><p>NULL</p></td>
+<td><p>(writing) Target maximum size of each row group in bytes. When set, the
writer flushes whenever either this limit or <code class="docutils literal
notranslate"><span class="pre">max_row_group_size</span></code> is reached,
whichever comes first. Useful for bounding writer memory on wide schemas where
a row-count limit can map to very different byte sizes. Matches the behavior of
<code class="docutils literal notranslate"><span
class="pre">parquet.block.size</span></code> in parque [...]
+</tr>
+<tr class="row-even"><td><p>datafusion.execution.parquet.created_by</p></td>
<td><p>datafusion version 53.1.0</p></td>
<td><p>(writing) Sets “created by” property</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.column_index_truncate_length</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.column_index_truncate_length</p></td>
<td><p>64</p></td>
<td><p>(writing) Sets column index truncate length</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.parquet.statistics_truncate_length</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.parquet.statistics_truncate_length</p></td>
<td><p>64</p></td>
<td><p>(writing) Sets statistics truncate length. If NULL, uses default
parquet writer setting</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.data_page_row_count_limit</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.data_page_row_count_limit</p></td>
<td><p>20000</p></td>
<td><p>(writing) Sets best effort maximum number of rows in data page</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.execution.parquet.encoding</p></td>
+<tr class="row-even"><td><p>datafusion.execution.parquet.encoding</p></td>
<td><p>NULL</p></td>
<td><p>(writing) Sets default encoding for any column. Valid values are:
plain, plain_dictionary, rle, bit_packed, delta_binary_packed,
delta_length_byte_array, delta_byte_array, rle_dictionary, and
byte_stream_split. These values are not case sensitive. If NULL, uses default
parquet writer setting</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.bloom_filter_on_write</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.bloom_filter_on_write</p></td>
<td><p>false</p></td>
<td><p>(writing) Write bloom filters for all columns when creating parquet
files</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.parquet.bloom_filter_fpp</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.parquet.bloom_filter_fpp</p></td>
<td><p>NULL</p></td>
<td><p>(writing) Sets bloom filter false positive probability. If NULL, uses
default parquet writer setting</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.bloom_filter_ndv</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.bloom_filter_ndv</p></td>
<td><p>NULL</p></td>
<td><p>(writing) Sets bloom filter number of distinct values. If NULL, uses
default parquet writer setting</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.parquet.allow_single_file_parallelism</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.parquet.allow_single_file_parallelism</p></td>
<td><p>true</p></td>
<td><p>(writing) Controls whether DataFusion will attempt to speed up writing
parquet files by serializing them in parallel. Each column in each row group in
each output file are serialized in parallel leveraging a maximum possible core
count of n_files<em>n_row_groups</em>n_columns.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.maximum_parallel_row_group_writers</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.maximum_parallel_row_group_writers</p></td>
<td><p>1</p></td>
<td><p>(writing) By default parallel parquet writer is tuned for minimum
memory usage in a streaming execution plan. You may see a performance benefit
when writing large parquet files by increasing
maximum_parallel_row_group_writers and
maximum_buffered_record_batches_per_stream if your system has idle cores and
can tolerate additional memory usage. Boosting these values is likely
worthwhile when writing out already in-memory data, such as from a cached data
frame.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.parquet.maximum_buffered_record_batches_per_stream</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.parquet.maximum_buffered_record_batches_per_stream</p></td>
<td><p>2</p></td>
<td><p>(writing) By default parallel parquet writer is tuned for minimum
memory usage in a streaming execution plan. You may see a performance benefit
when writing large parquet files by increasing
maximum_parallel_row_group_writers and
maximum_buffered_record_batches_per_stream if your system has idle cores and
can tolerate additional memory usage. Boosting these values is likely
worthwhile when writing out already in-memory data, such as from a cached data
frame.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.content_defined_chunking.enabled</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.content_defined_chunking.enabled</p></td>
<td><p>false</p></td>
<td><p>(writing) EXPERIMENTAL: Enable content-defined chunking (CDC) when
writing parquet files. When enabled, parallel writing is automatically disabled
since the chunker state must persist across row groups.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.parquet.content_defined_chunking.min_chunk_size</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.parquet.content_defined_chunking.min_chunk_size</p></td>
<td><p>262144</p></td>
<td><p>Minimum chunk size in bytes. The rolling hash will not trigger a split
until this many bytes have been accumulated. Default is 256 KiB.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.content_defined_chunking.max_chunk_size</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.content_defined_chunking.max_chunk_size</p></td>
<td><p>1048576</p></td>
<td><p>Maximum chunk size in bytes. A split is forced when the accumulated
size exceeds this value. Default is 1 MiB.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.parquet.content_defined_chunking.norm_level</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.parquet.content_defined_chunking.norm_level</p></td>
<td><p>0</p></td>
<td><p>Normalization level. Increasing this improves deduplication ratio but
increases fragmentation. Recommended range is [-3, 3], default is 0.</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.execution.planning_concurrency</p></td>
+<tr class="row-odd"><td><p>datafusion.execution.planning_concurrency</p></td>
<td><p>0</p></td>
<td><p>Fan-out during initial physical planning. This is mostly use to plan
<code class="docutils literal notranslate"><span
class="pre">UNION</span></code> children in parallel. Defaults to the number of
CPU cores on the system</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.skip_physical_aggregate_schema_check</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.skip_physical_aggregate_schema_check</p></td>
<td><p>false</p></td>
<td><p>When set to true, skips verifying that the schema produced by planning
the input of <code class="docutils literal notranslate"><span
class="pre">LogicalPlan::Aggregate</span></code> exactly matches the schema of
the input plan. When set to false, if the schema does not match exactly
(including nullability and metadata), a planning error will be raised. This is
used to workaround bugs in the planner that are now caught by the new schema
verification step.</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.execution.spill_compression</p></td>
+<tr class="row-odd"><td><p>datafusion.execution.spill_compression</p></td>
<td><p>uncompressed</p></td>
<td><p>Sets the compression codec used when spilling data to disk. Since
datafusion writes spill files using the Arrow IPC Stream format, only codecs
supported by the Arrow IPC Stream Writer are allowed. Valid values are:
uncompressed, lz4_frame, zstd. Note: lz4_frame offers faster (de)compression,
but typically results in larger spill files. In contrast, zstd achieves higher
compression ratios at the cost of slower (de)compression speed.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.sort_spill_reservation_bytes</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.sort_spill_reservation_bytes</p></td>
<td><p>10485760</p></td>
<td><p>Specifies the reserved memory for each spillable sort operation to
facilitate an in-memory merge. When a sort operation spills to disk, the
in-memory data must be sorted and merged before being written to a file. This
setting reserves a specific amount of memory for that in-memory sort/merge
process. Note: This setting is irrelevant if the sort operation cannot spill
(i.e., if there’s no <code class="docutils literal notranslate"><span
class="pre">DiskManager</span></code> configu [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.sort_in_place_threshold_bytes</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.sort_in_place_threshold_bytes</p></td>
<td><p>1048576</p></td>
<td><p>When sorting, below what size should data be concatenated and sorted in
a single RecordBatch rather than sorted in batches and merged.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.sort_pushdown_buffer_capacity</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.sort_pushdown_buffer_capacity</p></td>
<td><p>1073741824</p></td>
<td><p>Maximum buffer capacity (in bytes) per partition for BufferExec
inserted during sort pushdown optimization. When PushdownSort eliminates a
SortExec under SortPreservingMergeExec, a BufferExec is inserted to replace
SortExec’s buffering role. This prevents I/O stalls by allowing the scan to run
ahead of the merge. This uses strictly less memory than the SortExec it
replaces (which buffers the entire partition). The buffer respects the global
memory pool limit. Setting this to a lar [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.max_spill_file_size_bytes</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.max_spill_file_size_bytes</p></td>
<td><p>134217728</p></td>
<td><p>Maximum size in bytes for individual spill files before rotating to a
new file. When operators spill data to disk (e.g., RepartitionExec), they write
multiple batches to the same file until this size limit is reached, then rotate
to a new file. This reduces syscall overhead compared to one-file-per-batch
while preventing files from growing too large. A larger value reduces file
creation overhead but may hold more disk space. A smaller value creates more
files but allows finer-grai [...]
</tr>
-<tr class="row-odd"><td><p>datafusion.execution.meta_fetch_concurrency</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.meta_fetch_concurrency</p></td>
<td><p>32</p></td>
<td><p>Number of files to read in parallel when inferring schema and
statistics</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.minimum_parallel_output_files</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.minimum_parallel_output_files</p></td>
<td><p>4</p></td>
<td><p>Guarantees a minimum level of output files running in parallel.
RecordBatches will be distributed in round robin fashion to each parallel
writer. Each writer is closed and a new file opened once
soft_max_rows_per_output_file is reached.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.soft_max_rows_per_output_file</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.soft_max_rows_per_output_file</p></td>
<td><p>50000000</p></td>
<td><p>Target number of rows in output files when writing multiple. This is a
soft max, so it can be exceeded slightly. There also will be one file smaller
than the limit if the total number of rows written is not roughly divisible by
the soft max</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.max_buffered_batches_per_output_file</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.max_buffered_batches_per_output_file</p></td>
<td><p>2</p></td>
<td><p>This is the maximum number of RecordBatches buffered for each output
file being worked. Higher values can potentially give faster write performance
at the cost of higher peak memory consumption</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.listing_table_ignore_subdirectory</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.listing_table_ignore_subdirectory</p></td>
<td><p>true</p></td>
<td><p>Should sub directories be ignored when scanning directories for data
files. Defaults to true (ignores subdirectories), consistent with Hive. Note
that this setting does not affect reading partitioned tables (e.g. <code
class="docutils literal notranslate"><span
class="pre">/table/year=2021/month=01/data.parquet</span></code>).</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.listing_table_factory_infer_partitions</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.listing_table_factory_infer_partitions</p></td>
<td><p>true</p></td>
<td><p>Should a <code class="docutils literal notranslate"><span
class="pre">ListingTable</span></code> created through the <code
class="docutils literal notranslate"><span
class="pre">ListingTableFactory</span></code> infer table partitions from Hive
compliant directories. Defaults to true (partition columns are inferred and
will be represented in the table schema).</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.execution.enable_recursive_ctes</p></td>
+<tr class="row-even"><td><p>datafusion.execution.enable_recursive_ctes</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion support recursive CTEs</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.split_file_groups_by_statistics</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.split_file_groups_by_statistics</p></td>
<td><p>false</p></td>
<td><p>Attempt to eliminate sorts by packing & sorting files with
non-overlapping statistics into the same file groups. Currently
experimental</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.keep_partition_by_columns</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.keep_partition_by_columns</p></td>
<td><p>false</p></td>
<td><p>Should DataFusion keep the columns used for partition_by in the output
RecordBatches</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.skip_partial_aggregation_probe_ratio_threshold</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.skip_partial_aggregation_probe_ratio_threshold</p></td>
<td><p>0.8</p></td>
<td><p>Aggregation ratio (number of distinct groups / number of input rows)
threshold for skipping partial aggregation. If the value is greater then
partial aggregation will skip aggregation for further input</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.skip_partial_aggregation_probe_rows_threshold</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.skip_partial_aggregation_probe_rows_threshold</p></td>
<td><p>100000</p></td>
<td><p>Number of input rows partial aggregation partition should process,
before aggregation ratio check and trying to switch to skipping aggregation
mode</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.use_row_number_estimates_to_optimize_partitioning</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.use_row_number_estimates_to_optimize_partitioning</p></td>
<td><p>false</p></td>
<td><p>Should DataFusion use row number estimates at the input to decide
whether increasing parallelism is beneficial or not. By default, only exact row
numbers (not estimates) are used for this decision. Setting this flag to <code
class="docutils literal notranslate"><span class="pre">true</span></code> will
likely produce better plans. if the source of statistics is accurate. We plan
to make this the default in the future.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.enforce_batch_size_in_joins</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.enforce_batch_size_in_joins</p></td>
<td><p>false</p></td>
<td><p>Should DataFusion enforce batch size in joins or not. By default,
DataFusion will not enforce batch size in joins. Enforcing batch size in joins
can reduce memory usage when joining large tables with a highly-selective join
filter, but is also slightly slower.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.objectstore_writer_buffer_size</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.objectstore_writer_buffer_size</p></td>
<td><p>10485760</p></td>
<td><p>Size (bytes) of data buffer DataFusion uses when writing output files.
This affects the size of the data chunks that are uploaded to remote object
stores (e.g. AWS S3). If very large (>= 100 GiB) output files are being
written, it may be necessary to increase this size to avoid errors from the
remote end point.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.execution.enable_ansi_mode</p></td>
+<tr class="row-even"><td><p>datafusion.execution.enable_ansi_mode</p></td>
<td><p>false</p></td>
<td><p>Whether to enable ANSI SQL mode. The flag is experimental and relevant
only for DataFusion Spark built-in functions When <code class="docutils literal
notranslate"><span class="pre">enable_ansi_mode</span></code> is set to <code
class="docutils literal notranslate"><span class="pre">true</span></code>, the
query engine follows ANSI SQL semantics for expressions, casting, and error
handling. This means: - <strong>Strict type coercion rules:</strong> implicit
casts between incompati [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.hash_join_buffering_capacity</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.hash_join_buffering_capacity</p></td>
<td><p>0</p></td>
<td><p>How many bytes to buffer in the probe side of hash joins while the
build side is concurrently being built. Without this, hash joins will wait
until the full materialization of the build side before polling the probe side.
This is useful in scenarios where the query is not completely CPU bounded,
allowing to do some early work concurrently and reducing the latency of the
query. Note that when hash join buffering is enabled, the probe side will start
eagerly polling data, not giving [...]
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will push a limit operation into
grouped aggregations which have no aggregate expressions, as a soft limit,
emitting groups once the limit is reached, before all rows in the group are
read.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer will try to add round
robin repartitioning to increase parallelism to leverage more CPU cores</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will attempt to perform limit
operations during aggregations, if possible</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.enable_window_limits</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.enable_window_limits</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will attempt to push limit operations
past window functions, if possible</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.enable_window_topn</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.enable_window_topn</p></td>
<td><p>false</p></td>
<td><p>When set to true, the optimizer will replace Filter(rn<=K) →
Window(ROW_NUMBER) → Sort patterns with a PartitionedTopKExec that maintains
per-partition heaps, avoiding a full sort of the input. When the window
partition key has low cardinality, enabling this optimization can improve
performance. However, for high cardinality keys, it may cause regressions in
both memory usage and runtime.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.enable_topk_repartition</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.enable_topk_repartition</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will push TopK (Sort with fetch) below
hash repartition when the partition key is a prefix of the sort key, reducing
data volume before the shuffle.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.enable_topk_dynamic_filter_pushdown</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.enable_topk_dynamic_filter_pushdown</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will attempt to push down TopK dynamic
filters into the file scan phase.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.enable_physical_uncorrelated_scalar_subquery</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.enable_physical_uncorrelated_scalar_subquery</p></td>
<td><p>true</p></td>
<td><p>When set to true, uncorrelated scalar subqueries are left in the
logical plan and executed by <code class="docutils literal notranslate"><span
class="pre">ScalarSubqueryExec</span></code> during physical execution. When
set to false, all scalar subqueries (including uncorrelated ones) are rewritten
to left joins by the <code class="docutils literal notranslate"><span
class="pre">ScalarSubqueryToJoin</span></code> optimizer rule. Note disabling
this option is not recommended. It re [...]
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.enable_join_dynamic_filter_pushdown</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.enable_join_dynamic_filter_pushdown</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will attempt to push down Join dynamic
filters into the file scan phase.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will attempt to push down Aggregate
dynamic filters into the file scan phase.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.enable_dynamic_filter_pushdown</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.enable_dynamic_filter_pushdown</p></td>
<td><p>true</p></td>
<td><p>When set to true attempts to push down dynamic filters generated by
operators (TopK, Join & Aggregate) into the file scan phase. For example,
for a query such as <code class="docutils literal notranslate"><span
class="pre">SELECT</span> <span class="pre">*</span> <span
class="pre">FROM</span> <span class="pre">t</span> <span
class="pre">ORDER</span> <span class="pre">BY</span> <span
class="pre">timestamp</span> <span class="pre">DESC</span> <span
class="pre">LIMIT</span> <span [...]
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.filter_null_join_keys</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.filter_null_join_keys</p></td>
<td><p>false</p></td>
<td><p>When set to true, the optimizer will insert filters before a join
between a nullable and non-nullable column to filter out nulls on the nullable
side. This filter can add additional overhead when the file format does not
fully support predicate push down.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.repartition_aggregations</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.repartition_aggregations</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion repartition data using the aggregate keys to execute
aggregates in parallel using the provided <code class="docutils literal
notranslate"><span class="pre">target_partitions</span></code> level</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.repartition_file_min_size</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.repartition_file_min_size</p></td>
<td><p>1048576</p></td>
<td><p>Minimum total file size in bytes for file-group byte-range splitting to
fire. Files (or merged file groups) smaller than this stay as one partition.
Lower values produce more, smaller partitions — better at filling <code
class="docutils literal notranslate"><span
class="pre">target_partitions</span></code> worth of cores when files are
modestly sized, at the cost of slightly more per-partition open / metadata-load
overhead.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.repartition_joins</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.repartition_joins</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion repartition data using the join keys to execute joins
in parallel using the provided <code class="docutils literal notranslate"><span
class="pre">target_partitions</span></code> level</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion allow symmetric hash joins for unbounded data sources
even when its inputs do not have any ordering or filtering? If the flag is not
enabled, the SymmetricHashJoin operator will be unable to prune its internal
buffers, resulting in certain join types - such as Full, Left, LeftAnti,
LeftSemi, Right, RightAnti, and RightSemi - being produced only at the end of
the execution. This is not typical in stream processing. Additionally, without
proper design for long runne [...]
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.repartition_file_scans</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.repartition_file_scans</p></td>
<td><p>true</p></td>
<td><p>When set to <code class="docutils literal notranslate"><span
class="pre">true</span></code>, datasource partitions will be repartitioned to
achieve maximum parallelism. This applies to both in-memory partitions and
FileSource’s file groups (1 group is 1 partition). For FileSources, only
Parquet and CSV formats are currently supported. If set to <code
class="docutils literal notranslate"><span class="pre">true</span></code> for a
FileSource, all files will be repartitioned evenly ( [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.preserve_file_partitions</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.preserve_file_partitions</p></td>
<td><p>0</p></td>
<td><p>Minimum number of distinct partition values required to group files by
their Hive partition column values (enabling Hash partitioning declaration).
How the option is used: - preserve_file_partitions=0: Disable it. -
preserve_file_partitions=1: Always enable it. - preserve_file_partitions=N,
actual file partitions=M: Only enable when M >= N. This threshold preserves
I/O parallelism when file partitioning is below it. Note: This may reduce
parallelism, rooting from the I/O level, [...]
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.repartition_windows</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.repartition_windows</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion repartition data using the partition keys to execute
window functions in parallel using the provided <code class="docutils literal
notranslate"><span class="pre">target_partitions</span></code> level</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.repartition_sorts</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.repartition_sorts</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion execute sorts in a per-partition fashion and merge
afterwards instead of coalescing first and sorting globally. When this flag is
enabled, plans in the form below <code class="docutils literal
notranslate"><span class="pre">text</span> <span
class="pre">"SortExec:</span> <span class="pre">[a@0</span> <span
class="pre">ASC]",</span> <span class="pre">"</span> <span
class="pre">CoalescePartitionsExec",</span> <span
class="pre">"</span> [...]
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.subset_repartition_threshold</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.subset_repartition_threshold</p></td>
<td><p>4</p></td>
<td><p>Partition count threshold for subset satisfaction optimization. When
the current partition count is >= this threshold, DataFusion will skip
repartitioning if the required partitioning expression is a subset of the
current partition expression (for example, Hash(a) satisfies Hash(a, b)). When the
current partition count is < this threshold, DataFusion will repartition to
increase parallelism even when subset satisfaction applies. Set to 0 to always
repartition (disable subset satisf [...]
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_sort</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_sort</p></td>
<td><p>false</p></td>
<td><p>When true, DataFusion will opportunistically remove sorts when the data
is already sorted (i.e. setting <code class="docutils literal
notranslate"><span class="pre">preserve_order</span></code> to true on <code
class="docutils literal notranslate"><span
class="pre">RepartitionExec</span></code> and using <code class="docutils
literal notranslate"><span class="pre">SortPreservingMergeExec</span></code>).
When false, DataFusion will maximize plan parallelism using <code class="docut
[...]
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.skip_failed_rules</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.skip_failed_rules</p></td>
<td><p>false</p></td>
<td><p>When set to true, the logical plan optimizer will produce warning
messages if any optimization rules produce errors and then proceed to the next
rule. When set to false, any rules that produce errors will cause the query to
fail</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.max_passes</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.max_passes</p></td>
<td><p>3</p></td>
<td><p>Number of times that the optimizer will attempt to optimize the
plan</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer will run a top down
process to reorder the join keys</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.join_reordering</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.join_reordering</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer may swap join inputs
based on statistics. When set to false, statistics-driven join input reordering
is disabled and the original join order in the query is used.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.use_statistics_registry</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.use_statistics_registry</p></td>
<td><p>false</p></td>
<td><p>When set to true, the physical plan optimizer uses the pluggable <code
class="docutils literal notranslate"><span
class="pre">StatisticsRegistry</span></code> for statistics propagation across
operators. This enables more accurate cardinality estimates compared to each
operator’s built-in <code class="docutils literal notranslate"><span
class="pre">partition_statistics</span></code>.</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.prefer_hash_join</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.prefer_hash_join</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer will prefer HashJoin over
SortMergeJoin. HashJoin can work more efficiently than SortMergeJoin but
consumes more memory</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.enable_piecewise_merge_join</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.enable_piecewise_merge_join</p></td>
<td><p>false</p></td>
<td><p>When set to true, piecewise merge join is enabled. PiecewiseMergeJoin
is currently experimental. Physical planner will opt for PiecewiseMergeJoin
when there is only one range filter.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td>
<td><p>1048576</p></td>
<td><p>The maximum estimated size in bytes below which one input side of a
HashJoin will be collected into a single partition</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td>
<td><p>131072</p></td>
<td><p>The maximum estimated size in rows below which one input side of a
HashJoin will be collected into a single partition</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.hash_join_inlist_pushdown_max_size</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.hash_join_inlist_pushdown_max_size</p></td>
<td><p>131072</p></td>
<td><p>Maximum size in bytes for the build side of a hash join to be pushed
down as an InList expression for dynamic filtering. Build sides larger than
this will use hash table lookups instead. Set to 0 to always use hash table
lookups. InList pushdown can be more efficient for small build sides because it
can result in better statistics pruning as well as use any bloom filters
present on the scan side. InList expressions are also more transparent and
easier to serialize over the network [...]
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.hash_join_inlist_pushdown_max_distinct_values</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.hash_join_inlist_pushdown_max_distinct_values</p></td>
<td><p>150</p></td>
<td><p>Maximum number of distinct values (rows) in the build side of a hash
join to be pushed down as an InList expression for dynamic filtering. Build
sides with more rows than this will use hash table lookups instead. Set to 0 to
always use hash table lookups. This provides an additional limit beyond <code
class="docutils literal notranslate"><span
class="pre">hash_join_inlist_pushdown_max_size</span></code> to prevent very
large IN lists that might not provide much benefit over hash t [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.default_filter_selectivity</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.default_filter_selectivity</p></td>
<td><p>20</p></td>
<td><p>The default filter selectivity used by Filter Statistics when an exact
selectivity cannot be determined. Valid values are between 0 (no selectivity)
and 100 (all rows are selected).</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_union</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_union</p></td>
<td><p>false</p></td>
<td><p>When set to true, the optimizer will not attempt to convert Union to
Interleave</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.expand_views_at_output</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.expand_views_at_output</p></td>
<td><p>false</p></td>
<td><p>When set to true, if the returned type is a view type then the output
will be coerced to a non-view. Coerces <code class="docutils literal
notranslate"><span class="pre">Utf8View</span></code> to <code class="docutils
literal notranslate"><span class="pre">LargeUtf8</span></code>, and <code
class="docutils literal notranslate"><span class="pre">BinaryView</span></code>
to <code class="docutils literal notranslate"><span
class="pre">LargeBinary</span></code>.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.enable_sort_pushdown</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.enable_sort_pushdown</p></td>
<td><p>true</p></td>
<td><p>Enable sort pushdown optimization. When enabled, attempts to push sort
requirements down to data sources that can natively handle them (e.g., by
reversing file/row group read order). Returns <strong>inexact
ordering</strong>: Sort operator is kept for correctness, but optimized input
enables early termination for TopK queries (ORDER BY … LIMIT N), providing
significant speedup. Memory: No additional overhead (only changes read order).
Future: Will add option to detect perfectly so [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.enable_leaf_expression_pushdown</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.enable_leaf_expression_pushdown</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will extract leaf expressions (such as
<code class="docutils literal notranslate"><span
class="pre">get_field</span></code>) from filter/sort/join nodes into
projections closer to the leaf table scans, and push those projections down
towards the leaf nodes.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.enable_unions_to_filter</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.enable_unions_to_filter</p></td>
<td><p>false</p></td>
<td><p>When set to true, the logical optimizer will rewrite <code
class="docutils literal notranslate"><span class="pre">UNION</span> <span
class="pre">DISTINCT</span></code> branches that read from the same source and
differ only by filter predicates into a single branch with a combined filter.
This optimization is conservative and only applies when the branches share the
same source and compatible wrapper nodes such as identical projections or
aliases.</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.explain.logical_plan_only</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.logical_plan_only</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will only print logical
plans</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.explain.physical_plan_only</p></td>
+<tr class="row-even"><td><p>datafusion.explain.physical_plan_only</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will only print physical
plans</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.explain.show_statistics</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.show_statistics</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will print operator statistics
for physical plans</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.explain.show_sizes</p></td>
+<tr class="row-even"><td><p>datafusion.explain.show_sizes</p></td>
<td><p>true</p></td>
<td><p>When set to true, the explain statement will print the partition
sizes</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.explain.show_schema</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.show_schema</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will print schema
information</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.explain.format</p></td>
+<tr class="row-even"><td><p>datafusion.explain.format</p></td>
<td><p>indent</p></td>
<td><p>Display format of explain. Default is “indent”. When set to “tree”, it
will print the plan in a tree-rendered format.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.explain.tree_maximum_render_width</p></td>
+<tr
class="row-odd"><td><p>datafusion.explain.tree_maximum_render_width</p></td>
<td><p>240</p></td>
<td><p>(format=tree only) Maximum total width of the rendered tree. When set
to 0, the tree will have no width limit.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.explain.analyze_level</p></td>
+<tr class="row-even"><td><p>datafusion.explain.analyze_level</p></td>
<td><p>dev</p></td>
<td><p>Verbosity level for “EXPLAIN ANALYZE”. Default is “dev”. “summary” shows
common metrics for high-level insights. “dev” provides deep operator-level
introspection for developers.</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.explain.analyze_categories</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.analyze_categories</p></td>
<td><p>all</p></td>
<td><p>Which metric categories to include in “EXPLAIN ANALYZE” output.
Comma-separated list of: “rows”, “bytes”, “timing”, “uncategorized”. Use “none”
to show plan structure only, or “all” (default) to show everything. Metrics
without a declared category are treated as “uncategorized”.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td>
+<tr
class="row-even"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td>
<td><p>false</p></td>
<td><p>When set to true, SQL parser will parse float as decimal type</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td>
+<tr
class="row-odd"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td>
<td><p>true</p></td>
<td><p>When set to true, SQL parser will normalize ident (convert ident to
lowercase when not quoted)</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.sql_parser.enable_options_value_normalization</p></td>
+<tr
class="row-even"><td><p>datafusion.sql_parser.enable_options_value_normalization</p></td>
<td><p>false</p></td>
<td><p>When set to true, SQL parser will normalize options value (convert
value to lowercase). Note that this option is ignored and will be removed in
the future. All case-insensitive values are normalized automatically.</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.sql_parser.dialect</p></td>
+<tr class="row-odd"><td><p>datafusion.sql_parser.dialect</p></td>
<td><p>generic</p></td>
<td><p>Configure the SQL dialect used by DataFusion’s parser; supported values
include: Generic, MySQL, PostgreSQL, Hive, SQLite, Snowflake, Redshift, MsSQL,
ClickHouse, BigQuery, Ansi, DuckDB and Databricks.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.sql_parser.support_varchar_with_length</p></td>
+<tr
class="row-even"><td><p>datafusion.sql_parser.support_varchar_with_length</p></td>
<td><p>true</p></td>
<td><p>If true, permit lengths for <code class="docutils literal
notranslate"><span class="pre">VARCHAR</span></code> such as <code
class="docutils literal notranslate"><span
class="pre">VARCHAR(20)</span></code>, but ignore the length. If false, error
if a <code class="docutils literal notranslate"><span
class="pre">VARCHAR</span></code> with a length is specified. The Arrow type
system does not have a notion of maximum string length and thus DataFusion
cannot enforce such limits.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.sql_parser.map_string_types_to_utf8view</p></td>
+<tr
class="row-odd"><td><p>datafusion.sql_parser.map_string_types_to_utf8view</p></td>
<td><p>true</p></td>
<td><p>If true, string types (VARCHAR, CHAR, Text, and String) are mapped to
<code class="docutils literal notranslate"><span
class="pre">Utf8View</span></code> during SQL planning. If false, they are
mapped to <code class="docutils literal notranslate"><span
class="pre">Utf8</span></code>. Default is true.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.sql_parser.collect_spans</p></td>
+<tr class="row-even"><td><p>datafusion.sql_parser.collect_spans</p></td>
<td><p>false</p></td>
<td><p>When set to true, the source locations relative to the original SQL
query (i.e. <a class="reference external"
href="https://docs.rs/sqlparser/latest/sqlparser/tokenizer/struct.Span.html"><code
class="docutils literal notranslate"><span class="pre">Span</span></code></a>)
will be collected and recorded in the logical plan nodes.</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.sql_parser.recursion_limit</p></td>
+<tr class="row-odd"><td><p>datafusion.sql_parser.recursion_limit</p></td>
<td><p>50</p></td>
<td><p>Specifies the recursion depth limit when parsing complex SQL
Queries</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.sql_parser.default_null_ordering</p></td>
+<tr
class="row-even"><td><p>datafusion.sql_parser.default_null_ordering</p></td>
<td><p>nulls_max</p></td>
<td><p>Specifies the default null ordering for query results. There are 4
options: - <code class="docutils literal notranslate"><span
class="pre">nulls_max</span></code>: Nulls appear last in ascending order. -
<code class="docutils literal notranslate"><span
class="pre">nulls_min</span></code>: Nulls appear first in ascending order. -
<code class="docutils literal notranslate"><span
class="pre">nulls_first</span></code>: Nulls are always first in any order. -
<code class="docutils litera [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.sql_parser.enable_subquery_sort_elimination</p></td>
+<tr
class="row-odd"><td><p>datafusion.sql_parser.enable_subquery_sort_elimination</p></td>
<td><p>true</p></td>
<td><p>When set to true, DataFusion may remove <code class="docutils literal
notranslate"><span class="pre">ORDER</span> <span class="pre">BY</span></code>
clauses from subqueries or CTEs during SQL planning when their ordering cannot
affect the result, such as when no <code class="docutils literal
notranslate"><span class="pre">LIMIT</span></code> or other order-sensitive
operator depends on them. Disable this option to preserve explicit subquery
ordering in the planned query.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.format.safe</p></td>
+<tr class="row-even"><td><p>datafusion.format.safe</p></td>
<td><p>true</p></td>
<td><p>If set to <code class="docutils literal notranslate"><span
class="pre">true</span></code> any formatting errors will be written to the
output instead of being converted into a <code class="docutils literal
notranslate"><span class="pre">std::fmt::Error</span></code></p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.format.null</p></td>
+<tr class="row-odd"><td><p>datafusion.format.null</p></td>
<td><p></p></td>
<td><p>Format string for nulls</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.format.date_format</p></td>
+<tr class="row-even"><td><p>datafusion.format.date_format</p></td>
<td><p>%Y-%m-%d</p></td>
<td><p>Date format for date arrays</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.format.datetime_format</p></td>
+<tr class="row-odd"><td><p>datafusion.format.datetime_format</p></td>
<td><p>%Y-%m-%dT%H:%M:%S%.f</p></td>
<td><p>Format for DateTime arrays</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.format.timestamp_format</p></td>
+<tr class="row-even"><td><p>datafusion.format.timestamp_format</p></td>
<td><p>%Y-%m-%dT%H:%M:%S%.f</p></td>
<td><p>Timestamp format for timestamp arrays</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.format.timestamp_tz_format</p></td>
+<tr class="row-odd"><td><p>datafusion.format.timestamp_tz_format</p></td>
<td><p>NULL</p></td>
<td><p>Timestamp format for timestamp with timezone arrays. When <code
class="docutils literal notranslate"><span class="pre">None</span></code>, ISO
8601 format is used.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.format.time_format</p></td>
+<tr class="row-even"><td><p>datafusion.format.time_format</p></td>
<td><p>%H:%M:%S%.f</p></td>
<td><p>Time format for time arrays</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.format.duration_format</p></td>
+<tr class="row-odd"><td><p>datafusion.format.duration_format</p></td>
<td><p>pretty</p></td>
<td><p>Duration format. Can be either <code class="docutils literal
notranslate"><span class="pre">"pretty"</span></code> or <code
class="docutils literal notranslate"><span
class="pre">"ISO8601"</span></code></p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.format.types_info</p></td>
+<tr class="row-even"><td><p>datafusion.format.types_info</p></td>
<td><p>false</p></td>
<td><p>Show types in visual representation batches</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.spark.map_key_dedup_policy</p></td>
+<tr class="row-odd"><td><p>datafusion.spark.map_key_dedup_policy</p></td>
<td><p>EXCEPTION</p></td>
<td><p>Policy for handling duplicate keys in Spark-compatible map-construction
functions (<code class="docutils literal notranslate"><span
class="pre">map_from_arrays</span></code>, <code class="docutils literal
notranslate"><span class="pre">map_from_entries</span></code>, <code
class="docutils literal notranslate"><span
class="pre">str_to_map</span></code>). Mirrors Spark’s <a class="reference
external"
href="https://github.com/apache/spark/blob/cf3a34e19dfcf70e2d679217ff1ba21302212472
[...]
</tr>
diff --git a/user-guide/sql/format_options.html
b/user-guide/sql/format_options.html
index 8f07a8ed0c..7c82a684ad 100644
--- a/user-guide/sql/format_options.html
+++ b/user-guide/sql/format_options.html
@@ -672,133 +672,139 @@ a<span class="p">;</span>b
<td><p><code class="docutils literal notranslate"><span
class="pre">'max_row_group_size'</span></code></p></td>
<td><p>1048576</p></td>
</tr>
-<tr class="row-even"><td><p>ENABLE_PAGE_INDEX</p></td>
+<tr class="row-even"><td><p>MAX_ROW_GROUP_BYTES</p></td>
+<td><p>No</p></td>
+<td><p>Sets the maximum size of each row group in bytes. When both this and
<code class="docutils literal notranslate"><span
class="pre">MAX_ROW_GROUP_SIZE</span></code> are set, the row group flushes
whenever either limit is reached. Mirrors <code class="docutils literal
notranslate"><span class="pre">parquet.block.size</span></code> from
parquet-mr. Currently only honored when <code class="docutils literal
notranslate"><span class="pre">allow_single_file_parallelism</span></code> is
<c [...]
+<td><p><code class="docutils literal notranslate"><span
class="pre">'max_row_group_bytes'</span></code></p></td>
+<td><p>None</p></td>
+</tr>
+<tr class="row-odd"><td><p>ENABLE_PAGE_INDEX</p></td>
<td><p>No</p></td>
<td><p>If true, reads the Parquet data page level metadata (the Page Index),
if present, to reduce I/O and decoding.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'enable_page_index'</span></code></p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p>PRUNING</p></td>
+<tr class="row-even"><td><p>PRUNING</p></td>
<td><p>No</p></td>
<td><p>If true, enables row group pruning based on min/max statistics.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'pruning'</span></code></p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-even"><td><p>SKIP_METADATA</p></td>
+<tr class="row-odd"><td><p>SKIP_METADATA</p></td>
<td><p>No</p></td>
<td><p>If true, skips optional embedded metadata in the file schema.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'skip_metadata'</span></code></p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p>METADATA_SIZE_HINT</p></td>
+<tr class="row-even"><td><p>METADATA_SIZE_HINT</p></td>
<td><p>No</p></td>
<td><p>Sets the size hint (in bytes) for fetching Parquet file
metadata.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'metadata_size_hint'</span></code></p></td>
<td><p>None</p></td>
</tr>
-<tr class="row-even"><td><p>PUSHDOWN_FILTERS</p></td>
+<tr class="row-odd"><td><p>PUSHDOWN_FILTERS</p></td>
<td><p>No</p></td>
<td><p>If true, enables filter pushdown during Parquet decoding.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'pushdown_filters'</span></code></p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p>REORDER_FILTERS</p></td>
+<tr class="row-even"><td><p>REORDER_FILTERS</p></td>
<td><p>No</p></td>
<td><p>If true, enables heuristic reordering of filters during Parquet
decoding.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'reorder_filters'</span></code></p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>SCHEMA_FORCE_VIEW_TYPES</p></td>
+<tr class="row-odd"><td><p>SCHEMA_FORCE_VIEW_TYPES</p></td>
<td><p>No</p></td>
<td><p>If true, reads Utf8/Binary columns as view types.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'schema_force_view_types'</span></code></p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p>BINARY_AS_STRING</p></td>
+<tr class="row-even"><td><p>BINARY_AS_STRING</p></td>
<td><p>No</p></td>
<td><p>If true, reads Binary columns as strings.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'binary_as_string'</span></code></p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>DATA_PAGESIZE_LIMIT</p></td>
+<tr class="row-odd"><td><p>DATA_PAGESIZE_LIMIT</p></td>
<td><p>No</p></td>
<td><p>Sets best effort maximum size of data page in bytes.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'data_pagesize_limit'</span></code></p></td>
<td><p>1048576</p></td>
</tr>
-<tr class="row-odd"><td><p>DATA_PAGE_ROW_COUNT_LIMIT</p></td>
+<tr class="row-even"><td><p>DATA_PAGE_ROW_COUNT_LIMIT</p></td>
<td><p>No</p></td>
<td><p>Sets best effort maximum number of rows in data page.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'data_page_row_count_limit'</span></code></p></td>
<td><p>20000</p></td>
</tr>
-<tr class="row-even"><td><p>DICTIONARY_PAGE_SIZE_LIMIT</p></td>
+<tr class="row-odd"><td><p>DICTIONARY_PAGE_SIZE_LIMIT</p></td>
<td><p>No</p></td>
<td><p>Sets best effort maximum dictionary page size, in bytes.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'dictionary_page_size_limit'</span></code></p></td>
<td><p>1048576</p></td>
</tr>
-<tr class="row-odd"><td><p>WRITE_BATCH_SIZE</p></td>
+<tr class="row-even"><td><p>WRITE_BATCH_SIZE</p></td>
<td><p>No</p></td>
<td><p>Sets write_batch_size in rows.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'write_batch_size'</span></code></p></td>
<td><p>1024</p></td>
</tr>
-<tr class="row-even"><td><p>WRITER_VERSION</p></td>
+<tr class="row-odd"><td><p>WRITER_VERSION</p></td>
<td><p>No</p></td>
<td><p>Sets the Parquet writer version (<code class="docutils literal
notranslate"><span class="pre">1.0</span></code> or <code class="docutils
literal notranslate"><span class="pre">2.0</span></code>).</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'writer_version'</span></code></p></td>
<td><p>1.0</p></td>
</tr>
-<tr class="row-odd"><td><p>SKIP_ARROW_METADATA</p></td>
+<tr class="row-even"><td><p>SKIP_ARROW_METADATA</p></td>
<td><p>No</p></td>
<td><p>If true, skips writing Arrow schema information into the Parquet file
metadata.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'skip_arrow_metadata'</span></code></p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>CREATED_BY</p></td>
+<tr class="row-odd"><td><p>CREATED_BY</p></td>
<td><p>No</p></td>
<td><p>Sets the “created by” string in the Parquet file metadata.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'created_by'</span></code></p></td>
<td><p>datafusion version X.Y.Z</p></td>
</tr>
-<tr class="row-odd"><td><p>COLUMN_INDEX_TRUNCATE_LENGTH</p></td>
+<tr class="row-even"><td><p>COLUMN_INDEX_TRUNCATE_LENGTH</p></td>
<td><p>No</p></td>
<td><p>Sets the length (in bytes) to truncate min/max values in column
indexes.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'column_index_truncate_length'</span></code></p></td>
<td><p>64</p></td>
</tr>
-<tr class="row-even"><td><p>STATISTICS_TRUNCATE_LENGTH</p></td>
+<tr class="row-odd"><td><p>STATISTICS_TRUNCATE_LENGTH</p></td>
<td><p>No</p></td>
<td><p>Sets statistics truncate length.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'statistics_truncate_length'</span></code></p></td>
<td><p>None</p></td>
</tr>
-<tr class="row-odd"><td><p>BLOOM_FILTER_ON_WRITE</p></td>
+<tr class="row-even"><td><p>BLOOM_FILTER_ON_WRITE</p></td>
<td><p>No</p></td>
<td><p>Sets whether bloom filters should be written for all columns by default
(can be overridden per column).</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'bloom_filter_on_write'</span></code></p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>ALLOW_SINGLE_FILE_PARALLELISM</p></td>
+<tr class="row-odd"><td><p>ALLOW_SINGLE_FILE_PARALLELISM</p></td>
<td><p>No</p></td>
<td><p>Enables parallel serialization of columns in a single file.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'allow_single_file_parallelism'</span></code></p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p>MAXIMUM_PARALLEL_ROW_GROUP_WRITERS</p></td>
+<tr class="row-even"><td><p>MAXIMUM_PARALLEL_ROW_GROUP_WRITERS</p></td>
<td><p>No</p></td>
<td><p>Maximum number of parallel row group writers.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'maximum_parallel_row_group_writers'</span></code></p></td>
<td><p>1</p></td>
</tr>
-<tr class="row-even"><td><p>MAXIMUM_BUFFERED_RECORD_BATCHES_PER_STREAM</p></td>
+<tr class="row-odd"><td><p>MAXIMUM_BUFFERED_RECORD_BATCHES_PER_STREAM</p></td>
<td><p>No</p></td>
<td><p>Maximum number of buffered record batches per stream.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'maximum_buffered_record_batches_per_stream'</span></code></p></td>
<td><p>2</p></td>
</tr>
-<tr class="row-odd"><td><p>KEY_VALUE_METADATA</p></td>
+<tr class="row-even"><td><p>KEY_VALUE_METADATA</p></td>
<td><p>No (Key is specific)</p></td>
<td><p>Adds custom key-value pairs to the file metadata. Use the format <code
class="docutils literal notranslate"><span
class="pre">'metadata::your_key_name'</span> <span
class="pre">'your_value'</span></code>. Multiple entries allowed.</p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">'metadata::key_name'</span></code></p></td>
---------------------------------------------------------------------
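The MAX_ROW_GROUP_BYTES row added in this commit says that a row group flushes whenever either the row limit or the byte limit is reached. The following is a minimal Python sketch of that rule only, not DataFusion's writer code: the function name should_flush_row_group and its parameters are hypothetical, and the way a real writer tracks buffered bytes is an assumption here.

```python
from typing import Optional


def should_flush_row_group(
    buffered_rows: int,
    buffered_bytes: int,
    max_row_group_size: int,
    max_row_group_bytes: Optional[int] = None,
) -> bool:
    """Return True when the buffered row group should be flushed.

    Hypothetical helper illustrating the documented rule: flush when
    either limit is reached. max_row_group_bytes=None models the
    documented default of "None" (no byte limit).
    """
    # Row-count limit (MAX_ROW_GROUP_SIZE in the docs table).
    if buffered_rows >= max_row_group_size:
        return True
    # Byte limit (MAX_ROW_GROUP_BYTES), only checked when it is set.
    return max_row_group_bytes is not None and buffered_bytes >= max_row_group_bytes


if __name__ == "__main__":
    limit_bytes = 64 * 1024 * 1024  # illustrative value, not a documented default
    print(should_flush_row_group(100, limit_bytes, 1048576, limit_bytes))
```

With no byte limit set, only the row count matters; once a byte limit is supplied, a row group holding few but very wide rows can still flush early.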