This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new d39a36ec5b Publish built docs triggered by
87e931c976a7aa24cecaa9bf3658b42bba12a51e
d39a36ec5b is described below
commit d39a36ec5b37b170a3e2d7338382247f01a91163
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Fri Oct 18 11:35:17 2024 +0000
Publish built docs triggered by 87e931c976a7aa24cecaa9bf3658b42bba12a51e
---
_sources/user-guide/configs.md.txt | 1 +
searchindex.js | 2 +-
user-guide/configs.html | 66 ++++++++++++++++++++------------------
3 files changed, 37 insertions(+), 32 deletions(-)
diff --git a/_sources/user-guide/configs.md.txt
b/_sources/user-guide/configs.md.txt
index f34d148f09..c61a7b6733 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -91,6 +91,7 @@ Environment variables are read during `SessionConfig`
initialisation so they mus
| datafusion.execution.skip_partial_aggregation_probe_ratio_threshold |
0.8 | Aggregation ratio (number of distinct groups /
number of input rows) threshold for skipping partial aggregation. If the value
is greater then partial aggregation will skip aggregation for further input
[...]
| datafusion.execution.skip_partial_aggregation_probe_rows_threshold |
100000 | Number of input rows partial aggregation partition
should process, before aggregation ratio check and trying to switch to skipping
aggregation mode
[...]
| datafusion.execution.use_row_number_estimates_to_optimize_partitioning |
false | Should DataFusion use row number estimates at the
input to decide whether increasing parallelism is beneficial or not. By
default, only exact row numbers (not estimates) are used for this decision.
Setting this flag to `true` will likely produce better plans. if the source of
statistics is accurate. We plan to make this the default in the future.
[...]
+| datafusion.execution.enforce_batch_size_in_joins |
false | Should DataFusion enforce batch size in joins or
not. By default, DataFusion will not enforce batch size in joins. Enforcing
batch size in joins can reduce memory usage when joining large tables with a
highly-selective join filter, but is also slightly slower.
[...]
| datafusion.optimizer.enable_distinct_aggregation_soft_limit |
true | When set to true, the optimizer will push a limit
operation into grouped aggregations which have no aggregate expressions, as a
soft limit, emitting groups once the limit is reached, before all rows in the
group are read.
[...]
| datafusion.optimizer.enable_round_robin_repartition |
true | When set to true, the physical plan optimizer will
try to add round robin repartitioning to increase parallelism to leverage more
CPU cores
[...]
| datafusion.optimizer.enable_topk_aggregation |
true | When set to true, the optimizer will attempt to
perform limit operations during aggregations, if possible
[...]
diff --git a/searchindex.js b/searchindex.js
index 476ac73884..12e0b278b7 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"!=": [[47, "op-neq"]], "!~": [[47,
"op-re-not-match"]], "!~*": [[47, "op-re-not-match-i"]], "!~~": [[47, "id18"]],
"!~~*": [[47, "id19"]], "#": [[47, "op-bit-xor"]], "%": [[47, "op-modulo"]],
"&": [[47, "op-bit-and"]], "(relation, name) tuples in logical fields and
logical columns are unique": [[9,
"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]], "*":
[[47, "op-multiply"]], "+": [[47, "op-plus"]], "-": [[47, "op-minus"]], "/":
[[4 [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"!=": [[47, "op-neq"]], "!~": [[47,
"op-re-not-match"]], "!~*": [[47, "op-re-not-match-i"]], "!~~": [[47, "id18"]],
"!~~*": [[47, "id19"]], "#": [[47, "op-bit-xor"]], "%": [[47, "op-modulo"]],
"&": [[47, "op-bit-and"]], "(relation, name) tuples in logical fields and
logical columns are unique": [[9,
"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]], "*":
[[47, "op-multiply"]], "+": [[47, "op-plus"]], "-": [[47, "op-minus"]], "/":
[[4 [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index cd710950f0..15b4a0e800 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -783,127 +783,131 @@ Environment variables are read during <code
class="docutils literal notranslate"
<td><p>false</p></td>
<td><p>Should DataFusion use row number estimates at the input to decide
whether increasing parallelism is beneficial or not. By default, only exact row
numbers (not estimates) are used for this decision. Setting this flag to <code
class="docutils literal notranslate"><span class="pre">true</span></code> will
likely produce better plans. if the source of statistics is accurate. We plan
to make this the default in the future.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.enforce_batch_size_in_joins</p></td>
+<td><p>false</p></td>
+<td><p>Should DataFusion enforce batch size in joins or not. By default,
DataFusion will not enforce batch size in joins. Enforcing batch size in joins
can reduce memory usage when joining large tables with a highly-selective join
filter, but is also slightly slower.</p></td>
+</tr>
+<tr
class="row-odd"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will push a limit operation into
grouped aggregations which have no aggregate expressions, as a soft limit,
emitting groups once the limit is reached, before all rows in the group are
read.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer will try to add round
robin repartitioning to increase parallelism to leverage more CPU cores</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will attempt to perform limit
operations during aggregations, if possible</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.filter_null_join_keys</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.filter_null_join_keys</p></td>
<td><p>false</p></td>
<td><p>When set to true, the optimizer will insert filters before a join
between a nullable and non-nullable column to filter out nulls on the nullable
side. This filter can add additional overhead when the file format does not
fully support predicate push down.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.repartition_aggregations</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.repartition_aggregations</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion repartition data using the aggregate keys to execute
aggregates in parallel using the provided <code class="docutils literal
notranslate"><span class="pre">target_partitions</span></code> level</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.repartition_file_min_size</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.repartition_file_min_size</p></td>
<td><p>10485760</p></td>
<td><p>Minimum total files size in bytes to perform file scan
repartitioning.</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.repartition_joins</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.repartition_joins</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion repartition data using the join keys to execute joins
in parallel using the provided <code class="docutils literal notranslate"><span
class="pre">target_partitions</span></code> level</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion allow symmetric hash joins for unbounded data sources
even when its inputs do not have any ordering or filtering If the flag is not
enabled, the SymmetricHashJoin operator will be unable to prune its internal
buffers, resulting in certain join types - such as Full, Left, LeftAnti,
LeftSemi, Right, RightAnti, and RightSemi - being produced only at the end of
the execution. This is not typical in stream processing. Additionally, without
proper design for long runne [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.repartition_file_scans</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.repartition_file_scans</p></td>
<td><p>true</p></td>
<td><p>When set to <code class="docutils literal notranslate"><span
class="pre">true</span></code>, file groups will be repartitioned to achieve
maximum parallelism. Currently Parquet and CSV formats are supported. If set to
<code class="docutils literal notranslate"><span
class="pre">true</span></code>, all files will be repartitioned evenly (i.e., a
single large file might be partitioned into smaller chunks) for parallel
scanning. If set to <code class="docutils literal notranslate"><s [...]
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.repartition_windows</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.repartition_windows</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion repartition data using the partitions keys to execute
window functions in parallel using the provided <code class="docutils literal
notranslate"><span class="pre">target_partitions</span></code> level</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.repartition_sorts</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.repartition_sorts</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion execute sorts in a per-partition fashion and merge
afterwards instead of coalescing first and sorting globally. With this flag is
enabled, plans in the form below <code class="docutils literal
notranslate"><span class="pre">text</span> <span
class="pre">"SortExec:</span> <span class="pre">[a@0</span> <span
class="pre">ASC]",</span> <span class="pre">"</span> <span
class="pre">CoalescePartitionsExec",</span> <span
class="pre">"</span> [...]
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_sort</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_sort</p></td>
<td><p>false</p></td>
<td><p>When true, DataFusion will opportunistically remove sorts when the data
is already sorted, (i.e. setting <code class="docutils literal
notranslate"><span class="pre">preserve_order</span></code> to true on <code
class="docutils literal notranslate"><span
class="pre">RepartitionExec</span></code> and using <code class="docutils
literal notranslate"><span class="pre">SortPreservingMergeExec</span></code>)
When false, DataFusion will maximize plan parallelism using <code class="docut
[...]
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.skip_failed_rules</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.skip_failed_rules</p></td>
<td><p>false</p></td>
<td><p>When set to true, the logical plan optimizer will produce warning
messages if any optimization rules produce errors and then proceed to the next
rule. When set to false, any rules that produce errors will cause the query to
fail</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.max_passes</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.max_passes</p></td>
<td><p>3</p></td>
<td><p>Number of times that the optimizer will attempt to optimize the
plan</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer will run a top down
process to reorder the join keys</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.prefer_hash_join</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.prefer_hash_join</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer will prefer HashJoin over
SortMergeJoin. HashJoin can work more efficiently than SortMergeJoin but
consumes more memory</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td>
<td><p>1048576</p></td>
<td><p>The maximum estimated size in bytes for one input side of a HashJoin
will be collected into a single partition</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td>
<td><p>131072</p></td>
<td><p>The maximum estimated size in rows for one input side of a HashJoin
will be collected into a single partition</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.default_filter_selectivity</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.default_filter_selectivity</p></td>
<td><p>20</p></td>
<td><p>The default filter selectivity used by Filter Statistics when an exact
selectivity cannot be determined. Valid values are between 0 (no selectivity)
and 100 (all rows are selected).</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_union</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_union</p></td>
<td><p>false</p></td>
<td><p>When set to true, the optimizer will not attempt to convert Union to
Interleave</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.expand_views_at_output</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.expand_views_at_output</p></td>
<td><p>false</p></td>
<td><p>When set to true, if the returned type is a view type then the output
will be coerced to a non-view. Coerces <code class="docutils literal
notranslate"><span class="pre">Utf8View</span></code> to <code class="docutils
literal notranslate"><span class="pre">LargeUtf8</span></code>, and <code
class="docutils literal notranslate"><span class="pre">BinaryView</span></code>
to <code class="docutils literal notranslate"><span
class="pre">LargeBinary</span></code>.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.explain.logical_plan_only</p></td>
+<tr class="row-even"><td><p>datafusion.explain.logical_plan_only</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will only print logical
plans</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.explain.physical_plan_only</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.physical_plan_only</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will only print physical
plans</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.explain.show_statistics</p></td>
+<tr class="row-even"><td><p>datafusion.explain.show_statistics</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will print operator statistics
for physical plans</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.explain.show_sizes</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.show_sizes</p></td>
<td><p>true</p></td>
<td><p>When set to true, the explain statement will print the partition
sizes</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.explain.show_schema</p></td>
+<tr class="row-even"><td><p>datafusion.explain.show_schema</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will print schema
information</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td>
+<tr
class="row-odd"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td>
<td><p>false</p></td>
<td><p>When set to true, SQL parser will parse float as decimal type</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td>
+<tr
class="row-even"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td>
<td><p>true</p></td>
<td><p>When set to true, SQL parser will normalize ident (convert ident to
lowercase when not quoted)</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.sql_parser.enable_options_value_normalization</p></td>
+<tr
class="row-odd"><td><p>datafusion.sql_parser.enable_options_value_normalization</p></td>
<td><p>true</p></td>
<td><p>When set to true, SQL parser will normalize options value (convert
value to lowercase)</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.sql_parser.dialect</p></td>
+<tr class="row-even"><td><p>datafusion.sql_parser.dialect</p></td>
<td><p>generic</p></td>
<td><p>Configure the SQL dialect used by DataFusion’s parser; supported values
include: Generic, MySQL, PostgreSQL, Hive, SQLite, Snowflake, Redshift, MsSQL,
ClickHouse, BigQuery, and Ansi.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.sql_parser.support_varchar_with_length</p></td>
+<tr
class="row-odd"><td><p>datafusion.sql_parser.support_varchar_with_length</p></td>
<td><p>true</p></td>
<td><p>If true, permit lengths for <code class="docutils literal
notranslate"><span class="pre">VARCHAR</span></code> such as <code
class="docutils literal notranslate"><span
class="pre">VARCHAR(20)</span></code>, but ignore the length. If false, error
if a <code class="docutils literal notranslate"><span
class="pre">VARCHAR</span></code> with a length is specified. The Arrow type
system does not have a notion of maximum string length and thus DataFusion can
not enforce such limits.</p></td>
</tr>