This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 2aea8803c0 Publish built docs triggered by 38d5f75de45ae3a7e1602456da4f86e127ed319f
2aea8803c0 is described below
commit 2aea8803c08e4334a7434a9c278cffc382ff0af9
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Jan 22 18:47:41 2024 +0000
Publish built docs triggered by 38d5f75de45ae3a7e1602456da4f86e127ed319f
---
_sources/user-guide/configs.md.txt | 2 +-
user-guide/configs.html | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 9d914aaaf1..8b039102d4 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -71,7 +71,7 @@ Environment variables are read during `SessionConfig`
initialisation so they mus
| datafusion.execution.parquet.bloom_filter_enabled |
false | Sets if bloom filter is enabled for any column
[...]
| datafusion.execution.parquet.bloom_filter_fpp |
NULL | Sets bloom filter false positive probability. If
NULL, uses default parquet writer setting
[...]
| datafusion.execution.parquet.bloom_filter_ndv |
NULL | Sets bloom filter number of distinct values. If
NULL, uses default parquet writer setting
[...]
-| datafusion.execution.parquet.allow_single_file_parallelism |
false | Controls whether DataFusion will attempt to speed
up writing parquet files by serializing them in parallel. Each column in each
row group in each output file are serialized in parallel leveraging a maximum
possible core count of n_files*n_row_groups*n_columns.
[...]
+| datafusion.execution.parquet.allow_single_file_parallelism |
true | Controls whether DataFusion will attempt to speed
up writing parquet files by serializing them in parallel. Each column in each
row group in each output file are serialized in parallel leveraging a maximum
possible core count of n_files*n_row_groups*n_columns.
[...]
| datafusion.execution.parquet.maximum_parallel_row_group_writers | 1
| By default parallel parquet writer is tuned for
minimum memory usage in a streaming execution plan. You may see a performance
benefit when writing large parquet files by increasing
maximum_parallel_row_group_writers and
maximum_buffered_record_batches_per_stream if your system has idle cores and
can tolerate additional memory usage. Boosting these values is likely
worthwhile when writi [...]
| datafusion.execution.parquet.maximum_buffered_record_batches_per_stream | 2
| By default parallel parquet writer is tuned for
minimum memory usage in a streaming execution plan. You may see a performance
benefit when writing large parquet files by increasing
maximum_parallel_row_group_writers and
maximum_buffered_record_batches_per_stream if your system has idle cores and
can tolerate additional memory usage. Boosting these values is likely
worthwhile when writi [...]
| datafusion.execution.aggregate.scalar_update_factor | 10
| Specifies the threshold for using `ScalarValue`s to
update accumulators during high-cardinality aggregations for each input batch.
The aggregation is considered high-cardinality if the number of affected groups
is greater than or equal to `batch_size / scalar_update_factor`. In such cases,
`ScalarValue`s are utilized for updating accumulators, rather than the default
batch-slice approa [...]
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 585867937e..9303f0d450 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -553,7 +553,7 @@ Environment variables are read during <code class="docutils
literal notranslate"
<td><p>Sets bloom filter number of distinct values. If NULL, uses default
parquet writer setting</p></td>
</tr>
<tr
class="row-even"><td><p>datafusion.execution.parquet.allow_single_file_parallelism</p></td>
-<td><p>false</p></td>
+<td><p>true</p></td>
<td><p>Controls whether DataFusion will attempt to speed up writing parquet
files by serializing them in parallel. Each column in each row group in each
output file are serialized in parallel leveraging a maximum possible core count
of n_files<em>n_row_groups</em>n_columns.</p></td>
</tr>
<tr
class="row-odd"><td><p>datafusion.execution.parquet.maximum_parallel_row_group_writers</p></td>
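
The description of `datafusion.execution.parquet.allow_single_file_parallelism` above quotes an upper bound of `n_files*n_row_groups*n_columns` on the number of cores that could be used. A minimal Python sketch of that arithmetic follows. The function name and the sample numbers are hypothetical and not part of DataFusion, and `n_row_groups` is assumed here to mean row groups per output file.

```python
def max_parallel_serialization_tasks(n_files: int, n_row_groups: int, n_columns: int) -> int:
    """Return the n_files * n_row_groups * n_columns bound from the config description.

    Hypothetical helper for illustration only; it does not call DataFusion.
    """
    if min(n_files, n_row_groups, n_columns) < 0:
        raise ValueError("counts must be non-negative")
    return n_files * n_row_groups * n_columns


# Assumed example job: 2 output files, 4 row groups per file, 10 columns.
bound = max_parallel_serialization_tasks(n_files=2, n_row_groups=4, n_columns=10)
print(bound)  # 80
```

This is only the theoretical ceiling. The parallelism actually achieved is presumably also limited by the number of available cores and by settings such as `maximum_parallel_row_group_writers`, which appears in the same table.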