(arrow-datafusion) branch asf-site updated: Publish built docs triggered by 38d5f75de45ae3a7e1602456da4f86e127ed319f

github-bot Mon, 22 Jan 2024 10:48:01 -0800

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 2aea8803c0 Publish built docs triggered by 38d5f75de45ae3a7e1602456da4f86e127ed319f
2aea8803c0 is described below

commit 2aea8803c08e4334a7434a9c278cffc382ff0af9
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Jan 22 18:47:41 2024 +0000

    Publish built docs triggered by 38d5f75de45ae3a7e1602456da4f86e127ed319f
---
 _sources/user-guide/configs.md.txt | 2 +-
 user-guide/configs.html            | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 9d914aaaf1..8b039102d4 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -71,7 +71,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.parquet.bloom_filter_enabled                       | false                     | Sets if bloom filter is enabled for any column [...]
 | datafusion.execution.parquet.bloom_filter_fpp                           | NULL                      | Sets bloom filter false positive probability. If NULL, uses default parquet writer setting [...]
 | datafusion.execution.parquet.bloom_filter_ndv                           | NULL                      | Sets bloom filter number of distinct values. If NULL, uses default parquet writer setting [...]
-| datafusion.execution.parquet.allow_single_file_parallelism              | false                     | Controls whether DataFusion will attempt to speed up writing parquet files by serializing them in parallel. Each column in each row group in each output file are serialized in parallel leveraging a maximum possible core count of n_files*n_row_groups*n_columns. [...]
+| datafusion.execution.parquet.allow_single_file_parallelism              | true                      | Controls whether DataFusion will attempt to speed up writing parquet files by serializing them in parallel. Each column in each row group in each output file are serialized in parallel leveraging a maximum possible core count of n_files*n_row_groups*n_columns. [...]
 | datafusion.execution.parquet.maximum_parallel_row_group_writers         | 1                         | By default parallel parquet writer is tuned for minimum memory usage in a streaming execution plan. You may see a performance benefit when writing large parquet files by increasing maximum_parallel_row_group_writers and maximum_buffered_record_batches_per_stream if your system has idle cores and can tolerate additional memory usage. Boosting these values is likely worthwhile when writi [...]
 | datafusion.execution.parquet.maximum_buffered_record_batches_per_stream | 2                         | By default parallel parquet writer is tuned for minimum memory usage in a streaming execution plan. You may see a performance benefit when writing large parquet files by increasing maximum_parallel_row_group_writers and maximum_buffered_record_batches_per_stream if your system has idle cores and can tolerate additional memory usage. Boosting these values is likely worthwhile when writi [...]
 | datafusion.execution.aggregate.scalar_update_factor                     | 10                        | Specifies the threshold for using `ScalarValue`s to update accumulators during high-cardinality aggregations for each input batch. The aggregation is considered high-cardinality if the number of affected groups is greater than or equal to `batch_size / scalar_update_factor`. In such cases, `ScalarValue`s are utilized for updating accumulators, rather than the default batch-slice approa [...]
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 585867937e..9303f0d450 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -553,7 +553,7 @@ Environment variables are read during <code class="docutils literal notranslate"
 <td><p>Sets bloom filter number of distinct values. If NULL, uses default parquet writer setting</p></td>
 </tr>
 <tr class="row-even"><td><p>datafusion.execution.parquet.allow_single_file_parallelism</p></td>
-<td><p>false</p></td>
+<td><p>true</p></td>
 <td><p>Controls whether DataFusion will attempt to speed up writing parquet files by serializing them in parallel. Each column in each row group in each output file are serialized in parallel leveraging a maximum possible core count of n_files<em>n_row_groups</em>n_columns.</p></td>
 </tr>
 <tr class="row-odd"><td><p>datafusion.execution.parquet.maximum_parallel_row_group_writers</p></td>
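
Note on the option whose default changes in this diff: the description of
allow_single_file_parallelism gives the maximum possible core count as
n_files*n_row_groups*n_columns. The following is a minimal sketch of that
arithmetic only. The function name and the example numbers are hypothetical
and are not part of DataFusion; the actual parallelism also depends on the
available cores and on the other writer settings listed above.

```python
def max_parallel_serialization_tasks(n_files: int, n_row_groups: int, n_columns: int) -> int:
    """Upper bound quoted in the option description: n_files * n_row_groups * n_columns.

    Hypothetical helper for illustration; DataFusion does not expose this function.
    """
    for name, value in (("n_files", n_files),
                        ("n_row_groups", n_row_groups),
                        ("n_columns", n_columns)):
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
    # One serialization task per column, per row group, per output file.
    return n_files * n_row_groups * n_columns


# Example with made-up numbers: 2 output files, 4 row groups each, 10 columns.
print(max_parallel_serialization_tasks(2, 4, 10))  # prints 80
```

With the default now true, a write of that hypothetical shape could use at
most 80 cores for serialization under the formula above.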
