Re: [PR] Rename `bounded_order_preserving_variants` config to `prefer_exising_sort` and update docs [arrow-datafusion]

via GitHub Mon, 02 Oct 2023 09:44:52 -0700


viirya commented on code in PR #7723:
URL: https://github.com/apache/arrow-datafusion/pull/7723#discussion_r1342916873



##########
docs/source/user-guide/configs.md:
##########
@@ -87,7 +87,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.optimizer.repartition_file_scans                | true                      | When set to `true`, file groups will be repartitioned to achieve maximum parallelism. Currently Parquet and CSV formats are supported. If set to `true`, all files will be repartitioned evenly (i.e., a single large file might be partitioned into smaller chunks) for parallel scanning. If set to `false`, different files will be read in parallel, but repartitioning won't happen within a single file. |
 | datafusion.optimizer.repartition_windows                   | true                      | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level |
 | datafusion.optimizer.repartition_sorts                     | true                      | Should DataFusion execute sorts in a per-partition fashion and merge afterwards instead of coalescing first and sorting globally. With this flag is enabled, plans in the form below `text "SortExec: [a@0 ASC]", " CoalescePartitionsExec", " RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1", ` would turn into the plan below which performs better in multithreaded environments `text "SortPreservingMergeExec: [a@0 ASC]", " SortExec: [a@0 ASC]", " RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1", ` |
-| datafusion.optimizer.bounded_order_preserving_variants     | false                     | When true, DataFusion will opportunistically remove sorts by replacing `RepartitionExec` with `SortPreservingRepartitionExec`, and `CoalescePartitionsExec` with `SortPreservingMergeExec`, even when the query is bounded. |
+| datafusion.optimizer.prefer_existing_sort                  | false                     | When true, DataFusion will opportunistically remove sorts when the data is already sorted, replacing `RepartitionExec` with `SortPreservingRepartitionExec`, and `CoalescePartitionsExec` with `SortPreservingMergeExec`, When false, DataFusion will prefer to maximize the parallelism using `Repartition/Coalesce` and resort the data subsequently with `SortExec` |

Review Comment:
   ```suggestion
   | datafusion.optimizer.prefer_existing_sort                  | false                     | When true, DataFusion will opportunistically remove sorts when the data is already sorted, replacing `RepartitionExec` with `SortPreservingRepartitionExec` (i.e., `RepartitionExec` with `preserve_order` as true), and `CoalescePartitionsExec` with `SortPreservingMergeExec`. When false, DataFusion will prefer to maximize the parallelism using `Repartition/Coalesce` and resort the data subsequently with `SortExec` |
   ```
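
To make the trade-off in the description concrete, here is a toy model of the choice the flag controls. It is only a sketch: `choose_operators` and its parameters are hypothetical names made up for illustration, not DataFusion's optimizer code or API, and it assumes `repartition_sorts` is enabled so that sorting happens per partition.

```rust
// Toy model of the choice described above. The function and its parameters
// are hypothetical; this is not DataFusion's optimizer code or API.

/// Returns operator names from the plan root down towards the input.
fn choose_operators(input_sorted: bool, prefer_existing_sort: bool) -> Vec<&'static str> {
    if input_sorted && prefer_existing_sort {
        // Keep the existing order through the repartition, so no SortExec is needed.
        vec!["SortPreservingMergeExec", "SortPreservingRepartitionExec"]
    } else {
        // Repartition without regard to order for maximum parallelism,
        // then sort each partition again before merging.
        vec!["SortPreservingMergeExec", "SortExec", "RepartitionExec"]
    }
}

fn main() {
    // Compare both settings of the flag for an input that is already sorted.
    for flag in [false, true] {
        println!("prefer_existing_sort={}: {:?}", flag, choose_operators(true, flag));
    }
}
```

In this sketch the flag only matters when the input is already sorted; otherwise a `SortExec` is required either way.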



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
