mustafasrepo commented on issue #8572:
URL: 
https://github.com/apache/arrow-datafusion/issues/8572#issuecomment-1860316023

   At the time of the original PR, we did a benchmark for comparison with 
following results.
   
   <h1>PLAN V1</h1>
   <pre><code class="language-rust">&quot;SortPreservingMergeExec: [a@0 ASC 
NULLS LAST]&quot;,
   &quot;  SortPreservingRepartitionExec: partitioning=Hash([b@0], 16), 
input_partitions=16&quot;,
   &quot;    RepartitionExec: partitioning=Hash([a@0], 16), 
input_partitions=1&quot;,
   &quot;      MemoryExec: partitions=1, 
partition_sizes=[&lt;depends_on_batch_size&gt;], output_ordering: 
[PhysicalSortExpr { expr: Column { name: \\&quot;a\\&quot;, index: 0 }, 
options: SortOptions { descending: false, nulls_first: false } }]&quot;,
   </code></pre>
   <h1>PLAN V2</h1>
   <pre><code class="language-rust">&quot;SortPreservingMergeExec: [a@0 ASC 
NULLS LAST]&quot;,
   &quot;  SortExec: expr=[a@0 ASC NULLS LAST]&quot;,
   &quot;    RepartitionExec: partitioning=Hash([b@0], 16), 
input_partitions=16&quot;,
   &quot;      RepartitionExec: partitioning=Hash([a@0], 16), 
input_partitions=1&quot;,
   &quot;        MemoryExec: partitions=1, 
partition_sizes=[(&lt;depends_on_batch_size&gt;)], output_ordering: 
[PhysicalSortExpr { expr: Column { name: \\&quot;a\\&quot;, index: 0 }, 
options: SortOptions { descending: false, nulls_first: false } }]&quot;,
   </code></pre>
   
   n_elem | n_trial | batch_size | elapsed_mean(v1) | elapsed_median(v1) | 
elapsed_mean(v2) | elapsed_median(v2)
   -- | -- | -- | -- | -- | -- | --
   100000 | 10 | 25 | 581.791054ms | 581.777708ms | 1.112292862s | 1.117181625s
   100000 | 10 | 50 | 373.18927ms | 373.231708ms | 385.533966ms | 385.685291ms
   100000 | 10 | 100 | 240.91572ms | 240.570833ms | 239.308708ms | 239.303709ms
   100000 | 10 | 1000 | 115.471541ms | 113.817709ms | 100.713324ms | 
100.647333ms
   
   
   <!-- notionvc: 5eca7a10-df76-40ba-81f3-a0e27c688240 -->
   
   After these results, we decided to use existing default behavior (where V2 
is preferred over V1) to not hurt performance for others. Frankly, changing 
default would be better for our use cases. As @alamb suggests if there is 
concencus we can change the default. I wonder @ozankabak's opinion regarding 
this change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to