zhuqi-lucas commented on PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#issuecomment-4190442300

   > > 8MB was too small for wide-row full scans (Q3: SELECT * with 16 columns),
   > > causing SPM to stall on I/O. 64MB per partition is still strictly less
   > > than the SortExec it replaces (which buffers entire partition in memory).
   > > BufferExec integrates with MemoryPool so it won't cause OOM.
   > 
   > I think this is right. Basically: `SortExec` is "unlimited" buffering. IMO
   > we could go even higher if we have to pick a number (although perhaps it
   > should be configurable if it isn't already) - something like 512MB. If the
   > partition is smaller, the limit will never be hit. If it is larger, or we
   > run out of memory, it will spill.
   > 
   > But let's see what the numbers look like with 64MB.
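
To make the trade-off in the quoted discussion concrete, here is a toy model of a per-partition byte cap. It is only a sketch: `split_at_cap` is a hypothetical helper, and it does not reflect how `BufferExec` or `MemoryPool` are actually implemented.

```rust
/// Toy model of a per-partition buffer cap (hypothetical helper, not a DataFusion API).
/// Returns (bytes that fit under the cap, bytes that exceed it).
fn split_at_cap(partition_bytes: u64, cap_bytes: u64) -> (u64, u64) {
    // The cap is only an upper bound: a partition smaller than the cap
    // never reaches it, so raising the cap costs nothing in that case.
    let buffered = partition_bytes.min(cap_bytes);
    (buffered, partition_bytes - buffered)
}
```

For a hypothetical 40MB partition, `split_at_cap(40, 8)` returns `(8, 32)`, while `split_at_cap(40, 64)` and `split_at_cap(40, 512)` both return `(40, 0)`: once the cap exceeds the partition size, raising it further changes nothing.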
   
   ```rust
   Comparing HEAD and feat_sort-file-groups-by-statistics
   --------------------
   Benchmark sort_pushdown_sorted.json
   --------------------
   
   ┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query ┃                              HEAD ┃ feat_sort-file-groups-by-statistics ┃        Change ┃
   ┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ Q1    │ 158.16 / 158.64 ±0.38 / 159.16 ms │   121.48 / 123.31 ±1.20 / 124.65 ms │ +1.29x faster │
   │ Q2    │    12.38 / 12.61 ±0.19 / 12.90 ms │         2.49 / 2.70 ±0.27 / 3.23 ms │ +4.66x faster │
   │ Q3    │ 365.09 / 367.46 ±1.99 / 371.08 ms │   325.28 / 333.90 ±5.79 / 342.48 ms │ +1.10x faster │
   │ Q4    │    53.62 / 54.45 ±1.07 / 56.52 ms │         5.54 / 5.97 ±0.58 / 7.11 ms │ +9.12x faster │
   └───────┴───────────────────────────────────┴─────────────────────────────────────┴───────────────┘
   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
   ┃ Benchmark Summary                                  ┃          ┃
   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
   │ Total Time (HEAD)                                  │ 593.17ms │
   │ Total Time (feat_sort-file-groups-by-statistics)   │ 465.89ms │
   │ Average Time (HEAD)                                │ 148.29ms │
   │ Average Time (feat_sort-file-groups-by-statistics) │ 116.47ms │
   │ Queries Faster                                     │        4 │
   │ Queries Slower                                     │        0 │
   │ Queries with No Change                             │        0 │
   │ Queries with Failure                               │        0 │
   └────────────────────────────────────────────────────┴──────────┘
   ```
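
As a quick sanity check on the table above, the sketch below recomputes the speedups from the mean times. It assumes the "Change" column is the HEAD mean divided by the branch mean; the helper names are hypothetical and this is not the benchmark comparison script itself.

```rust
/// Mean times in ms copied from the table above: (query, HEAD, branch).
const MEANS_MS: [(&str, f64, f64); 4] = [
    ("Q1", 158.64, 123.31),
    ("Q2", 12.61, 2.70),
    ("Q3", 367.46, 333.90),
    ("Q4", 54.45, 5.97),
];

/// Speedup factor, assuming "Change" = baseline mean / branch mean.
fn speedup(baseline_mean_ms: f64, branch_mean_ms: f64) -> f64 {
    baseline_mean_ms / branch_mean_ms
}

/// Overall speedup across all four queries, from the summed means.
fn total_speedup() -> f64 {
    let head: f64 = MEANS_MS.iter().map(|row| row.1).sum();
    let branch: f64 = MEANS_MS.iter().map(|row| row.2).sum();
    speedup(head, branch)
}
```

Under that assumption the per-query ratios agree with the reported figures up to rounding of the displayed means (Q2 comes out near 4.67 rather than 4.66), and the overall ratio of the summed means is about 1.27x.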
   
   The results are amazing, @adriangb!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

