haohuaijin commented on issue #10140:
URL: https://github.com/apache/arrow-rs/issues/10140#issuecomment-4726551760

   the repo is in https://github.com/haohuaijin/rle-bitmap-compare
   
   40 files × 5 M rows = 200 M total, ZSTD parquet, narrow-projection top-K 
SQL: 
   ```sql
   SELECT
     service_name,
     COUNT(*)
   FROM
     t
   WHERE
     body_<pat> ILIKE '%SVC-C3500%'
   GROUP BY
     service_name ORDER BY cnt LIMIT 10
   ```
   One strategy per process under `/usr/bin/time -l`; `peak heap` is `peak 
memory footprint`.
   Three strategies:
   - `rle`      — `ParquetAccessPlan` from `Vec<RowSelector>` (today)
   - `bitmap`   — `ParquetAccessPlan` from a `BooleanBuffer` via the API this 
issue proposes
   - `no-index` — keep `body_<pat> ILIKE ...`, `pushdown_filters=false`
   
   ### `--pattern fragmented` — hit every 3rd row → `read 1 / skip 2`
   
   | strategy   | selectors   | exec mean | peak heap |
   |------------|------------:|----------:|----------:|
   | `rle`      | 133,333,854 |   1441 ms |   7.95 GB |
   | `bitmap`   | —           |    481 ms |    201 MB |
   | `no-index` | —           |   4782 ms |    269 MB |
   
   ### `--pattern clustered` — jittered 8 K-row runs, the case Parquet's happy 
with
   
   | strategy   | selectors | exec mean | peak heap |
   |------------|----------:|----------:|----------:|
   | `rle`      |    17,834 |    145 ms |    162 MB |
   | `bitmap`   | —         |    246 ms |    193 MB |
   | `no-index` | —         |   4530 ms |    260 MB |
   
   in the fragmented case bitmap is **3× faster** (481 vs 1441 ms) and uses 
**40× less memory** (201 MB vs 7.95 GB). It's RLE that pays the per-row cost — 
133 M selectors processed one by one; bitmap just hands the decoder one buffer 
per row group.
   
   For clustered runs the per-run overhead amortizes over 8 K rows, so RLE wins 
narrowly (1.7× faster, ~17% less memory). Bitmap stays within ~2× — fine as a 
default.
   
   `no-index` is the floor without an access plan — body still has to be 
decoded for the ILIKE, so it's 3–30× slower than the indexed paths. Without 
bitmap, fragmented data forces a choice between 7.95 GB of RLE or 5 s of 
no-index.
   
   500 M rows (`--files 100`) shows the RLE cliff: fragmented RLE goes 
superlinear (17.10 GB / 5.85 s, 4× exec for 2.5× data); bitmap stays linear 
(272 MB / 1.29 s). you can check 500M result in 
https://github.com/haohuaijin/rle-bitmap-compare
   
   cc @2010YOUY01 @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to