On 3/18/24 16:55, Tomas Vondra wrote:
>
> ...
> 
> OK, I've restarted the tests for only 0012 and 0014 patches, and I'll
> wait for these to complete - I don't want to be looking for patterns
> until we have enough data to smooth this out.
> 
>

I now have results for 1M and 10M runs on the two builds (0012 and
0014), attached is a chart for relative performance plotting

  (0014 timing) / (0012 timing)

for "optimal' runs that would pick bitmapscan on their own. There's
nothing special about the config - I reduced the random_page_cost to
1.5-2.0 to reflect both machines have flash storage, etc.

Overall, the chart is pretty consistent with what I shared on Sunday.
Most of the results are fine (0014 is close to 0012 or faster), but
there's a bunch of cases that are much slower. Interestingly enough,
almost all of them are on the i5 machine, almost none of the xeon. My
guess is this is about the SSD type (SATA vs. NVMe).

Attached if table of ~50 worst regressions (by the metric above), and
it's interesting the worst regressions are with eic=0 and eic=1.

I decided to look at the first case (eic=0), and the timings are quite
stable - there are three runs for each build, with timings close to the
average (see below the table).

Attached is a script that reproduces this on both machines, but the
difference is much more significant on i5 (~5x) compared to xeon (~2x).

I haven't investigated what exactly is happening and why, hopefully the
script will allow you to reproduce this independently. I plan to take a
look, but I don't know when I'll have time for this.

FWIW if the script does not reproduce this on your machines, I might be
able to give you access to the i5 machine. Let me know.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
create table test_table (a bigint, b bigint, c text) with (fillfactor = 25);
insert into test_table select a, b, c from (select a, b, c, 
generate_series(1,24) from (select a, b, c from (select (416 * random())::int 
a, i b, md5(random()::text) c from generate_series(1, 1000000/24) s(i)) foo) 
bar) baz;
create index on test_table (a);
vacuum analyze;
checkpoint;


-- 0012
set effective_io_concurrency = 0;
set work_mem = '4MB';
explain analyze select * from test_table where (a >= 53) AND (a <= 264);

                                                                QUERY PLAN      
                                                           
-------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_table  (cost=6030.91..55240.07 rows=502877 width=49) 
(actual time=21.821..406.664 rows=508440 loops=1)
   Recheck Cond: ((a >= 53) AND (a <= 264))
   Heap Blocks: exact=21185
   ->  Bitmap Index Scan on test_table_a_idx  (cost=0.00..5905.20 rows=502877 
width=0) (actual time=18.599..18.599 rows=508440 loops=1)
         Index Cond: ((a >= 53) AND (a <= 264))
 Planning Time: 5.235 ms
 Execution Time: 421.185 ms
(7 rows)


-- 0014
set effective_io_concurrency = 0;
set work_mem = '4MB';
explain analyze select * from test_table where (a >= 53) AND (a <= 264);

                                                                QUERY PLAN      
                                                           
-------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_table  (cost=6030.91..55240.07 rows=502877 width=49) 
(actual time=21.894..2697.074 rows=508440 loops=1)
   Recheck Cond: ((a >= 53) AND (a <= 264))
   Heap Blocks: exact=21185
   ->  Bitmap Index Scan on test_table_a_idx  (cost=0.00..5905.20 rows=502877 
width=0) (actual time=18.714..18.715 rows=508440 loops=1)
         Index Cond: ((a >= 53) AND (a <= 264))
 Planning Time: 4.943 ms
 Execution Time: 2730.409 ms
(7 rows)
with data as (
  select machine, build, rows, dataset, workers, wm, eic, matches, caching, 
count(*), round(avg(timing),2) as timing
    from results where optimal = 'bitmapscan'
   group by 1, 2, 3, 4, 5, 6, 7, 8, 9
)
select d1.*, d2.timing as timing_0014, round(d2.timing / d1.timing,2) AS change
from data d1
join data d2 on ((d1.machine, d1.rows, d1.dataset, d1.workers, d1.wm, d1.eic, 
d1.matches, d1.caching) = (d2.machine, d2.rows, d2.dataset, d2.workers, d2.wm, 
d2.eic, d2.matches, d2.caching))
where d1.build = 'patched-0012'
  and d2.build = 'patched-0014'
order by d2.timing / d1.timing desc;

 machine |    build     |   rows   |    dataset    | workers |  wm   | eic | 
matches |  caching  | count |  timing  | timing_0014 | change 
---------+--------------+----------+---------------+---------+-------+-----+---------+-----------+-------+----------+-------------+--------
 i5      | patched-0012 |  1000000 | uniform_pages |       0 |  4096 |   0 |    
 211 | uncached  |     3 |   449.78 |     2650.20 |   5.89
 i5      | patched-0012 |  1000000 | uniform_pages |       0 | 65536 |   0 |    
 211 | uncached  |     3 |   440.61 |     2575.66 |   5.85
 i5      | patched-0012 | 10000000 | uniform       |       0 |   128 |   0 |    
4096 | uncached  |     3 |  3962.57 |    21042.76 |   5.31
 i5      | patched-0012 | 10000000 | uniform       |       0 |  4096 |   0 |    
4096 | uncached  |     3 |  3896.75 |    20501.01 |   5.26
 i5      | patched-0012 |  1000000 | uniform       |       0 | 65536 |   0 |    
 256 | uncached  |     3 |   451.27 |     2302.76 |   5.10
 i5      | patched-0012 |  1000000 | uniform       |       0 |  4096 |   0 |    
 256 | uncached  |     3 |   472.59 |     2364.94 |   5.00
 i5      | patched-0012 | 10000000 | uniform_pages |       0 | 65536 |   0 |    
1857 | uncached  |     3 |  4799.53 |    22771.77 |   4.74
 i5      | patched-0012 | 10000000 | uniform       |       0 | 65536 |   0 |    
4096 | uncached  |     3 |  3773.63 |    17691.96 |   4.69
 i5      | patched-0012 | 10000000 | uniform       |       4 | 65536 |   1 |   
16384 | uncached  |     3 |  3785.72 |    17514.72 |   4.63
 i5      | patched-0012 | 10000000 | uniform       |       0 | 65536 |   0 |    
2048 | uncached  |     3 |  4845.45 |    20687.09 |   4.27
 i5      | patched-0012 |  1000000 | uniform       |       4 |  4096 |   0 |    
 256 | uncached  |     3 |   445.31 |     1807.80 |   4.06
 i5      | patched-0012 | 10000000 | uniform       |       4 | 65536 |   1 |    
8192 | uncached  |     3 |  4101.52 |    16243.37 |   3.96
 i5      | patched-0012 |  1000000 | uniform       |       4 | 65536 |   1 |    
1024 | uncached  |     3 |   428.64 |     1669.27 |   3.89
 i5      | patched-0012 |  1000000 | uniform       |       4 |  4096 |   1 |    
2048 | uncached  |     3 |   390.48 |     1509.55 |   3.87
 i5      | patched-0012 |  1000000 | uniform       |       4 | 65536 |   1 |    
2048 | uncached  |     3 |   429.78 |     1615.26 |   3.76
 i5      | patched-0012 |  1000000 | uniform       |       4 | 65536 |   0 |    
 256 | uncached  |     3 |   476.00 |     1710.61 |   3.59
 i5      | patched-0012 |  1000000 | uniform       |       4 |  4096 |   1 |    
1024 | uncached  |     3 |   441.87 |     1559.04 |   3.53
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 | 65536 |  32 |    
   4 | uncached  |     3 |    58.85 |      183.76 |   3.12
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 |   128 |  32 |    
   4 | uncached  |     3 |    59.39 |      184.68 |   3.11
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 |  4096 |  32 |    
   4 | uncached  |     3 |    59.20 |      183.51 |   3.10
 i5      | patched-0012 |  1000000 | cyclic        |       0 | 65536 |  32 |    
   8 | uncached  |     3 |   105.09 |      313.22 |   2.98
 i5      | patched-0012 |  1000000 | cyclic        |       0 |  4096 |  32 |    
   8 | uncached  |     3 |   105.81 |      314.44 |   2.97
 i5      | patched-0012 |  1000000 | uniform_pages |       0 |  4096 |   0 |    
 128 | uncached  |     3 |   622.12 |     1818.29 |   2.92
 xeon    | patched-0012 |  1000000 | uniform_pages |       4 | 65536 |   1 |    
  16 | uncached  |     3 |    46.87 |      132.90 |   2.84
 i5      | patched-0012 |  1000000 | uniform_pages |       0 | 65536 |   0 |    
 128 | uncached  |     3 |   678.35 |     1884.47 |   2.78
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 |  4096 |  32 |    
   8 | uncached  |     3 |   111.51 |      308.55 |   2.77
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 | 65536 |  32 |    
   8 | uncached  |     3 |   111.19 |      306.88 |   2.76
 i5      | patched-0012 |  1000000 | cyclic        |       0 | 65536 |  16 |    
   8 | uncached  |     3 |   120.98 |      323.61 |   2.67
 i5      | patched-0012 |  1000000 | cyclic        |       0 |  4096 |  32 |    
   4 | uncached  |     3 |    50.19 |      130.33 |   2.60
 i5      | patched-0012 |  1000000 | cyclic        |       0 |  4096 |  16 |    
   8 | uncached  |     3 |   124.51 |      323.07 |   2.59
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 |   128 |  16 |    
   4 | uncached  |     3 |    75.20 |      190.16 |   2.53
 i5      | patched-0012 |  1000000 | cyclic        |       0 |   128 |  32 |    
   4 | uncached  |     3 |    51.18 |      127.96 |   2.50
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 |  4096 |  16 |    
   4 | uncached  |     3 |    74.08 |      184.83 |   2.50
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 | 65536 |  16 |    
   8 | uncached  |     3 |   128.53 |      317.58 |   2.47
 i5      | patched-0012 |  1000000 | cyclic        |       0 | 65536 |  32 |    
   4 | uncached  |     3 |    51.86 |      128.12 |   2.47
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 |  4096 |  16 |    
   8 | uncached  |     3 |   127.98 |      313.20 |   2.45
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 | 65536 |  16 |    
   4 | uncached  |     3 |    77.42 |      189.34 |   2.45
 i5      | patched-0012 |  1000000 | uniform       |       4 | 65536 |   0 |    
 128 | uncached  |     3 |   672.17 |     1601.18 |   2.38
 i5      | patched-0012 |  1000000 | uniform       |       4 |  4096 |   0 |    
 128 | uncached  |     3 |   690.68 |     1587.87 |   2.30
 i5      | patched-0012 |  1000000 | cyclic        |       0 | 65536 |   8 |    
   8 | uncached  |     3 |   146.23 |      317.30 |   2.17
 i5      | patched-0012 |  1000000 | cyclic        |       0 |  4096 |  16 |    
   4 | uncached  |     3 |    63.46 |      137.28 |   2.16
 i5      | patched-0012 |  1000000 | uniform       |       0 | 65536 |   0 |    
 128 | uncached  |     3 |   724.55 |     1562.10 |   2.16
 i5      | patched-0012 |  1000000 | cyclic        |       0 |  4096 |   8 |    
   8 | uncached  |     3 |   150.09 |      320.40 |   2.13
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 |   128 |   8 |    
   4 | uncached  |     3 |    96.47 |      205.49 |   2.13
 i5      | patched-0012 |  1000000 | cyclic_fuzz   |       0 | 65536 |   8 |    
   4 | uncached  |     3 |    96.30 |      204.45 |   2.12
 i5      | patched-0012 |  1000000 | uniform       |       0 |  4096 |   0 |    
 128 | uncached  |     3 |   739.61 |     1564.32 |   2.12
 xeon    | patched-0012 | 10000000 | uniform       |       0 | 65536 |   0 |    
4096 | uncached  |     3 |  6377.14 |    13482.09 |   2.11


test=# select * from results where machine = 'i5' and build = 'patched-0012' 
and rows = 1000000 and dataset = 'uniform_pages' and workers = 0 and wm = 4096 
and caching = 'uncached' and matches = 211 and eic = 0;
 machine |    build     |  rows   |    dataset    | relpages | workers |  wm  | 
eic | matches | ndistinct | run | caching  |  optimal   | timing  
---------+--------------+---------+---------------+----------+---------+------+-----+---------+-----------+-----+----------+------------+---------
 i5      | patched-0012 | 1000000 | uniform_pages |    41666 |       0 | 4096 | 
  0 |     211 |       416 |   1 | uncached | bitmapscan | 407.605
 i5      | patched-0012 | 1000000 | uniform_pages |    41666 |       0 | 4096 | 
  0 |     211 |       416 |   2 | uncached | bitmapscan | 494.448
 i5      | patched-0012 | 1000000 | uniform_pages |    41666 |       0 | 4096 | 
  0 |     211 |       416 |   3 | uncached | bitmapscan | 447.291
(3 rows)

test=# select * from results where machine = 'i5' and build = 'patched-0014' 
and rows = 1000000 and dataset = 'uniform_pages' and workers = 0 and wm = 4096 
and caching = 'uncached' and matches = 211 and eic = 0;
 machine |    build     |  rows   |    dataset    | relpages | workers |  wm  | 
eic | matches | ndistinct | run | caching  |  optimal   |  timing  
---------+--------------+---------+---------------+----------+---------+------+-----+---------+-----------+-----+----------+------------+----------
 i5      | patched-0014 | 1000000 | uniform_pages |    41666 |       0 | 4096 | 
  0 |     211 |       416 |   1 | uncached | bitmapscan | 2693.009
 i5      | patched-0014 | 1000000 | uniform_pages |    41666 |       0 | 4096 | 
  0 |     211 |       416 |   2 | uncached | bitmapscan | 2653.878
 i5      | patched-0014 | 1000000 | uniform_pages |    41666 |       0 | 4096 | 
  0 |     211 |       416 |   3 | uncached | bitmapscan | 2603.713
(3 rows)

Reply via email to