On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane <t...@sss.pgh.pa.us> wr > > > regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id > IN > :values_clause; > QUERY PLAN > ------------------------------------------------------------ > ----------------------------------------------------------------------- > Aggregate (cost=245006.16..245006.17 rows=1 width=8) (actual > time=3550.581..3550.581 rows=1 loops=1) > Execution time: 3550.700 ms > >
> > regression=# set enable_hashagg TO 0; > regression=# set enable_sort TO 0; > SET > regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id > IN > :values_clause; > QUERY PLAN > ------------------------------------------------------------ > ------------------------------------------------------------------- > Aggregate (cost=320003.90..320003.91 rows=1 width=8) (actual > time=3548.364..3548.364 rows=1 loops=1) > Execution time: 3548.463 ms > > > At least in this example, the actual runtimes are basically identical > regardless, so there is no great point in sweating over it. > Since The run times are equal, but one is estimated to be 30% more expensive, I think there is at least some little reason to sweat over it. Incidentally, I accidentally ran this against a server running with your patch from https://www.postgresql.org/message-id/10078.1471955...@sss.pgh.pa.us. On that server, it did choose the semi-join. But I have no idea why, as it seems like the effect of that patch would have been to change the distinct estimate from the magic hard-coded 200, to the natural 200 coming from the query itself. Why would that affect the cost? Cheers, Jeff