Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

Jeff Janes Mon, 31 Jul 2017 10:53:59 -0700

On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane <t...@sss.pgh.pa.us> wr
>
>
> regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id
> IN
> :values_clause;
>                                                             QUERY PLAN
> ------------------------------------------------------------
> -----------------------------------------------------------------------
>  Aggregate  (cost=245006.16..245006.17 rows=1 width=8) (actual
> time=3550.581..3550.581 rows=1 loops=1)
>  Execution time: 3550.700 ms
>
>



>
> regression=# set enable_hashagg TO 0;
> regression=# set enable_sort TO 0;
> SET
> regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id
> IN
> :values_clause;
>                                                           QUERY PLAN
> ------------------------------------------------------------
> -------------------------------------------------------------------
>  Aggregate  (cost=320003.90..320003.91 rows=1 width=8) (actual
> time=3548.364..3548.364 rows=1 loops=1)
>  Execution time: 3548.463 ms
>
>


> At least in this example, the actual runtimes are basically identical
> regardless, so there is no great point in sweating over it.
>


Since The run times are equal, but one is estimated to be 30% more
expensive, I think there is at least some little reason to sweat over it.

Incidentally, I accidentally ran this against a server running with your
patch from
https://www.postgresql.org/message-id/10078.1471955...@sss.pgh.pa.us.  On
that server, it did choose the semi-join.  But I have no idea why, as it
seems like the effect of that patch would have been to change the distinct
estimate from the magic hard-coded 200, to the natural 200 coming from the
query itself.  Why would that affect the cost?

Cheers,

Jeff

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

Reply via email to