Re: [GENERAL] Performance with high correlation in group by on PK

Jeff Janes Mon, 28 Aug 2017 12:33:02 -0700

On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys <haram...@gmail.com> wrote:


> Hi all,
>
> It's been a while since I actually got to use PG for anything serious,
> but we're finally doing some experimentation @work now to see if it is
> suitable for our datawarehouse. So far it's been doing well, but there
> is a particular type of query I run into that I expect we will
> frequently use and that's choosing a sequential scan - and I can't
> fathom why.
>
> This is on:
>
>
> The query in question is:
> select "VBAK_MANDT", max("VBAK_VBELN")
>   from staging.etl00001_vbak
>  group by "VBAK_MANDT";
>
> This is the header-table for another detail table, and in this case
> we're already seeing a seqscan. The thing is, there are 15M rows in
> the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
> "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
> we only have 1 at the moment!).
>

You need an "index skip-scan" or "loose index scan".  PostgreSQL doesn't
currently detect and implement them automatically, but you can use a
recursive CTE to get it to work.  There are some examples at
https://wiki.postgresql.org/wiki/Loose_indexscan

Cheers,

Jeff

Re: [GENERAL] Performance with high correlation in group by on PK

Reply via email to