Re: Parallel CREATE INDEX for GIN indexes

2024-05-13 Thread Tomas Vondra
On 5/13/24 10:19, Andy Fan wrote: > > Tomas Vondra writes: > >> ... >> >> I don't understand the question. The blocks are distributed to workers >> by the parallel table scan, and it certainly does not do that block by >> block. But even if it did, that's not a problem for this code. > > OK, I

Re: Parallel CREATE INDEX for GIN indexes

2024-05-13 Thread Andy Fan
Tomas Vondra writes: >>> 7) v20240502-0007-Detect-wrap-around-in-parallel-callback.patch >>> >>> There's one more efficiency problem - the parallel scans are required to >>> be synchronized, i.e. the scan may start half-way through the table, and >>> then wrap around. Which however means the
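
As a rough illustration of the wrap-around described in the snippet above: a synchronized scan may begin at a block in the middle of the table, run to the end, and then continue from block 0. The sketch below is not the patch's code; `synchronized_scan_order` and `find_wraparound` are hypothetical names, and block numbers are modelled as plain integers.

```python
from typing import Iterator, List


def synchronized_scan_order(nblocks: int, start_block: int) -> Iterator[int]:
    """Yield block numbers for a scan that starts at start_block and wraps.

    Hypothetical model: the scan visits every block exactly once, beginning
    part-way through the table and continuing from block 0 after the end.
    """
    for i in range(nblocks):
        yield (start_block + i) % nblocks


def find_wraparound(blocks: List[int]) -> int:
    """Return the index of the first block seen after the wrap, or -1.

    The wrap is detected as the point where the block number decreases.
    """
    for i in range(1, len(blocks)):
        if blocks[i] < blocks[i - 1]:
            return i
    return -1


order = list(synchronized_scan_order(nblocks=8, start_block=5))
wrap_at = find_wraparound(order)
print(order, wrap_at)
```

A consumer that expects block numbers in ascending order sees two ascending runs here, split at `wrap_at`, which is why detecting the wrap point matters.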

Re: Parallel CREATE INDEX for GIN indexes

2024-05-10 Thread Tomas Vondra
On 5/10/24 07:53, Andy Fan wrote: > > Tomas Vondra writes: > >>> I guess both of you are talking about worker process, if here are >>> something in my mind: >>> >>> *btbuild* also lets the WORKER dump the tuples into Sharedsort struct >>> and lets the LEADER merge them directly. I think the aim

Re: Parallel CREATE INDEX for GIN indexes

2024-05-10 Thread Andy Fan
Tomas Vondra writes: >> I guess both of you are talking about worker process, if here are >> something in my mind: >> >> *btbuild* also lets the WORKER dump the tuples into Sharedsort struct >> and lets the LEADER merge them directly. I think the aim of this design >> is that it has the potential to

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/9/24 17:51, Matthias van de Meent wrote: > On Thu, 9 May 2024 at 15:13, Tomas Vondra > wrote: >> Let me explain the relevant part of the patch, and how I understand the >> improvement suggested by Matthias. The patch does the work in three phases: >> >> >> 1) Worker gets data from table,

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Matthias van de Meent
On Thu, 9 May 2024 at 15:13, Tomas Vondra wrote: > Let me explain the relevant part of the patch, and how I understand the > improvement suggested by Matthias. The patch does the work in three phases: > > > 1) Worker gets data from table, splits that into index items and adds > those into a

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/2/24 20:22, Tomas Vondra wrote: >> >>> For some of the opclasses it can regress (like the jsonb_path_ops). I >>> don't think that's a major issue. Or more precisely, I'm not surprised >>> by it. It'd be nice to be able to disable the parallel builds in these >>> cases somehow, but I haven't

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/9/24 11:44, Andy Fan wrote: > > Hello Tomas, > 2) v20240502-0002-Use-mergesort-in-the-leader-process.patch The approach implemented by 0001 works, but there's a little bit of issue - if there are many distinct keys (e.g. for trigrams that can happen very easily),

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/9/24 12:14, Andy Fan wrote: > > Tomas Vondra writes: > >> 3) v20240502-0003-Remove-the-explicit-pg_qsort-in-workers.patch >> >> In 0002 the workers still do an explicit qsort() on the TID list before >> writing the data into the shared tuplesort. But we can do better - the >> workers can

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Andy Fan
Tomas Vondra writes: > 3) v20240502-0003-Remove-the-explicit-pg_qsort-in-workers.patch > > In 0002 the workers still do an explicit qsort() on the TID list before > writing the data into the shared tuplesort. But we can do better - the > workers can do a merge sort too. To help with this, we
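
To illustrate the idea in the snippet above (merging already-sorted TID lists instead of running an explicit sort over the combined list), here is a minimal sketch. It is not the patch's implementation: a TID is modelled as a `(block, offset)` tuple, `merge_tid_lists` is a hypothetical helper, and Python's `heapq.merge` stands in for the merge step.

```python
import heapq
from typing import Iterable, List, Tuple

# Assumption for this sketch: a TID is a (block number, offset) pair.
TID = Tuple[int, int]


def merge_tid_lists(sorted_lists: Iterable[List[TID]]) -> List[TID]:
    """Merge TID lists that are each already sorted into one sorted list.

    Because every input is sorted, a k-way merge is enough and no full
    re-sort of the combined data is needed.
    """
    return list(heapq.merge(*sorted_lists))


chunk_a = [(1, 1), (1, 4), (7, 2)]
chunk_b = [(1, 2), (3, 1), (9, 5)]
merged = merge_tid_lists([chunk_a, chunk_b])
print(merged)
```

The merge touches each element once per comparison step, so the cost grows with the total number of TIDs and the number of input lists, rather than requiring a sort of the whole concatenated list.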

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Andy Fan
Hello Tomas, >>> 2) v20240502-0002-Use-mergesort-in-the-leader-process.patch >>> >>> The approach implemented by 0001 works, but there's a little bit of >>> issue - if there are many distinct keys (e.g. for trigrams that can >>> happen very easily), the workers will hit the memory limit with

Re: Parallel CREATE INDEX for GIN indexes

2024-05-02 Thread Tomas Vondra
On 5/2/24 19:12, Matthias van de Meent wrote: > On Thu, 2 May 2024 at 17:19, Tomas Vondra > wrote: >> >> Hi, >> >> In PG17 we shall have parallel CREATE INDEX for BRIN indexes, and back >> when working on that I was thinking how difficult it would be to do >> something similar for

Re: Parallel CREATE INDEX for GIN indexes

2024-05-02 Thread Matthias van de Meent
On Thu, 2 May 2024 at 17:19, Tomas Vondra wrote: > > Hi, > > In PG17 we shall have parallel CREATE INDEX for BRIN indexes, and back > when working on that I was thinking how difficult it would be to do > something similar for other index types, like GIN. I even had > that on my list of