Re: POC: GROUP BY optimization

2024-05-27 Thread Alexander Korotkov
Hi! On Tue, Apr 23, 2024 at 1:19 AM Alexander Korotkov wrote: > On Thu, Apr 18, 2024 at 1:57 PM Alexander Korotkov > wrote: > > Thank you for the fixes you've proposed. I didn't look much into > > details yet, but I think the main concern Tom expressed in [1] is > > whether the feature is

Re: POC: GROUP BY optimization

2024-05-23 Thread jian he
On Mon, May 20, 2024 at 4:54 PM jian he wrote: > > > The behavior is still the same as the master. > meaning that below quoted queries are still using "Presorted Key: x". > > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP

Re: POC: GROUP BY optimization

2024-05-21 Thread Andrei Lepikhov
On 20/5/2024 15:54, jian he wrote: As mentioned previously, both A and B can use Incremental Sort, set_cheapest will choose the A instead of B, because A step generated path **first** satisfies increment sort. Yeah, I agree with your analysis. Looking into the cost_incremental_sort, I see that

Re: POC: GROUP BY optimization

2024-05-20 Thread jian he
On Thu, May 16, 2024 at 3:47 PM Andrei Lepikhov wrote: > > On 24.04.2024 13:25, jian he wrote: > > hi. > > I found an interesting case. > > > > CREATE TABLE t1 AS > >SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS > > z, i::int4 AS w > >FROM generate_series(1, 100) AS

Re: POC: GROUP BY optimization

2024-05-16 Thread Andrei Lepikhov
On 24.04.2024 13:25, jian he wrote: hi. I found an interesting case. CREATE TABLE t1 AS SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS z, i::int4 AS w FROM generate_series(1, 100) AS i; CREATE INDEX t1_x_y_idx ON t1 (x, y); ANALYZE t1; SET enable_hashagg = off; SET

Re: POC: GROUP BY optimization

2024-04-28 Thread jian he
On Wed, Apr 24, 2024 at 2:25 PM jian he wrote: > > hi. > I found an interesting case. > > CREATE TABLE t1 AS > SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS > z, i::int4 AS w > FROM generate_series(1, 100) AS i; > CREATE INDEX t1_x_y_idx ON t1 (x, y); > ANALYZE t1; >

Re: POC: GROUP BY optimization

2024-04-24 Thread jian he
hi one more question (maybe a dumb one) drop table if exists t1; CREATE TABLE t1 AS SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS z, i::int4 AS w FROM generate_series(1, 100) AS i; CREATE INDEX t1_x_y_idx ON t1 (x, y); ANALYZE t1; SET enable_hashagg = off; SET

Re: POC: GROUP BY optimization

2024-04-24 Thread jian he
hi. I found an interesting case. CREATE TABLE t1 AS SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS z, i::int4 AS w FROM generate_series(1, 100) AS i; CREATE INDEX t1_x_y_idx ON t1 (x, y); ANALYZE t1; SET enable_hashagg = off; SET enable_seqscan = off; EXPLAIN (COSTS

Re: POC: GROUP BY optimization

2024-04-22 Thread Alexander Korotkov
Hi! On Thu, Apr 18, 2024 at 1:57 PM Alexander Korotkov wrote: > Thank you for the fixes you've proposed. I didn't look much into > details yet, but I think the main concern Tom expressed in [1] is > whether the feature is reasonable at all. I think at this stage the > most important thing is

Re: POC: GROUP BY optimization

2024-04-22 Thread jian he
On Fri, Apr 19, 2024 at 6:44 PM jian he wrote: > > On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov > wrote: > > > > Thank you for the fixes you've proposed. I didn't look much into > > details yet, but I think the main concern Tom expressed in [1] is > > whether the feature is reasonable at

Re: POC: GROUP BY optimization

2024-04-21 Thread Andrei Lepikhov
On 4/12/24 06:44, Tom Lane wrote: If this patch were producing better results I'd be more excited about putting more work into it. But on the basis of what I'm seeing right now, I think maybe we ought to give up on it. Let me show current cases where users have a profit with this tiny

Re: POC: GROUP BY optimization

2024-04-19 Thread jian he
On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov wrote: > > Thank you for the fixes you've proposed. I didn't look much into > details yet, but I think the main concern Tom expressed in [1] is > whether the feature is reasonable at all. I think at this stage the > most important thing is to

Re: POC: GROUP BY optimization

2024-04-18 Thread Alexander Korotkov
Hi, Andrei! On Thu, Apr 18, 2024 at 11:54 AM Andrei Lepikhov wrote: > On 4/12/24 06:44, Tom Lane wrote: > > * Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node > > type name, and the comments provided for it are not going to educate > > anybody. What is the "association"

Re: POC: GROUP BY optimization

2024-04-18 Thread Andrei Lepikhov
On 4/12/24 06:44, Tom Lane wrote: * Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node type name, and the comments provided for it are not going to educate anybody. What is the "association" between the pathkeys and clauses? I guess the clauses are supposed to be

Re: POC: GROUP BY optimization

2024-04-17 Thread Andrei Lepikhov
On 4/12/24 06:44, Tom Lane wrote: * It seems like root->processed_groupClause no longer has much to do with the grouping behavior, which is scary given how much code still believes that it does. I suspect there are bugs lurking there, and 1. Analysing the code, processed_groupClause is used

Re: POC: GROUP BY optimization

2024-04-16 Thread Andrei Lepikhov
On 4/12/24 06:44, Tom Lane wrote: * I'm pretty unconvinced by group_keys_reorder_by_pathkeys (which I notice has already had one band-aid added to it since commit). In particular, it seems to believe that the pathkeys and clauses lists match one-for-one, but I seriously doubt that that invariant

Re: POC: GROUP BY optimization

2024-04-16 Thread Andrei Lepikhov
On 4/12/24 06:44, Tom Lane wrote: * The very first hunk causes get_eclass_for_sort_expr to have side-effects on existing EquivalenceClass data structures. What is the argument that that's not going to cause problems? At the very least it introduces asymmetry, in that the first caller wins the

Re: POC: GROUP BY optimization

2024-04-11 Thread Andrei Lepikhov
On 4/12/24 06:44, Tom Lane wrote: If this patch were producing better results I'd be more excited about putting more work into it. But on the basis of what I'm seeing right now, I think maybe we ought to give up on it. First, thanks for the deep review - sometimes, only a commit gives us a

Re: POC: GROUP BY optimization

2024-04-11 Thread Tom Lane
I'm late to the party, and I apologize for not having paid attention to this thread for months ... but I am wondering why 0452b461b got committed. It seems to add a substantial amount of complexity and planner cycles in return for not much. The reason why I'm dubious about it can be found in

Re: POC: GROUP BY optimization

2024-02-21 Thread Andrei Lepikhov
On 22/2/2024 13:35, Richard Guo wrote: The avg() function on integer argument is commonly used in aggregates.sql.  I don't think this is an issue.  See the first test query in aggregates.sql. Make sense > it should be parallel to the test cases for utilize the ordering of > index

Re: POC: GROUP BY optimization

2024-02-21 Thread Richard Guo
On Thu, Feb 22, 2024 at 12:18 PM Andrei Lepikhov wrote: > On 22/2/2024 09:09, Richard Guo wrote: > > I looked through the v2 patch and have two comments. > > > > * The test case under "Check we don't pick aggregate path key instead of > > grouping path key" does not have EXPLAIN to show the

Re: POC: GROUP BY optimization

2024-02-21 Thread Andrei Lepikhov
On 22/2/2024 09:09, Richard Guo wrote: On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov > wrote: Hi, Richard! > What do you think about the revisions for the test cases? I've rebased your patch upthread.  Did some minor beautifications. > *

Re: POC: GROUP BY optimization

2024-02-21 Thread Richard Guo
On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov wrote: > Hi, Richard! > > > What do you think about the revisions for the test cases? > > I've rebased your patch upthread. Did some minor beautifications. > > > * The table 'btg' is inserted with 1 tuples, which seems a bit > > expensive

Re: POC: GROUP BY optimization

2024-02-21 Thread Maxim Orlov
Hi! Another issue on test introduced in 0452b461bc405. I think it may be unstable in some circumstances. For example, if we'll try to use different BLCKSZ. See, I've made a little change in the number of tuples to be inserted: $ git diff diff --git a/src/test/regress/sql/aggregates.sql

Re: POC: GROUP BY optimization

2024-02-21 Thread Alexander Korotkov
Hi, Richard! > What do you think about the revisions for the test cases? I've rebased your patch upthread. Did some minor beautifications. > * The table 'btg' is inserted with 1 tuples, which seems a bit > expensive for a test. I don't think we need such a big table to test > what we

Re: POC: GROUP BY optimization

2024-02-21 Thread Richard Guo
On Fri, Feb 2, 2024 at 12:40 PM Andrei Lepikhov wrote: > On 2/2/2024 11:06, Richard Guo wrote: > > > > On Fri, Feb 2, 2024 at 11:32 AM Richard Guo > > wrote: > > > > On Fri, Feb 2, 2024 at 10:02 AM Tom Lane > > wrote: > > > >

Re: POC: GROUP BY optimization

2024-02-01 Thread Andrei Lepikhov
On 2/2/2024 11:06, Richard Guo wrote: On Fri, Feb 2, 2024 at 11:32 AM Richard Guo > wrote: On Fri, Feb 2, 2024 at 10:02 AM Tom Lane mailto:t...@sss.pgh.pa.us>> wrote: One of the test cases added by this commit has not been very stable in the

Re: POC: GROUP BY optimization

2024-02-01 Thread Richard Guo
On Fri, Feb 2, 2024 at 11:32 AM Richard Guo wrote: > On Fri, Feb 2, 2024 at 10:02 AM Tom Lane wrote: > >> One of the test cases added by this commit has not been very >> stable in the buildfarm. Latest example is here: >> >> >>

Re: POC: GROUP BY optimization

2024-02-01 Thread Andrei Lepikhov
On 2/2/2024 09:02, Tom Lane wrote: Alexander Korotkov writes: I'm going to push this if there are no objections. One of the test cases added by this commit has not been very stable in the buildfarm. Latest example is here:

Re: POC: GROUP BY optimization

2024-02-01 Thread Richard Guo
On Fri, Feb 2, 2024 at 10:02 AM Tom Lane wrote: > Alexander Korotkov writes: > > I'm going to push this if there are no objections. > > One of the test cases added by this commit has not been very > stable in the buildfarm. Latest example is here: > > >

Re: POC: GROUP BY optimization

2024-02-01 Thread Tom Lane
Alexander Korotkov writes: > I'm going to push this if there are no objections. One of the test cases added by this commit has not been very stable in the buildfarm. Latest example is here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04 and I've seen

Re: POC: GROUP BY optimization

2024-01-26 Thread Robert Haas
On Fri, Jan 26, 2024 at 10:38 AM Tom Lane wrote: > Sadly, that's not a small task: > > * We'd need to put effort into assigning more realistic procost > values --- preferably across the board, not just comparison functions. > As long as all the comparison functions have procost 1.0, you're > just

Re: POC: GROUP BY optimization

2024-01-26 Thread Tom Lane
Robert Haas writes: > On Tue, Dec 26, 2023 at 10:23 PM Tom Lane wrote: >> I think it's a fool's errand to even try to separate different sort >> column orderings by cost. We simply do not have sufficiently accurate >> cost information. The previous patch in this thread got reverted because >>

Re: POC: GROUP BY optimization

2024-01-26 Thread Robert Haas
On Tue, Dec 26, 2023 at 10:23 PM Tom Lane wrote: > I think it's a fool's errand to even try to separate different sort > column orderings by cost. We simply do not have sufficiently accurate > cost information. The previous patch in this thread got reverted because > of that (well, also some

Re: POC: GROUP BY optimization

2024-01-26 Thread vignesh C
On Thu, 25 Jan 2024 at 01:15, Alexander Korotkov wrote: > > On Wed, Jan 24, 2024 at 7:38 PM Nathan Bossart > wrote: > > A recent buildfarm failure [0] seems to indicate a name collision with the > > "abc" index in the aggregates.sql test and the "abc" table in > > namespace.sql. > > > > [0] >

Re: POC: GROUP BY optimization

2024-01-24 Thread Alexander Korotkov
On Wed, Jan 24, 2024 at 7:38 PM Nathan Bossart wrote: > A recent buildfarm failure [0] seems to indicate a name collision with the > "abc" index in the aggregates.sql test and the "abc" table in > namespace.sql. > > [0] >

Re: POC: GROUP BY optimization

2024-01-24 Thread Nathan Bossart
A recent buildfarm failure [0] seems to indicate a name collision with the "abc" index in the aggregates.sql test and the "abc" table in namespace.sql. [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet=2024-01-24%2014%3A05%3A14 -- Nathan Bossart Amazon Web Services:

Re: POC: GROUP BY optimization

2024-01-19 Thread Alexander Korotkov
Hi! I've applied your changes with minor editing, thank you. On Thu, Jan 18, 2024 at 11:49 AM Andrei Lepikhov wrote: > >> * Part of the work performed in this patch overlaps with that of > >> preprocess_groupclause. They are both trying to adjust the ordering of > >> the GROUP BY keys to match

Re: POC: GROUP BY optimization

2024-01-18 Thread Andrei Lepikhov
Just forgotten second attachment. -- regards, Andrei Lepikhov Postgres Professional diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 1095b73dac..b612420547 100644 --- a/src/backend/optimizer/path/pathkeys.c +++

Re: POC: GROUP BY optimization

2024-01-18 Thread Andrei Lepikhov
On 16/1/2024 22:05, Alexander Korotkov wrote: On Tue, Jan 16, 2024 at 4:48 AM Richard Guo wrote: * When trying to match the ordering of GROUP BY to that of ORDER BY in get_useful_group_keys_orderings, this patch checks against the length of path->pathkeys. This does not make sense. I guess

Re: POC: GROUP BY optimization

2024-01-16 Thread Alexander Korotkov
Hi! Thank you for your review. The revised patchset is attached. On Tue, Jan 16, 2024 at 4:48 AM Richard Guo wrote: > On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov > wrote: >> >> On Mon, Jan 15, 2024 at 8:42 AM Richard Guo wrote: >> > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov

Re: POC: GROUP BY optimization

2024-01-15 Thread Richard Guo
On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov wrote: > On Mon, Jan 15, 2024 at 8:42 AM Richard Guo > wrote: > > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov > wrote: > >> > >> Thank you for providing the test case relevant for this code change. > >> The revised patch incorporating

Re: POC: GROUP BY optimization

2024-01-15 Thread Alena Rybakina
On 15.01.2024 12:46, Andrei Lepikhov wrote: On 15/1/2024 13:42, Richard Guo wrote: On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov mailto:aekorot...@gmail.com>> wrote:     Thank you for providing the test case relevant for this code change.     The revised patch incorporating this change

Re: POC: GROUP BY optimization

2024-01-15 Thread Andrei Lepikhov
On 15/1/2024 13:42, Richard Guo wrote: On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov > wrote: Thank you for providing the test case relevant for this code change. The revised patch incorporating this change is attached.  Now the patchset looks

Re: POC: GROUP BY optimization

2024-01-14 Thread Alexander Korotkov
On Mon, Jan 15, 2024 at 8:42 AM Richard Guo wrote: > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov > wrote: >> >> Thank you for providing the test case relevant for this code change. >> The revised patch incorporating this change is attached. Now the >> patchset looks good to me. I'm

Re: POC: GROUP BY optimization

2024-01-14 Thread Richard Guo
On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov wrote: > Thank you for providing the test case relevant for this code change. > The revised patch incorporating this change is attached. Now the > patchset looks good to me. I'm going to push it if there are no > objections. Seems I'm late

Re: POC: GROUP BY optimization

2024-01-14 Thread Alexander Korotkov
On Sun, Jan 14, 2024 at 2:14 PM Andrei Lepikhov wrote: > On 13/1/2024 22:00, Alexander Korotkov wrote: > > On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov > > wrote: > >> On 11/1/2024 18:30, Alexander Korotkov wrote: > >>> On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov > >>> wrote: > > Hmm,

Re: POC: GROUP BY optimization

2024-01-14 Thread Andrei Lepikhov
On 13/1/2024 22:00, Alexander Korotkov wrote: On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov wrote: On 11/1/2024 18:30, Alexander Korotkov wrote: On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov wrote: Hmm, I don't see this old code in these patches. Resend 0002-* because of trailing spaces.

Re: POC: GROUP BY optimization

2024-01-13 Thread Alexander Korotkov
On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov wrote: > On 11/1/2024 18:30, Alexander Korotkov wrote: > > On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov wrote: > >>> Hmm, I don't see this old code in these patches. Resend 0002-* because > >>> of trailing spaces. > >> > >> > >> AFAIK, cfbot does

Re: POC: GROUP BY optimization

2024-01-13 Thread Andrei Lepikhov
On 11/1/2024 18:30, Alexander Korotkov wrote: Hi! On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov wrote: Hmm, I don't see this old code in these patches. Resend 0002-* because of trailing spaces. AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run

Re: POC: GROUP BY optimization

2024-01-11 Thread Alexander Korotkov
Hi! On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov wrote: >> Hmm, I don't see this old code in these patches. Resend 0002-* because >> of trailing spaces. > > > AFAIK, cfbot does not seek old versions of patchset parts in previous > messages. So for it to run correctly, a new version of the whole

Re: POC: GROUP BY optimization

2024-01-09 Thread Pavel Borisov
Hi, Andrei! > Hmm, I don't see this old code in these patches. Resend 0002-* because > of trailing spaces. > AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new version of the whole patchset should be sent even if most patches are

Re: POC: GROUP BY optimization

2024-01-09 Thread Andrei Lepikhov
On 9/1/2024 16:45, vignesh C wrote: On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov wrote: Here is a new version of GROUP-BY optimization without sort model. On 21/12/2023 17:53, Alexander Korotkov wrote: I'd like to make some notes. 1) As already mentioned, there is clearly a repetitive

Re: POC: GROUP BY optimization

2024-01-09 Thread vignesh C
On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov wrote: > > Here is a new version of GROUP-BY optimization without sort model. > > On 21/12/2023 17:53, Alexander Korotkov wrote: > > I'd like to make some notes. > > > > 1) As already mentioned, there is clearly a repetitive pattern for the > > code

Re: POC: GROUP BY optimization

2024-01-09 Thread Andrei Lepikhov
Here is a new version of GROUP-BY optimization without sort model. On 21/12/2023 17:53, Alexander Korotkov wrote: I'd like to make some notes. 1) As already mentioned, there is clearly a repetitive pattern for the code following after get_useful_group_keys_orderings() calls. I think it would

Re: POC: GROUP BY optimization

2023-12-28 Thread Andrei Lepikhov
On 28/12/2023 18:29, Alexander Korotkov wrote: On Thu, Dec 28, 2023 at 10:22 AM Andrei Lepikhov wrote: But arrangement with an ORDER BY clause doesn't work: DROP INDEX abc; explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w); I think the reason is that the sort_pathkeys and

Re: POC: GROUP BY optimization

2023-12-28 Thread Alexander Korotkov
On Thu, Dec 28, 2023 at 10:22 AM Andrei Lepikhov wrote: > But arrangement with an ORDER BY clause doesn't work: > > DROP INDEX abc; > explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w); > > I think the reason is that the sort_pathkeys and group_pathkeys are > physically different

Re: POC: GROUP BY optimization

2023-12-28 Thread Andrei Lepikhov
On 27/12/2023 12:07, Tom Lane wrote: Andrei Lepikhov writes: To be clear. In [1], I mentioned we can perform micro-benchmarks and structure costs of operators. At least for fixed-length operators, it is relatively easy. I repeat what I said: this is a fool's errand. You will not get

Re: POC: GROUP BY optimization

2023-12-26 Thread Tom Lane
Andrei Lepikhov writes: > To be clear. In [1], I mentioned we can perform micro-benchmarks and > structure costs of operators. At least for fixed-length operators, it is > relatively easy. I repeat what I said: this is a fool's errand. You will not get trustworthy results even for the cases

Re: POC: GROUP BY optimization

2023-12-26 Thread Andrei Lepikhov
On 27/12/2023 11:15, Alexander Korotkov wrote: On Wed, Dec 27, 2023 at 5:23 AM Tom Lane wrote: Alexander Korotkov writes: 2) An accurate estimate of the sorting cost is quite a difficult task. Indeed. What if we make a simple rule of thumb that sorting integers and floats is cheaper than

Re: POC: GROUP BY optimization

2023-12-26 Thread Tom Lane
Alexander Korotkov writes: > On Wed, Dec 27, 2023 at 5:23 AM Tom Lane wrote: >> I think it's a fool's errand to even try to separate different sort >> column orderings by cost. > Besides sorting column orderings by cost, this patch also tries to > match GROUP BY pathkeys to input pathkeys and

Re: POC: GROUP BY optimization

2023-12-26 Thread Alexander Korotkov
On Wed, Dec 27, 2023 at 5:23 AM Tom Lane wrote: > Alexander Korotkov writes: > > 2) An accurate estimate of the sorting cost is quite a difficult task. > > Indeed. > > > What if we make a simple rule of thumb that sorting integers and > > floats is cheaper than sorting numerics and strings with

Re: POC: GROUP BY optimization

2023-12-26 Thread Tom Lane
Alexander Korotkov writes: > 2) An accurate estimate of the sorting cost is quite a difficult task. Indeed. > What if we make a simple rule of thumb that sorting integers and > floats is cheaper than sorting numerics and strings with collation C, > in turn, that is cheaper than sorting

Re: POC: GROUP BY optimization

2023-12-26 Thread Alexander Korotkov
On Tue, Dec 26, 2023 at 1:37 PM Andrei Lepikhov wrote: > On 21/12/2023 17:53, Alexander Korotkov wrote: > > On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov > > wrote: > >> New version of the patch. Fixed minor inconsistencies and rebased onto > >> current master. > > Thank you (and other

Re: POC: GROUP BY optimization

2023-12-26 Thread Andrei Lepikhov
On 21/12/2023 17:53, Alexander Korotkov wrote: On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov wrote: New version of the patch. Fixed minor inconsistencies and rebased onto current master. Thank you (and other authors) for working on this subject. Indeed to GROUP BY clauses are

Re: POC: GROUP BY optimization

2023-12-21 Thread Alexander Korotkov
Hi! On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov wrote: > > New version of the patch. Fixed minor inconsistencies and rebased onto > current master. Thank you (and other authors) for working on this subject. Indeed to GROUP BY clauses are order-agnostic. Reordering them in the most

Re: POC: GROUP BY optimization

2023-10-01 Thread Andrei Lepikhov
New version of the patch. Fixed minor inconsistencies and rebased onto current master. -- regards, Andrey Lepikhov Postgres Professional From 2f5a42c8a53286f830e3376ff4d3f8b7d4215b4b Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Wed, 13 Sep 2023 11:20:03 +0700 Subject: [PATCH]

Re: POC: GROUP BY optimization

2023-09-25 Thread Andrey Lepikhov
On 20/7/2023 18:46, Tomas Vondra wrote: 2) estimating quicksort comparisons - This relies on ndistinct estimates, and I'm not sure how much more reliable we can make those. Probably not much :-( Not sure what to do about this, the only thing I can think of is to track "reliability" of the

Re: POC: GROUP BY optimization

2023-09-18 Thread Andrey Lepikhov
On 20/7/2023 18:46, Tomas Vondra wrote: On 7/20/23 08:37, Andrey Lepikhov wrote: On 3/10/2022 21:56, Tom Lane wrote: Revert "Optimize order of GROUP BY keys". This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on fixes. ... Since we're hard up against the release

Re: POC: GROUP BY optimization

2023-09-12 Thread Andrey Lepikhov
Hi, Here is the patch rebased on the current master. Also, I fixed some minor slips and one static analyzer warning. This is just for adding to the next commitfest and enforcing work with this patch. One extra difference in newly added postgres_fdw tests is caused by this patch - see changes

Re: POC: GROUP BY optimization

2023-07-24 Thread Tomas Vondra
On 7/24/23 14:04, Andrey Lepikhov wrote: > On 24/7/2023 16:56, Tomas Vondra wrote: >> On 7/24/23 04:10, Andrey Lepikhov wrote: >>> On 20/7/2023 18:46, Tomas Vondra wrote: On 7/20/23 08:37, Andrey Lepikhov wrote: > On 3/10/2022 21:56, Tom Lane wrote: >> Revert "Optimize order of

Re: POC: GROUP BY optimization

2023-07-24 Thread Andrey Lepikhov
On 24/7/2023 16:56, Tomas Vondra wrote: On 7/24/23 04:10, Andrey Lepikhov wrote: On 20/7/2023 18:46, Tomas Vondra wrote: On 7/20/23 08:37, Andrey Lepikhov wrote: On 3/10/2022 21:56, Tom Lane wrote: Revert "Optimize order of GROUP BY keys". This reverts commit

Re: POC: GROUP BY optimization

2023-07-24 Thread Tomas Vondra
On 7/24/23 04:10, Andrey Lepikhov wrote: > On 20/7/2023 18:46, Tomas Vondra wrote: >> On 7/20/23 08:37, Andrey Lepikhov wrote: >>> On 3/10/2022 21:56, Tom Lane wrote: Revert "Optimize order of GROUP BY keys". This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and

Re: POC: GROUP BY optimization

2023-07-23 Thread Andrey Lepikhov
On 20/7/2023 18:46, Tomas Vondra wrote: On 7/20/23 08:37, Andrey Lepikhov wrote: On 3/10/2022 21:56, Tom Lane wrote: Revert "Optimize order of GROUP BY keys". This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on fixes. ... Since we're hard up against the release

Re: POC: GROUP BY optimization

2023-07-20 Thread Andrey Lepikhov
On 20/7/2023 18:46, Tomas Vondra wrote: On 7/20/23 08:37, Andrey Lepikhov wrote: On 3/10/2022 21:56, Tom Lane wrote: Revert "Optimize order of GROUP BY keys". This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on fixes. ... Since we're hard up against the release

Re: POC: GROUP BY optimization

2023-07-20 Thread Tomas Vondra
On 7/20/23 08:37, Andrey Lepikhov wrote: > On 3/10/2022 21:56, Tom Lane wrote: >> Revert "Optimize order of GROUP BY keys". >> >> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and >> several follow-on fixes. >> ... >> Since we're hard up against the release deadline for v15, let's

Re: POC: GROUP BY optimization

2023-07-20 Thread Andrey Lepikhov
On 3/10/2022 21:56, Tom Lane wrote: > Revert "Optimize order of GROUP BY keys". > > This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and > several follow-on fixes. > ... > Since we're hard up against the release deadline for v15, let's > revert these changes for now. We can always

Re: POC: GROUP BY optimization

2022-09-19 Thread David Rowley
On Wed, 13 Jul 2022 at 15:37, David Rowley wrote: > I'm just in this general area of the code again today and wondered > about the header comment for the preprocess_groupclause() function. > > It says: > > * In principle it might be interesting to consider other orderings of the > * GROUP BY

Re: POC: GROUP BY optimization

2022-09-05 Thread Michael Paquier
On Mon, Sep 05, 2022 at 12:14:48AM +0200, Tomas Vondra wrote: > I've pushed the fix to 15+master. In the end I just used David's patches > that set parallel_setup_cost to 0. Thanks, Tomas! -- Michael signature.asc Description: PGP signature

Re: POC: GROUP BY optimization

2022-09-04 Thread Jonathan S. Katz
On 9/4/22 6:14 PM, Tomas Vondra wrote: I've pushed the fix to 15+master. In the end I just used David's patches that set parallel_setup_cost to 0. Thanks! I have closed the open item. Jonathan OpenPGP_signature Description: OpenPGP digital signature

Re: POC: GROUP BY optimization

2022-09-04 Thread Tomas Vondra
On 9/1/22 16:05, Jonathan S. Katz wrote: > On 9/1/22 9:06 AM, Tomas Vondra wrote: >> >> >> On 8/30/22 15:15, Jonathan S. Katz wrote: >>> On 8/18/22 3:29 AM, Tomas Vondra wrote: On 8/18/22 03:32, David Rowley wrote: >>> > Here are a couple of patches to demo the idea. >

Re: POC: GROUP BY optimization

2022-09-01 Thread Jonathan S. Katz
On 9/1/22 9:06 AM, Tomas Vondra wrote: On 8/30/22 15:15, Jonathan S. Katz wrote: On 8/18/22 3:29 AM, Tomas Vondra wrote: On 8/18/22 03:32, David Rowley wrote: Here are a couple of patches to demo the idea. Yeah, that's an option too. I should have mentioned it along with the

Re: POC: GROUP BY optimization

2022-09-01 Thread Tomas Vondra
On 8/30/22 15:15, Jonathan S. Katz wrote: > On 8/18/22 3:29 AM, Tomas Vondra wrote: >> >> >> On 8/18/22 03:32, David Rowley wrote: > >>> Here are a couple of patches to demo the idea. >>> >> >> Yeah, that's an option too. I should have mentioned it along with the >> cpu_operator_cost. >> >>

Re: POC: GROUP BY optimization

2022-08-30 Thread Jonathan S. Katz
On 8/18/22 3:29 AM, Tomas Vondra wrote: On 8/18/22 03:32, David Rowley wrote: Here are a couple of patches to demo the idea. Yeah, that's an option too. I should have mentioned it along with the cpu_operator_cost. BTW would you mind taking a look at the costing? I think it's fine, but

Re: POC: GROUP BY optimization

2022-08-18 Thread Tomas Vondra
On 8/18/22 03:32, David Rowley wrote: > On Thu, 18 Aug 2022 at 02:46, Tomas Vondra > wrote: >> So I don't think the current costing is wrong, but it certainly is more >> complex. But the test does not test what it intended - I have two ideas >> how to make it work: >> >> 1) increase the number

Re: POC: GROUP BY optimization

2022-08-17 Thread David Rowley
On Thu, 18 Aug 2022 at 02:46, Tomas Vondra wrote: > So I don't think the current costing is wrong, but it certainly is more > complex. But the test does not test what it intended - I have two ideas > how to make it work: > > 1) increase the number of rows in the table > > 2) increase

Re: POC: GROUP BY optimization

2022-08-17 Thread Tomas Vondra
On 8/2/22 13:14, David Rowley wrote: > On Tue, 2 Aug 2022 at 22:21, Michael Paquier wrote: >> >> On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote: >>> I agree this is a mistake in db0d67db2 that makes the test useless. >> >> Tomas, please note that this is an open item assigned to

Re: POC: GROUP BY optimization

2022-08-02 Thread David Rowley
On Tue, 2 Aug 2022 at 22:21, Michael Paquier wrote: > > On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote: > > I agree this is a mistake in db0d67db2 that makes the test useless. > > Tomas, please note that this is an open item assigned to you. Are you > planning to do something with

Re: POC: GROUP BY optimization

2022-08-02 Thread Michael Paquier
On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote: > I agree this is a mistake in db0d67db2 that makes the test useless. Tomas, please note that this is an open item assigned to you. Are you planning to do something with these regression tests by beta3? -- Michael signature.asc

Re: POC: GROUP BY optimization

2022-07-15 Thread Tomas Vondra
On 7/15/22 07:18, David Rowley wrote: > On Thu, 31 Mar 2022 at 12:19, Tomas Vondra > wrote: >> Pushed, after going through the patch once more, running check-world >> under valgrind, and updating the commit message. > > I'm still working in this area and I noticed that db0d67db2 updated >

Re: POC: GROUP BY optimization

2022-07-14 Thread David Rowley
On Thu, 31 Mar 2022 at 12:19, Tomas Vondra wrote: > Pushed, after going through the patch once more, running check-world > under valgrind, and updating the commit message. I'm still working in this area and I noticed that db0d67db2 updated some regression tests in partition_aggregate.out without

Re: POC: GROUP BY optimization

2022-07-12 Thread David Rowley
On Thu, 31 Mar 2022 at 12:19, Tomas Vondra wrote: > Pushed, after going through the patch once more, running check-world > under valgrind, and updating the commit message. I'm just in this general area of the code again today and wondered about the header comment for the preprocess_groupclause()

Re: POC: GROUP BY optimization

2022-03-30 Thread Tomas Vondra
Pushed, after going through the patch once more, running check-world under valgrind, and updating the commit message. Thanks to everyone who reviewed/tested this patch! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Re: POC: GROUP BY optimization

2022-03-28 Thread Zhihong Yu
On Mon, Mar 28, 2022 at 5:49 PM Tomas Vondra wrote: > Hi, > > Here's a rebased/improved version of the patch, with smaller parts > addressing various issues. There are seven parts: > > 0001 - main part, just rebased > > 0002 - replace the debug GUC options with a single GUC to disable the >

Re: POC: GROUP BY optimization

2022-03-18 Thread Andrey V. Lepikhov
On 3/15/22 13:26, Tomas Vondra wrote: Thanks for the rebase. The two proposed changes (tweaked costing and simplified fake_var handling) seem fine to me. I think the last thing that needs to be done is cleanup of the debug GUCs, which I added to allow easier experimentation with the patch.

Re: POC: GROUP BY optimization

2022-02-10 Thread Andrey V. Lepikhov
On 1/22/22 01:34, Tomas Vondra wrote: I rebased (with minor fixes) this patch onto current master. Also, second patch dedicated to a problem of "varno 0" (fake_var). I think, this case should make the same estimations as in the case of varno != 0, but no any stats found. So I suppose to

Re: POC: GROUP BY optimization

2022-02-01 Thread Andrey V. Lepikhov
On 7/22/21 3:58 AM, Tomas Vondra wrote: I've simplified the costing a bit, and the attached version actually undoes all the "suspicious" plan changes in postgres_fdw. It changes one new plan, but that seems somewhat reasonable, as it pushes sort to the remote side. I tried to justify heap-sort

Re: POC: GROUP BY optimization

2022-01-23 Thread Andrey Lepikhov
On 22/1/2022 01:34, Tomas Vondra wrote: The other thing we could do is reduce the coefficient gradually - so it'd be 1.5 for the first pathkey, then 1.25 for the next one, and so on. But it seems somewhat arbitrary (I certainly don't have some sound theoretical justification ...). I think,

Re: POC: GROUP BY optimization

2022-01-21 Thread Tomas Vondra
On 1/21/22 12:09, Andrey Lepikhov wrote: On 7/22/21 3:58 AM, Tomas Vondra wrote: 4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to estimate_num_groups_incremental. In principle yes, if we use "group size" from the previous step, then the returned value is the number of

Re: POC: GROUP BY optimization

2022-01-21 Thread Andrey Lepikhov
On 7/22/21 3:58 AM, Tomas Vondra wrote: 4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to estimate_num_groups_incremental. In principle yes, if we use "group size" from the previous step, then the returned value is the number of new groups after adding the "new" pathkey.

  1   2   >