Re: Improve eviction algorithm in ReorderBuffer

2024-02-01 Thread Masahiko Sawada
Hi,

On Wed, Jan 31, 2024 at 5:32 PM vignesh C  wrote:
>
> On Tue, 30 Jan 2024 at 13:37, Masahiko Sawada  wrote:
> >
> > On Fri, Jan 26, 2024 at 5:36 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Dec 20, 2023 at 12:11 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 6:49 AM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Tue, Dec 19, 2023 at 8:02 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Dec 19, 2023 at 8:31 AM Masahiko Sawada 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Sun, Dec 17, 2023 at 11:40 AM Amit Kapila 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > The individual transactions shouldn't cross
> > > > > > > > 'logical_decoding_work_mem'. I got a bit confused by your 
> > > > > > > > proposal to
> > > > > > > > maintain the lists: "...splitting it into two lists: 
> > > > > > > > transactions
> > > > > > > > consuming 5% < and 5% >=  of the memory limit, and checking the 
> > > > > > > > 5% >=
> > > > > > > > list preferably.". In the previous sentence, what did you mean 
> > > > > > > > by
> > > > > > > > transactions consuming 5% >= of the memory limit? I got the 
> > > > > > > > impression
> > > > > > > > that you are saying to maintain them in a separate transaction 
> > > > > > > > list
> > > > > > > > which doesn't seem to be the case.
> > > > > > >
> > > > > > > I meant that there are three lists in total: the first one
> > > > > > > maintains the transactions consuming more than 10% of
> > > > > > > logical_decoding_work_mem,
> > > > > > >
> > > > > >
> > > > > > How can we have multiple transactions in the list consuming more 
> > > > > > than
> > > > > > 10% of logical_decoding_work_mem? Shouldn't we perform serialization
> > > > > > before any xact reaches logical_decoding_work_mem?
> > > > >
> > > > > Well, suppose logical_decoding_work_mem is set to 64MB; transactions
> > > > > consuming more than 6.4MB are added to the list. So for example, it's
> > > > > possible that the list has three transactions each of which are
> > > > > consuming 10MB while the total memory usage in the reorderbuffer is
> > > > > still 30MB (less than logical_decoding_work_mem).
> > > > >
> > > >
> > > > Thanks for the clarification. I misunderstood the list to have
> > > > transactions greater than 70.4 MB (64 + 6.4) in your example. But one
> > > > thing to note is that maintaining these lists by default can also have
> > > > some overhead unless the list of open transactions crosses a certain
> > > > threshold.
> > > >
> > >
> > > On further analysis, I realized that the approach discussed here might
> > > not be the way to go. The idea of dividing transactions into several
> > > subgroups is to divide a large number of entries into multiple
> > > sub-groups so we can reduce the complexity to search for the
> > > particular entry. Since we assume that there are no big differences in
> > > entries' sizes within a sub-group, we can pick the entry to evict in
> > > O(1). However, what we really need to avoid here is ending up
> > > increasing the number of evictions, because serializing an entry to
> > > disk is generally more costly than searching for an entry in memory.
> > >
> > > I think that it's no problem in a large-entries subgroup but when it
> > > comes to the smallest-entries subgroup, like for entries consuming
> > > less than 5% of the limit, it could end up evicting many entries. For
> > > example, there would be a huge difference between serializing 1 entry
> > > consuming 5% of the memory limit and serializing 5000 entries
> > > consuming 0.001% of the memory limit. Even if we can select 5000
> > > entries quickly, I think the latter would be slower in total. The more
> > > subgroups we create, the more complex the algorithm gets and the more
> > > overhead it could cause. So I think we need to search for the largest
> > > entry in order to minimize the number of evictions anyway.
> > >
> > > Looking for data structures and algorithms, I think binaryheap with
> > > some improvements could be promising. I mentioned before why we cannot
> > > use the current binaryheap[1]. The missing pieces are efficient ways
> > > to remove the arbitrary entry and to update the arbitrary entry's key.
> > > The current binaryheap provides binaryheap_remove_node(), which is
> > > O(log n), but it requires the entry's position in the binaryheap. We
> > > can know the entry's position just after binaryheap_add_unordered()
> > > but it might be changed after heapify. Searching the node's position
> > > is O(n). So the improvement idea is to add a hash table to the
> > > binaryheap to track each entry's position, so that we can remove an
> > > arbitrary entry in O(log n) and also update an arbitrary entry's key
> > > in O(log n). This is known as the indexed
> > > priority queue. I've attached the patch for that (0001 and 0002).
> > >
> > > That way, in terms of 
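
To make the indexed-binaryheap idea above concrete, here is a rough,
illustrative sketch (this is not the attached 0001/0002 patches, and it uses
dynahash for illustration while the actual patches use simplehash): a plain
max-heap array plus a hash table that remembers each entry's slot, so removing
or re-keying an arbitrary entry needs only an O(1) lookup followed by an
O(log n) sift. All names are made up, and the hash_create() setup
(keysize = sizeof(void *)) is omitted.

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct heap_pos_entry
{
    void       *node;          /* hash key: the stored entry */
    int         pos;           /* its current slot in nodes[] */
} heap_pos_entry;

typedef struct indexed_heap
{
    void      **nodes;         /* max-heap in array form */
    int         nnodes;
    HTAB       *pos_index;     /* node -> heap_pos_entry */
} indexed_heap;

/* Whenever the heap moves a node, record its new slot in the index. */
static void
ih_place(indexed_heap *h, int pos, void *node)
{
    heap_pos_entry *e;
    bool        found;

    h->nodes[pos] = node;
    e = hash_search(h->pos_index, &node, HASH_ENTER, &found);
    e->pos = pos;
}

/* Remove an arbitrary node: O(1) slot lookup, then one sift. */
static void
ih_remove(indexed_heap *h, void *node)
{
    heap_pos_entry *e = hash_search(h->pos_index, &node, HASH_FIND, NULL);
    int         pos = e->pos;

    hash_search(h->pos_index, &node, HASH_REMOVE, NULL);
    if (pos < --h->nnodes)
    {
        ih_place(h, pos, h->nodes[h->nnodes]);
        /* sift the moved node up or down as needed (omitted here) */
    }
}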

Re: Commitfest 2024-01 is now closed

2024-02-01 Thread Richard Guo
On Fri, Feb 2, 2024 at 2:41 PM Michael Paquier  wrote:

> On Fri, Feb 02, 2024 at 11:56:36AM +0530, Amit Kapila wrote:
> > On Fri, Feb 2, 2024 at 10:58 AM Tom Lane  wrote:
> >> Thanks for all the work you did running it!  CFM is typically a
> >> thankless exercise in being a nag, but I thought you put in more
> >> than the usual amount of effort to keep things moving.
> >>
> >
> > +1. Thanks a lot for all the hard work.
>
> +1.  Thanks!


+1.  Thanks!

Thanks
Richard


Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 15:27:15 +0800,
  Junwang Zhao  wrote:

> I agree CopyToRoutine should be placed into CopyToStateData, but
> why set it after ProcessCopyOptions? The implementation of
> CopyToGetRoutine doesn't make sense if we want to support custom
> formats in the future.
> 
> Seems the refactor of v11 only considered performance but not
> *extendable copy format*.

Right.
We focus on performance for now. And then we will focus on
extendability. [1]

[1] 
https://www.postgresql.org/message-id/flat/20240130.171511.2014195814665030502.kou%40clear-code.com#757a48c273f140081656ec8eb69f502b

> If V7 and V10 have no performance reduction, then I think V6 is also
> good with performance, since most of the time goes to CopyToOneRow
> and CopyFromOneRow.

Don't worry. I'll re-submit changes in the v6 patch set
again after the current patch set that focuses on
performance is merged.

> I just think we should take the *extendable* into consideration at
> the beginning.

Introducing Copy{To,From}Routine is also valuable for
extendability. We can improve extendability later. Let's
focus on only performance for now to introduce
Copy{To,From}Routine.


Thanks,
-- 
kou




Re: An improvement on parallel DISTINCT

2024-02-01 Thread Richard Guo
On Fri, Feb 2, 2024 at 11:26 AM David Rowley  wrote:

> In light of this, do you still think it's worthwhile making this change?
>
> For me, I think all it's going to result in is extra planner work
> without any performance gains.


Hmm, with the query below, I can see that the new plan is cheaper than
the old plan, and the cost difference exceeds STD_FUZZ_FACTOR.

create table t (a int, b int);
insert into t select i%10, i from generate_series(1,1000)i;
analyze t;

-- on master
explain (costs on) select distinct a from t order by a limit 1;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=120188.50..120188.51 rows=1 width=4)
   ->  Sort  (cost=120188.50..120436.95 rows=99379 width=4)
         Sort Key: a
         ->  HashAggregate  (cost=118697.82..119691.61 rows=99379 width=4)
               Group Key: a
               ->  Gather  (cost=97331.33..118200.92 rows=198758 width=4)
                     Workers Planned: 2
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=417 width=4)
(10 rows)

-- on patched
explain (costs on) select distinct a from t order by a limit 1;
                                          QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=106573.93..106574.17 rows=1 width=4)
   ->  Unique  (cost=106573.93..130260.88 rows=99379 width=4)
         ->  Gather Merge  (cost=106573.93..129763.98 rows=198758 width=4)
               Workers Planned: 2
               ->  Sort  (cost=105573.91..105822.35 rows=99379 width=4)
                     Sort Key: a
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=417 width=4)
(9 rows)

It seems that including a LIMIT clause can potentially favor the new
plan.

Thanks
Richard


Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 15:21:31 +0900,
  Michael Paquier  wrote:

> I have done a review of v10, see v11 attached which is still WIP, with
> the patches for COPY TO and COPY FROM merged together.  Note that I'm
> thinking to merge them into a single commit.

OK. I don't have a strong opinion for commit unit.

> @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
>  bool              convert_selectively;  /* do selective binary conversion? */
>  CopyOnErrorChoice on_error;      /* what to do when error happened */
>  List             *convert_select; /* list of column names (can be NIL) */
> +const CopyToRoutine *to_routine;  /* callback routines for COPY TO */
>  } CopyFormatOptions;
> 
> Adding the routines to the structure for the format options is in my
> opinion incorrect.  The elements of this structure are first processed
> in the option deparsing path, and then we need to use the options to
> guess which routines we need.

This was discussed with Sawada-san a bit before. [1][2]

[1] 
https://www.postgresql.org/message-id/flat/CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf%2BgXEk9Mg%40mail.gmail.com#bfd19262d261c67058fdb8d64e6a723c
[2] 
https://www.postgresql.org/message-id/flat/20240130.144531.1257430878438173740.kou%40clear-code.com#fc55392d77f400fc74e42686fe7e348a

I kept the routines in CopyFormatOptions for custom option
processing. But I should have not cared about it in this
patch set because this patch set doesn't include custom
option processing.

So I'm OK that we move the routines to
Copy{From,To}StateData.

> This also led to a strange separation with
> ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
> the hole in-between.

They are also for custom option processing. We don't need to
care about them in this patch set too.

> copyapi.h needs more documentation, like what is expected for
> extension developers when using these, what are the arguments, etc.  I
> have added what I had in mind for now.

Thanks! I'm not good at writing documentation in English...

> +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);
> 
> CopyReadAttributes and PostpareColumnValue are also callbacks specific
> to text and csv, except that they are used within the per-row
> callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
> It seems to me that it would be less confusing to store pointers to
> them in the routine structures, where the final picture involves not
> having multiple layers of APIs like CopyToCSVStart,
> CopyAttributeOutTextValue, etc.  These *have* to be documented
> properly in copyapi.h, and this is much easier now that cstate stores
> the routine pointers.  That would also make simpler function stacks.
> Note that I have not changed that in the v11 attached.
> 
> This business with the extra callbacks required for csv and text is my
> main point of contention, but I'd be OK once the model of the APIs is
> more linear, with everything in Copy{From,To}State.  The changes would
> be rather simple, and I'd be OK to put my hands on it.  Just,
> Sutou-san, would you agree with my last point about these extra
> callbacks?

I'm OK with the approach. But how about adding the extra callbacks to
Copy{From,To}StateData rather than Copy{From,To}Routines, like
CopyToStateData::data_dest_cb and CopyFromStateData::data_source_cb?
They are only needed for "text" and "csv", so we don't need to add them
to Copy{From,To}Routines, which keeps the set of required callbacks
minimal.

What is the best next action for us? Do you want to
complete the WIP v11 patch set yourself (and commit it)?
Or should I take it over?


Thanks,
-- 
kou




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Junwang Zhao
On Fri, Feb 2, 2024 at 2:21 PM Michael Paquier  wrote:
>
> On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote:
> > Thanks. It'll help us.
>
> I have done a review of v10, see v11 attached which is still WIP, with
> the patches for COPY TO and COPY FROM merged together.  Note that I'm
> thinking to merge them into a single commit.
>
> @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
>  bool              convert_selectively;  /* do selective binary conversion? */
>  CopyOnErrorChoice on_error;      /* what to do when error happened */
>  List             *convert_select; /* list of column names (can be NIL) */
> +const CopyToRoutine *to_routine;  /* callback routines for COPY TO */
>  } CopyFormatOptions;
>
> Adding the routines to the structure for the format options is in my
> opinion incorrect.  The elements of this structure are first processed
> in the option deparsing path, and then we need to use the options to
> guess which routines we need.  A more natural location is cstate
> itself, so as the pointer to the routines is isolated within copyto.c

I agree CopyToRoutine should be placed into CopyToStateData, but
why set it after ProcessCopyOptions? The implementation of
CopyToGetRoutine doesn't make sense if we want to support custom
formats in the future.

Seems the refactor of v11 only considered performance but not
*extendable copy format*.

> and copyfrom_internal.h.  My point is: the routines are an
> implementation detail that the centralized copy.c has no need to know
> about.  This also led to a strange separation with
> ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
> the hole in-between.
>
> The separation between cstate and the format-related fields could be
> much better, though I am not sure if it is worth doing as it
> introduces more duplication.  For example, max_fields and raw_fields
> are specific to text and csv, while binary does not care much.
> Perhaps this is just useful to be for custom formats.

I think those can be placed in format specific fields by utilizing the opaque
space, but yeah, this will introduce duplication.

>
> copyapi.h needs more documentation, like what is expected for
> extension developers when using these, what are the arguments, etc.  I
> have added what I had in mind for now.
>
> +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);
>
> CopyReadAttributes and PostpareColumnValue are also callbacks specific
> to text and csv, except that they are used within the per-row
> callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
> It seems to me that it would be less confusing to store pointers to
> them in the routine structures, where the final picture involves not
> having multiple layers of APIs like CopyToCSVStart,
> CopyAttributeOutTextValue, etc.  These *have* to be documented
> properly in copyapi.h, and this is much easier now that cstate stores
> the routine pointers.  That would also make simpler function stacks.
> Note that I have not changed that in the v11 attached.
>
> This business with the extra callbacks required for csv and text is my
> main point of contention, but I'd be OK once the model of the APIs is
> more linear, with everything in Copy{From,To}State.  The changes would
> be rather simple, and I'd be OK to put my hands on it.  Just,
> Sutou-san, would you agree with my last point about these extra
> callbacks?
> --
> Michael

If V7 and V10 have no performance reduction, then I think V6 is also
good with performance, since most of the time goes to CopyToOneRow
and CopyFromOneRow.

I just think we should take the *extendable* into consideration at
the beginning.

-- 
Regards
Junwang Zhao




Re: Is this a problem in GenericXLogFinish()?

2024-02-01 Thread Michael Paquier
On Fri, Dec 01, 2023 at 03:27:33PM +0530, Amit Kapila wrote:
> Pushed!

Amit, this has been applied as of 861f86beea1c, and I got pinged about
the fact this triggers inconsistencies because we always set the LSN
of the write buffer (wbuf in _hash_freeovflpage) but
XLogRegisterBuffer() would *not* be called when the two following
conditions happen:
- When xlrec.ntups <= 0.
- When !xlrec.is_prim_bucket_same_wrt && !xlrec.is_prev_bucket_same_wrt

And it seems to me that there is still a bug here: there should be no
point in setting the LSN on the write buffer if we don't register it
in WAL at all, no?
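
A sketch of the kind of guard being suggested, reusing the condition names
from this message and the surrounding variables of _hash_freeovflpage()
(this is illustrative only, not necessarily the actual fix): stamp the page
LSN on wbuf only when the buffer really was registered in the WAL record.

    /* only set the LSN if XLogRegisterBuffer() was called for wbuf */
    if (xlrec.ntups > 0 ||
        xlrec.is_prim_bucket_same_wrt ||
        xlrec.is_prev_bucket_same_wrt)
        PageSetLSN(BufferGetPage(wbuf), recptr);
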
--
Michael




Re: Small fix on COPY ON_ERROR document

2024-02-01 Thread Yugo NAGATA
On Fri, 02 Feb 2024 11:29:41 +0900
torikoshia  wrote:

> On 2024-02-01 15:16, Yugo NAGATA wrote:
> > On Mon, 29 Jan 2024 15:47:25 +0900
> > Yugo NAGATA  wrote:
> > 
> >> On Sun, 28 Jan 2024 19:14:58 -0700
> >> "David G. Johnston"  wrote:
> >> 
> >> > > Also, I think "invalid input syntax" is a bit ambiguous. For example,
> >> > > COPY FROM raises an error when the number of input column does not 
> >> > > match
> >> > > to the table schema, but this error is not ignored by ON_ERROR while
> >> > > this seems to fall into the category of "invalid input syntax".
> >> >
> >> >
> >> >
> >> > It is literally the error text that appears if one were not to ignore it.
> >> > It isn’t a category of errors.  But I’m open to ideas here.  But being
> >> > explicit with what on actually sees in the system seemed preferable to
> >> > inventing new classification terms not otherwise used.
> >> 
> >> Thank you for the explanation! I understood the words were from the error
> >> messages
> >> that users actually see. However, as Torikoshi-san said in [1], errors 
> >> other
> >> than valid input syntax (e.g. range error) can be also ignored, 
> >> therefore it
> >> would be better to describe to be ignored errors more specifically.
> >> 
> >> [1] 
> >> https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com
> >> 
> >> >
> >> > >
> >> > > So, keeping consistency with the existing description, we can say:
> >> > >
> >> > > "Specifies which how to behave when encountering an error due to
> >> > >  column values unacceptable to the input function of each attribute's
> >> > >  data type."
> >> >
> >> >
> >> > Yeah, I was considering something along those lines as an option as well.
> >> > But I’d rather add that wording to the glossary.
> >> 
> >> Although I am still not convinced that we have to introduce the words
> >> "soft error" into the documentation, I don't mind if there are no other
> >> opposing opinions.
> > 
> > Attached is an updated patch v3, which is a version that uses the above
> > wording instead of "soft error".
> > 
> >> >
> >> > > Currently, ON_ERROR doesn't support other soft errors, so it can 
> >> > > explain
> >> > > it more simply without introducing the new concept, "soft error" to 
> >> > > users.
> >> > >
> >> > >
> >> > Good point.  Seems we should define what user-facing errors are ignored
> >> > anywhere in the system and if we aren’t consistently leveraging these in
> >> > all areas/commands make the necessary qualifications in those specific
> >> > places.
> >> >
> >> 
> >> > > I think "left in a deleted state" is also unclear for users because 
> >> > > this
> >> > > explains the internal state but not how it looks from the user's view. How
> >> > > about
> >> > > leaving the explanation "These rows will not be visible or accessible" 
> >> > > in
> >> > > the existing statement?
> >> > >
> >> >
> >> > Just visible then, I don’t like an “or” there and as tuples at least they
> >> > are accessible to the system, in vacuum especially.  But I expected the
> >> > user to understand “as if you deleted it” as their operational concept 
> >> > more
> >> > readily than visible.  I think this will be read by people who haven’t 
> >> > read
> >> > MVCC to fully understand what visible means but know enough to run vacuum
> >> > to clean up updated and deleted data as a rule.
> >> 
> >> Ok, I agree we can omit "or accessible". How do you like the 
> >> following?
> >> Still redundant?
> >> 
> >>  "If the command fails, these rows are left in a deleted state;
> >>   these rows will not be visible, but they still occupy disk space. "
> > 
> > Also, the above statement is used in the patch.
> 
> Thanks for updating the patch!
> 
> I like your description which doesn't use the word soft error.

Thank you for your comments!

> 
> Here are minor comments:
> 
> > +  ignore means discard the input row and 
> > continue with the next one.
> > +  The default is stop
> 
> Is "." required at the end of the line?
>
> >  An NOTICE level context message containing the 
> > ignored row count is
> 
> Should 'An' be 'A'?
> 
> Also, I wasn't sure about the necessity of 'context'.
> It might be possible to just say "A NOTICE message containing the 
> ignored row count.."
> considering below existing descriptions:
> 
>doc/src/sgml/pltcl.sgml: a NOTICE message each 
> time a supported command is
>doc/src/sgml/pltcl.sgml- executed:
> 
>doc/src/sgml/plpgsql.sgml: This example trigger simply raises a 
> NOTICE message
>doc/src/sgml/plpgsql.sgml- each time a supported command is 
> executed.

I attached an updated patch including the fixes you pointed out above.

Regards,
Yugo Nagata

> -- 
> Regards,
> 
> --
> Atsushi Torikoshi
> NTT DATA Group Corporation


-- 
Yugo NAGATA 
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..3c2feaa11a 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,13 @@ COPY { 

Re: Synchronizing slots from primary to standby

2024-02-01 Thread Amit Kapila
On Thu, Feb 1, 2024 at 5:29 PM shveta malik  wrote:
>
> On Thu, Feb 1, 2024 at 2:35 PM Amit Kapila  wrote:
> >
> > Agreed, and I am fine with merging 0001, 0002, and 0004 as suggested
> > by you though I have a few minor comments on 0002 and 0004. I was
> > thinking about what will be a logical way to split the slot sync
> > worker patch (combined result of 0001, 0002, and 0004), and one idea
> > occurred to me is that we can have the first patch as
> > synchronize_solts() API and the functionality required to implement
> > that API then the second patch would be a slot sync worker which uses
> > that API to synchronize slots and does all the required validations.
> > Any thoughts?
>
> If we shift 'synchronize_slots()' to the first patch but there is no
> caller of it, we may have a compiler warning for the same. The only
> way it can be done is if we temporarily add SQL function on standby
> which uses 'synchronize_slots()'. This SQL function can then be
> removed in later patches where we actually have a caller for
> 'synchronize_slots'.
>

Can such a SQL function, say pg_synchronize_slots(), which can sync all
slots that have the failover flag set, be useful in general apart from
just writing tests for this new API? I am thinking maybe users want
more control over when to sync the slots and write their bgworker or
simply do it just before shutdown once (sort of planned switchover) or
at some other pre-defined times. BTW, we also have
pg_log_standby_snapshot() which otherwise would be done periodically
by background processes.

>
> 1) Re-arranged the patches:
> 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are
> separated out in v75-001 as those are independent changes.

Bertrand, Sawada-San, and others, do you see a problem with such a
split? Can we go ahead with v75_0001 separately after fixing the open
comments?

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-02-01 Thread Peter Smith
Here are some review comments for v750002.

(this is a WIP but this is what I found so far...)

==
doc/src/sgml/protocol.sgml

1.
> > 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) #
> >
> >   
> > -  If true, the slot is enabled to be synced to the standbys.
> > +  If true, the slot is enabled to be synced to the standbys
> > +  so that logical replication can be resumed after failover.
> >   
> >
> > This also should have the sentence "The default is false.", e.g. the
> > same as the same option in CREATE_REPLICATION_SLOT says.
>
> I have not added this. I feel the default value related details should
> be present in the 'CREATE' part, it is not meaningful for the "ALTER"
> part. ALTER does not have any defaults, it just modifies the options
> given by the user.

You are correct. My mistake.

==
src/backend/postmaster/bgworker.c

2.
 #include "replication/logicalworker.h"
+#include "replication/worker_internal.h"
 #include "storage/dsm.h"

Is this change needed when the rest of the code is removed?

==
src/backend/replication/logical/slotsync.c

3. synchronize_one_slot

> > 3.
> > + /*
> > + * Make sure that concerned WAL is received and flushed before syncing
> > + * slot to target lsn received from the primary server.
> > + *
> > + * This check will never pass if on the primary server, user has
> > + * configured standby_slot_names GUC correctly, otherwise this can hit
> > + * frequently.
> > + */
> > + latestFlushPtr = GetStandbyFlushRecPtr(NULL);
> > + if (remote_slot->confirmed_lsn > latestFlushPtr)
> >
> > BEFORE
> > This check will never pass if on the primary server, user has
> > configured standby_slot_names GUC correctly, otherwise this can hit
> > frequently.
> >
> > SUGGESTION (simpler way to say the same thing?)
> > This will always be the case unless the standby_slot_names GUC is not
> > correctly configured on the primary server.
>
> It is not true. It will not hit this condition "always" but has higher
> chances to hit it when standby_slot_names is not configured. I think
> you meant 'unless the standby_slot_names GUC is correctly configured'.
> I feel the current comment gives clear info (less confusing) and thus
> I have not changed it for the time being. I can consider if I get more
> comments there.

Hmm. I meant what I wrote. The "This" of my suggested text refers to
the previous sentence in the comment (not about "hitting" ?? your
condition).

TBH, regardless of the wording you choose, I think it will be much
clearer to move the comment to be inside the if.

SUGGESTION
/*
 * Make sure that concerned WAL is received and flushed before syncing
 * slot to target lsn received from the primary server.
 */
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
  /*
   * Can get here only if the GUC 'standby_slot_names' on the primary
   * server was not configured correctly.
   */
...
}

~~~

4.
+static bool
+validate_parameters_and_get_dbname(char **dbname)
+{
+ /*
+ * A physical replication slot(primary_slot_name) is required on the
+ * primary to ensure that the rows needed by the standby are not removed
+ * after restarting, so that the synchronized slot on the standby will not
+ * be invalidated.
+ */
+ if (PrimarySlotName == NULL || *PrimarySlotName == '\0')
+ {
+ ereport(LOG,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be defined.", "primary_slot_name"));
+ return false;
+ }
+
+ /*
+ * hot_standby_feedback must be enabled to cooperate with the physical
+ * replication slot, which allows informing the primary about the xmin and
+ * catalog_xmin values on the standby.
+ */
+ if (!hot_standby_feedback)
+ {
+ ereport(LOG,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be enabled.", "hot_standby_feedback"));
+ return false;
+ }
+
+ /*
+ * Logical decoding requires wal_level >= logical and we currently only
+ * synchronize logical slots.
+ */
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ {
+ ereport(LOG,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"wal_level\" must be >= logical."));
+ return false;
+ }
+
+ /*
+ * The primary_conninfo is required to make connection to primary for
+ * getting slots information.
+ */
+ if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0')
+ {
+ ereport(LOG,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be defined.", "primary_conninfo"));
+ return false;
+ }
+
+ /*
+ * The slot sync worker needs a database connection for walrcv_exec to
+ * work.
+ */
+ *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo);
+ if (*dbname == 

Re: Commitfest 2024-01 is now closed

2024-02-01 Thread Masahiko Sawada
On Fri, Feb 2, 2024 at 2:19 PM vignesh C  wrote:
>
> Hi,
>
> Thanks a lot to all the members who participated in the commitfest.
>
> Here are the final numbers at the end of the commitfest:
> status              |  w1  |  w2  |  w3  |  w4  |  End
> --------------------+------+------+------+------+------
> Needs review        |  238 |  213 |  181 |  146 |  N/A
> Waiting on Author   |   44 |   46 |   52 |   65 |  N/A
> Ready for Committer |   27 |   27 |   26 |   24 |  N/A
> Committed           |   36 |   46 |   57 |   71 |   75
> Moved to next CF    |    1 |    3 |    4 |    4 |  208
> Withdrawn           |    2 |    4 |   12 |   14 |   16
> RWF                 |    3 |   12 |   18 |   26 |   51
> Rejected            |    1 |    1 |    2 |    2 |    2
> Total               |  352 |  352 |  352 |  352 |  352
>
> Committed patches comparison with previous January commitfests:
> 2024: 75 committed
> 2023: 70 committed
> 2022: 58 committed
> 2021: 56 committed
> 2020: 49 committed
>
>  A special thanks to the reviewers/committers who spent tireless
> effort in moving the patches forward.
>

Thank you for all the hard work, Vignesh!

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)

2024-02-01 Thread Michael Paquier
On Fri, Feb 02, 2024 at 12:04:39AM +0530, vignesh C wrote:
> The patch which you submitted has been awaiting your attention for
> quite some time now.  As such, we have moved it to "Returned with
> Feedback" and removed it from the reviewing queue. Depending on
> timing, this may be reversible.  Kindly address the feedback you have
> received, and resubmit the patch to the next CommitFest.

Even with that, it seems to me that this is not required now that
21d9c3ee4ef7 outlines better how long SMgrRelation pointers should
live, no?
--
Michael




Re: Improve eviction algorithm in ReorderBuffer

2024-02-01 Thread Masahiko Sawada
On Wed, Jan 31, 2024 at 2:18 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Sawada-san,
>
> I have started to read your patches. Here are my initial comments.
> At least, all subscription tests passed on my env.

Thank you for the review comments!

>
> A comment for 0001:
>
> 01.
> ```
> +static void
> +bh_enlarge_node_array(binaryheap *heap)
> +{
> +    if (heap->bh_size < heap->bh_space)
> +        return;
> +
> +    heap->bh_space *= 2;
> +    heap->bh_nodes = repalloc(heap->bh_nodes,
> +                              sizeof(bh_node_type) * heap->bh_space);
> +}
> ```
>
> I'm not sure it is OK to use repalloc() for enlarging bh_nodes. This data
> structure is a public one, and arbitrary code and extensions can directly
> refer to bh_nodes. But if the array is repalloc()'d, the pointer would be
> updated, so their references would become dangling pointers.

Hmm, I'm not sure this is a case we really need to worry about, and I
cannot come up with a good use case where extensions refer to bh_nodes
directly rather than going through binaryheap. In the PostgreSQL code,
many Nodes already have pointers and are exposed.

> I think the internal of the structure should be a private one in this case.
>
> Comments for 0002:
>
> 02.
> ```
> +#include "utils/palloc.h"
> ```
>
> Is it really needed? I'm not sure who refers to it.

Seems not, will remove.

>
> 03.
> ```
> typedef struct bh_nodeidx_entry
> {
>     bh_node_type key;
>     char         status;
>     int          idx;
> } bh_nodeidx_entry;
> ```
>
> Sorry if it is a stupid question. Can you tell me how "status" is used?
> None of binaryheap and reorderbuffer components refer it.

It's required by simplehash.h

>
> 04.
> ```
>  extern binaryheap *binaryheap_allocate(int capacity,
> binaryheap_comparator compare,
> -   void *arg);
> +   bool indexed, void *arg);
> ```
>
> I felt pre-existing API should not be changed. How about adding
> binaryheap_allocate_extended() or something which can specify the `bool 
> indexed`?
> binaryheap_allocate() sets heap->bh_indexed to false.

I'm really not sure it's worth inventing a
binaryheap_allocate_extended() function just for preserving API
compatibility. I think it's generally a good idea to have
xxx_extended() function to increase readability and usability, for
example, for the case where the same (kind of default) arguments are
passed in most cases and the function is called from many places.
However, we have only a handful of binaryheap_allocate() callers, and I
believe that the change would not hurt the existing callers.

>
> 05.
> ```
> +extern void binaryheap_update_up(binaryheap *heap, bh_node_type d);
> +extern void binaryheap_update_down(binaryheap *heap, bh_node_type d);
> ```
>
> IIUC, callers must consider whether the node should be shifted up or down and
> use the appropriate function, right? I felt it may not be user-friendly.

Right, I couldn't come up with a better interface.

Another idea I've considered was that the caller provides a callback
function where it can compare the old and new keys. For example, in
reorderbuffer case, we call like:

binaryheap_update(rb->txn_heap, PointerGetDatum(txn),
                  ReorderBufferTXNUpdateCompare, (void *) &old_size);

Then in ReorderBufferTXNUpdateCompare(),
ReorderBufferTXN *txn = (ReorderBufferTXN *) a;
Size old_size = *(Size *) b;
(compare txn->size to "b" ...)

However it seems complicated...

>
> Comments for 0003:
>
> 06.
> ```
> This commit changes the eviction algorithm in ReorderBuffer to use
> max-heap with transaction size,a nd use two strategies depending on
> the number of transactions being decoded.
> ```
>
> s/a nd/ and/
>
> 07.
> ```
> It could be too expensive to pudate max-heap while preserving the heap
> property each time the transaction's memory counter is updated, as it
> could happen very frquently. So when the number of transactions being
> decoded is small, we add the transactions to max-heap but don't
> preserve the heap property, which is O(1). We heapify the max-heap
> just before picking the largest transaction, which is O(n). This
> strategy minimizes the overheads of updating the transaction's memory
> counter.
> ```
>
> s/pudate/update/

Will fix them.

>
> 08.
> IIUC, if more than 1024 transactions are running but they have a small amount
> of changes, the performance may be degraded, right? Do you have a result in
> such a case?

I've run a benchmark test that I shared before[1]. Here are the results of
decoding a transaction that has 1M subtransactions, each of which has 1
INSERT:

HEAD:
1810.192 ms

HEAD w/ patch:
2001.094 ms

I set logical_decoding_work_mem to a large enough value not to evict
any transactions. I can see about a 10% performance regression in
this case.
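
To illustrate the two-strategy idea from the commit message quoted above, a
rough sketch could look like the following. The threshold value, the
n_txns/txn_heap fields, and the function names are illustrative only;
binaryheap_update_up()/_down() are the new functions proposed in 0002, while
binaryheap_build() and binaryheap_first() are existing binaryheap APIs.

#include "postgres.h"
#include "lib/binaryheap.h"
#include "replication/reorderbuffer.h"

#define MANY_TXNS_THRESHOLD 1024    /* hypothetical cut-over point */

/* Called whenever a transaction's memory counter changes. */
static void
on_txn_size_changed(ReorderBuffer *rb, ReorderBufferTXN *txn, bool grew)
{
    if (rb->n_txns < MANY_TXNS_THRESHOLD)
        return;                 /* O(1): defer; heapify lazily before eviction */

    /* many transactions: keep the max-heap ordered incrementally, O(log n) */
    if (grew)
        binaryheap_update_up(rb->txn_heap, PointerGetDatum(txn));
    else
        binaryheap_update_down(rb->txn_heap, PointerGetDatum(txn));
}

/* Called only when we actually need to evict the largest transaction. */
static ReorderBufferTXN *
pick_largest_txn(ReorderBuffer *rb)
{
    if (rb->n_txns < MANY_TXNS_THRESHOLD)
        binaryheap_build(rb->txn_heap);     /* O(n) lazy heapify */

    return (ReorderBufferTXN *) DatumGetPointer(binaryheap_first(rb->txn_heap));
}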

Regards,

[1] 

Re: Commitfest 2024-01 is now closed

2024-02-01 Thread Michael Paquier
On Fri, Feb 02, 2024 at 11:56:36AM +0530, Amit Kapila wrote:
> On Fri, Feb 2, 2024 at 10:58 AM Tom Lane  wrote:
>> Thanks for all the work you did running it!  CFM is typically a
>> thankless exercise in being a nag, but I thought you put in more
>> than the usual amount of effort to keep things moving.
>>
> 
> +1. Thanks a lot for all the hard work.

+1.  Thanks!
--
Michael




Re: Synchronizing slots from primary to standby

2024-02-01 Thread Masahiko Sawada
On Fri, Feb 2, 2024 at 1:58 PM Amit Kapila  wrote:
>
> On Fri, Feb 2, 2024 at 6:46 AM Masahiko Sawada  wrote:
> >
> > On Thu, Feb 1, 2024 at 12:51 PM Amit Kapila  wrote:
> > >
> > >
> > > > BTW I've tested the following switch/fail-back scenario but it seems
> > > > not to work fine. Am I missing something?
> > > >
> > > > Setup:
> > > > node1 is the primary, node2 is the physical standby for node1, and
> > > > node3 is the subscriber connecting to node1.
> > > >
> > > > Steps:
> > > > 1. [node1]: create a table and a publication for the table.
> > > > 2. [node2]: set enable_syncslot = on and start (to receive WALs from 
> > > > node1).
> > > > 3. [node3]: create a subscription with failover = true for the 
> > > > publication.
> > > > 4. [node2]: promote to the new standby.
> > > > 5. [node3]: alter subscription to connect the new primary, node2.
> > > > 6. [node1]: stop, set enable_syncslot = on (and other required
> > > > parameters), then start as a new standby.
> > > >
> > > > Then I got the error "exiting from slot synchronization because same
> > > > name slot "test_sub" already exists on the standby".
> > > >
> > > > The logical replication slot that was created on the old primary
> > > > (node1) has been synchronized to the old standby (node2). Therefore on
> > > > node2, the slot's "synced" field is true. However, once node1 starts
> > > > as the new standby with slot synchronization, the slotsync worker
> > > > cannot synchronize the slot because the slot's "synced" field on the
> > > > primary is false.
> > > >
> > >
> > > Yeah, we avoided doing anything in this case because the user could
> > > have manually created another slot with the same name on standby.
> > > Unlike WAL slots can be modified on standby as we allow decoding on
> > > standby, so we can't allow to overwrite the existing slots. We won't
> > > be able to distinguish whether the existing slot was a slot that the
> > > user wants to sync with primary or a slot created on standby to
> > > perform decoding. I think in this case user first needs to drop the
> > > slot on new standby.
> >
> > Yes, but if we do a switch-back further (i.e. in the above case, node1
> > goes back to being the primary again and node2 becomes the standby
> > again), the user doesn't need to remove failover slots since they are
> > already marked as "synced".
>
> But, I think in this case node-2's timeline will be ahead of node-1,
> so will we be able to make node-2 follow node-1 again without any
> additional steps? One thing is not clear to me after promotion the
> timeline changes in WAL, so the locations in slots will be as per new
> timelines, after that will it be safe to sync slots from the new
> primary to old-primary?

In order for node-1 to go back to the primary again, it needs to be
promoted. That is, the node-1's timeline increments and node-2 follows
node-1.

>
> In general, I think after failover, we recommend running pg_rewind if
> the old primary has to follow the new primary to account for
> divergence in WAL. So, not sure we can safely start syncing slots in
> old-primary from new-primary, consider that in the new primary, the
> same name slot may have dropped/re-created multiple times.

Right. And I missed the point that all replication slots are removed
after pg_rewind. It would not be a problem in a failover case. But
probably we still need to consider a switchover case (i.e. switching roles
with clean shutdowns) since it doesn't require running pg_rewind?

> We can
> probably reset all the fields of the existing slot the first time
> syncing for an existing slot or do something like that but I think it
> would be better to just re-create the slot.
>
> >
>  I wonder if we could do something automatically to
> > reduce the user's operation.
>
> One possibility is that we forcefully drop/re-create the slot or
> directly overwrite the slot contents but that would probably be better
> done via some GUC or slot-level parameter. I feel we should leave this
> for another day, for the first version, we can document that an error
> will occur if the same name slots on standby exist, so users need to
> ensure that there shouldn't be an existing same name slots on standby
> before sync.
>

Hmm, I'm afraid it might not be user-friendly. But probably we can
leave it for now as it's not impossible.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Commitfest 2024-01 is now closed

2024-02-01 Thread Amit Kapila
On Fri, Feb 2, 2024 at 10:58 AM Tom Lane  wrote:
>
> vignesh C  writes:
> > Thanks a lot to all the members who participated in the commitfest.
>
> Thanks for all the work you did running it!  CFM is typically a
> thankless exercise in being a nag, but I thought you put in more
> than the usual amount of effort to keep things moving.
>

+1. Thanks a lot for all the hard work.

-- 
With Regards,
Amit Kapila.




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Michael Paquier
On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote:
> Thanks. It'll help us.

I have done a review of v10, see v11 attached which is still WIP, with
the patches for COPY TO and COPY FROM merged together.  Note that I'm
thinking to merge them into a single commit.

@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
 bool              convert_selectively;  /* do selective binary conversion? */
 CopyOnErrorChoice on_error;      /* what to do when error happened */
 List             *convert_select; /* list of column names (can be NIL) */
+const CopyToRoutine *to_routine;  /* callback routines for COPY TO */
 } CopyFormatOptions;

Adding the routines to the structure for the format options is in my
opinion incorrect.  The elements of this structure are first processed
in the option deparsing path, and then we need to use the options to
guess which routines we need.  A more natural location is cstate
itself, so as the pointer to the routines is isolated within copyto.c
and copyfrom_internal.h.  My point is: the routines are an
implementation detail that the centralized copy.c has no need to know
about.  This also led to a strange separation with
ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
the hole in-between.

The separation between cstate and the format-related fields could be
much better, though I am not sure if it is worth doing as it
introduces more duplication.  For example, max_fields and raw_fields
are specific to text and csv, while binary does not care much.
Perhaps this is just useful to be for custom formats.

copyapi.h needs more documentation, like what is expected for
extension developers when using these, what are the arguments, etc.  I
have added what I had in mind for now.

+typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);

CopyReadAttributes and PostpareColumnValue are also callbacks specific
to text and csv, except that they are used within the per-row
callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
It seems to me that it would be less confusing to store pointers to
them in the routine structures, where the final picture involves not
having multiple layers of APIs like CopyToCSVStart,
CopyAttributeOutTextValue, etc.  These *have* to be documented
properly in copyapi.h, and this is much easier now that cstate stores
the routine pointers.  That would also make simpler function stacks.
Note that I have not changed that in the v11 attached.

This business with the extra callbacks required for csv and text is my
main point of contention, but I'd be OK once the model of the APIs is
more linear, with everything in Copy{From,To}State.  The changes would
be rather simple, and I'd be OK to put my hands on it.  Just,
Sutou-san, would you agree with my last point about these extra
callbacks?
--
Michael
From 7aca58d0f7adfd146d287059b1d11b47acdfa758 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Wed, 31 Jan 2024 13:37:02 +0900
Subject: [PATCH v11] Extract COPY FROM/TO format implementations

This doesn't change the current behavior.  This just introduces a set of
copy routines called CopyFromRoutine and CopyToRoutine, which simply hold
function pointers for a format implementation, like TupleTableSlotOps, and
uses them for the existing "text", "csv" and "binary" format implementations.
There are plans to extend this further in the future with custom formats.

This improves performance when a relation has many attributes as this
eliminates format-based if branches.  The more the attributes, the
better the performance gain.  Blah add some numbers.
---
 src/include/commands/copyapi.h   |  82 
 src/include/commands/copyfrom_internal.h |   8 +
 src/backend/commands/copyfrom.c  | 219 +++---
 src/backend/commands/copyfromparse.c | 439 
 src/backend/commands/copyto.c| 496 ---
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 870 insertions(+), 376 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00..3c0a6ba7a1
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,82 @@
+/*-
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY FROM format implementation.   Note this 
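
The attached copyapi.h excerpt is cut off above. For orientation, a minimal
illustrative sketch of the kind of per-format callback table discussed in
this thread (analogous to TupleTableSlotOps) might look like the following;
the member names, the callback signatures, and the comment about CSV are
made up here and need not match the actual patch.

#include "executor/tuptable.h"
#include "nodes/execnodes.h"

/* forward declaration, as in the excerpt above */
typedef struct CopyToStateData *CopyToState;

typedef struct CopyToRoutine
{
    /* emit any format-specific header; called once before the first row */
    void        (*start_fn) (CopyToState cstate, TupleDesc tupDesc);

    /* emit one row; called per tuple from the COPY TO loop */
    void        (*one_row_fn) (CopyToState cstate, TupleTableSlot *slot);

    /* emit any trailer and flush; called once after the last row */
    void        (*end_fn) (CopyToState cstate);
} CopyToRoutine;

/*
 * copyto.c would then keep one static const CopyToRoutine per built-in
 * format (text, csv, binary) and dispatch through a pointer stored in the
 * CopyToStateData, which is the placement agreed on in this thread.
 */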

Re: Synchronizing slots from primary to standby

2024-02-01 Thread Amit Kapila
On Fri, Feb 2, 2024 at 9:50 AM Peter Smith  wrote:
>
> Here are some review comments for v750001.
>
> ~~~
>
> 2.
> This patch also implements a new API libpqrcv_get_dbname_from_conninfo()
> to extract database name from the given connection-info
>
> ~
>
> /extract database name/the extract database name/
>

I think it should be "..extract the database name.."

> ==
> .../libpqwalreceiver/libpqwalreceiver.c
>

>
> 4. libpqrcv_connect
>
>  /*
> - * Establish the connection to the primary server for XLOG streaming
> + * Establish the connection to the primary server.
> + *
> + * The connection established could be either a replication one or
> + * a non-replication one based on input argument 'replication'. And further
> + * if it is a replication connection, it could be either logical or physical
> + * based on input argument 'logical'.
>
> That first comment ("could be either a replication one or...") seemed
> a bit meaningless (e.g. it like saying "this boolean argument can be
> true or false") because it doesn't describe what is the meaning of a
> "replication connection" versus what is a "non-replication
> connection".
>

The replication connection is a term already used in the code and
docs. For example, see the error message: "pg_hba.conf rejects
replication connection for host ..". It means that for communication
the connection would use replication protocol instead of the normal
(one used by queries) protocol. The other possibility could be to
individually explain each parameter but I think that is not what we
follow in this or related functions. I feel we can use a simple
comment like: "This API can be used for both replication and regular
connections."

> ~~~
>
> 5.
> /* We can not have logical without replication */
> Assert(replication || !logical);
>
> if (replication)
> {
> keys[++i] = "replication";
> vals[i] = logical ? "database" : "true";
>
> if (!logical)
> {
> /*
> * The database name is ignored by the server in replication mode,
> * but specify "replication" for .pgpass lookup.
> */
> keys[++i] = "dbname";
> vals[i] = "replication";
> }
> }
>
> keys[++i] = "fallback_application_name";
> vals[i] = appname;
> if (logical)
> {
> ...
> }
>
> ~
>
> The Assert already says we cannot be 'logical' if not 'replication',
> therefore IMO it seemed strange that the code was not refactored to
> bring that 2nd "if (logical)" code to within the scope of the "if
> (replication)".
>
> e.g. Can't you do something like this:
>
> Assert(replication || !logical);
>
> if (replication)
> {
>   ...
>   if (logical)
>   {
> ...
>   }
>   else
>   {
> ...
>   }
> }
> keys[++i] = "fallback_application_name";
> vals[i] = appname;
>

+1.

> ~~~
>
> 6. libpqrcv_get_dbname_from_conninfo
>
> + for (PQconninfoOption *opt = opts; opt->keyword != NULL; ++opt)
> + {
> + /*
> + * If multiple dbnames are specified, then the last one will be
> + * returned
> + */
> + if (strcmp(opt->keyword, "dbname") == 0 && opt->val &&
> + opt->val[0] != '\0')
> + dbname = pstrdup(opt->val);
> + }
>
> Should you also pfree the old dbname instead of gathering a bunch of
> strdups if there happened to be multiple dbnames specified ?
>
> SUGGESTION
> if (strcmp(opt->keyword, "dbname") == 0 && opt->val && *opt->val)
> {
>if (dbname)
> pfree(dbname);
>dbname = pstrdup(opt->val);
> }
>

Makes sense, and shouldn't we also call PQconninfoFree(opts) at the
end of libpqrcv_get_dbname_from_conninfo(), similar to
libpqrcv_check_conninfo()?
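
Putting the two suggestions together (pfree of a previously copied value, and
PQconninfoFree at the end), the loop in libpqrcv_get_dbname_from_conninfo()
could end up looking roughly like this sketch:

    for (PQconninfoOption *opt = opts; opt->keyword != NULL; ++opt)
    {
        /* If multiple dbnames are specified, the last one wins. */
        if (strcmp(opt->keyword, "dbname") == 0 && opt->val &&
            opt->val[0] != '\0')
        {
            if (dbname)
                pfree(dbname);
            dbname = pstrdup(opt->val);
        }
    }
    PQconninfoFree(opts);
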

-- 
With Regards,
Amit Kapila.




Re: Add new COPY option REJECT_LIMIT

2024-02-01 Thread torikoshia

On 2024-01-27 00:20, David G. Johnston wrote:

Thanks for your comments!


On Fri, Jan 26, 2024 at 2:49 AM torikoshia
 wrote:


Hi,

9e2d870 enabled the COPY command to skip soft errors, and I think we can
add another option which specifies the maximum tolerable number of soft
errors.

I remember this was discussed in [1], and feel it would be useful when
loading 'dirty' data but there is a limit to how dirty it can be.

Attached a patch for this.

What do you think?


I'm opposed to adding this particular feature.

When implementing this kind of business rule I'd need the option to
specify a percentage, not just an absolute value.


Yeah, it seems useful for some cases.
Actually, Greenplum allows specifying not only the max number of bad
rows but also a percentage[1].


I may be wrong, but considering that some data loaders support something like
reject_limit (Redshift supports MAXERROR[2], pg_bulkload supports
PARSE_ERRORS[3]), specifying the "number" of bad rows might also be
useful.


I think we can implement reject_limit specified by percentage by simply
calculating the ratio of skipped to processed rows at the end of CopyFrom(),
like this:

  if (cstate->opts.reject_limit > 0 &&
      (double) skipped / (processed + skipped) > cstate->opts.reject_limit_percent)
      ereport(ERROR,
              (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
               errmsg("exceeded the ratio specified by ..




I would focus on trying to put the data required to make this kind of
determination into a place where applications implementing such
business rules and monitoring can readily get at it.  The "ERRORS TO"
and maybe a corresponding "STATS TO" option where a table can be
specified for the system to place the problematic data and stats about
the copy itself.


It'd be nice to have such informative tables, but I believe the benefit
of reject_limit is that it fails the entire load when the threshold is
exceeded.
I imagine that if we just had error and stats information tables for COPY,
users would have to delete the loaded rows themselves once they confirmed
there were too many errors in these tables.



[1]https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/admin_guide-load-topics-g-handling-load-errors.html
[2]https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-load.html
[3]https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-01 Thread Dilip Kumar
On Thu, Feb 1, 2024 at 4:34 PM Dilip Kumar  wrote:
>
> On Thu, Feb 1, 2024 at 4:12 PM Dilip Kumar  wrote:
> >
> > On Thu, Feb 1, 2024 at 3:44 PM Alvaro Herrera  
> > wrote:
>
> > Okay.
> > >
> > > While I have your attention -- if you could give a look to the 0001
> > > patch I posted, I would appreciate it.
> > >
> >
> > I will look into it.  Thanks.
>
> Some quick observations,
>
> Do we need the two write barriers below at the end of the functions,
> given that the next instruction is separated by the function boundary?
>
> @@ -766,14 +766,11 @@ StartupCLOG(void)
>   ..
> - XactCtl->shared->latest_page_number = pageno;
> -
> - LWLockRelease(XactSLRULock);
> + pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
> + pg_write_barrier();
>  }
>
> /*
>   * Initialize member's idea of the latest page number.
>   */
>   pageno = MXOffsetToMemberPage(offset);
> - MultiXactMemberCtl->shared->latest_page_number = pageno;
> + pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
> +                    pageno);
> +
> + pg_write_barrier();
>  }
>

I have checked the patch and it looks fine to me other than the above
question related to memory barrier usage one more question about the
same, basically below to instances 1 and 2 look similar but in 1 you
are not using the memory write_barrier whereas in 2 you are using the
write_barrier, why is it so?  I mean why the reordering can not happen
in 1 and it may happen in 2?

1.
+ pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+ trunc->pageno);

  SimpleLruTruncate(CommitTsCtl, trunc->pageno);

vs
2.

  - shared->latest_page_number = pageno;
+ pg_atomic_write_u64(&shared->latest_page_number, pageno);
+ pg_write_barrier();

  /* update the stats counter of zeroed pages */
  pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
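
As background for the barrier question, here is a generic illustration of
the usual write/read barrier pairing; it is not taken from the patch, and
the struct and field names are made up. The writer's pg_write_barrier()
orders "data before flag", and the reader's matching pg_read_barrier()
orders "flag before data".

#include "postgres.h"
#include "port/atomics.h"

/* Illustrative only: this struct and its fields are hypothetical. */
typedef struct DemoShared
{
    uint64      latest_page_number;
    bool        page_ready;
} DemoShared;

static void
writer(DemoShared *shared, uint64 pageno)
{
    shared->latest_page_number = pageno;    /* 1: publish the data */
    pg_write_barrier();                     /* order store 1 before store 2 */
    shared->page_ready = true;              /* 2: publish the flag */
}

static uint64
reader(DemoShared *shared)
{
    if (shared->page_ready)                 /* saw store 2 ... */
    {
        pg_read_barrier();                  /* ... so store 1 must be visible too */
        return shared->latest_page_number;
    }
    return 0;
}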



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Synchronizing slots from primary to standby

2024-02-01 Thread Bertrand Drouvot
Hi,

On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote:
> Attached v75 patch-set. Changes are:
> 
> 1) Re-arranged the patches:
> 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are
> separated out in v75-001 as those are independent changes.
> 1.2) 'Add logical slot sync capability', 'Slot sync worker as special
> process' and 'App-name changes' are now merged to single patch which
> makes v75-002.
> 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation
> Document' patches are maintained as is (v75-003 and v75-004 now).

Thanks!

I only looked at the commit message for v75-0002 and see that it has changed
since the comment done in [1], but it still does not look correct to me.

"
If a logical slot on the primary is valid but is invalidated on the standby,
then that slot is dropped and recreated on the standby in next sync-cycle
provided the slot still exists on the primary server. It is okay to recreate
such slots as long as these are not consumable on the standby (which is the
case currently). This situation may occur due to the following reasons:
- The max_slot_wal_keep_size on the standby is insufficient to retain WAL
  records from the restart_lsn of the slot.
- primary_slot_name is temporarily reset to null and the physical slot is
  removed.
- The primary changes wal_level to a level lower than logical.
"

If a logical decoding slot "still exists on the primary server" then the primary
cannot change wal_level to lower than logical; one would get something
like:

"FATAL:  logical replication slot "logical_slot" exists, but wal_level < 
logical"

and then slots won't get invalidated on the standby. I have the feeling that the
wal_level conflict part may need to be explained separately? (I think it's not
possible that they end up being re-created on the standby for this conflict;
they will simply be removed, as it would mean the counterpart slot on the
primary does not exist anymore).

[1]: 
https://www.postgresql.org/message-id/ZYWdSIeAMQQcLmVT%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-02-01 Thread Alexander Lakhin

Hello Vignesh and Hou-san,

01.02.2024 07:59, vignesh C wrote:

Here is an updated patch which changes the boolean variable to a
tri-state enum and set stable state to valid only if no invalidations
have been occurred while the list is being prepared.



While testing the v3 patch, I observed another anomaly with the 014_binary
test. When I run the test in parallel, I see that sometimes one of the test
instances runs much longer than others. For example:
49  All tests successful.
49  Files=1, Tests=8,  4 wallclock secs ( 0.03 usr  0.01 sys + 0.43 cusr  0.30 csys =  0.77 CPU)
49  Result: PASS
12  All tests successful.
12  Files=1, Tests=8, 184 wallclock secs ( 0.02 usr  0.01 sys + 0.46 cusr  0.40 csys =  0.89 CPU)
12  Result: PASS

As far as I can see, this happens due to another race condition, this time
in launcher.c.
For such a three-minute case I see in _subscriber.log:
2024-02-01 14:33:13.604 UTC [949255] DEBUG:  relation 
"public.test_mismatching_types" does not exist
2024-02-01 14:33:13.604 UTC [949255] CONTEXT:  processing remote data for replication origin "pg_16398" during message 
type "INSERT" in transaction 757, finished at 0/153C838
2024-02-01 14:33:13.604 UTC [949255] ERROR:  logical replication target relation "public.test_mismatching_types" does 
not exist
2024-02-01 14:33:13.604 UTC [949255] CONTEXT:  processing remote data for replication origin "pg_16398" during message 
type "INSERT" in transaction 757, finished at 0/153C838

...
2024-02-01 14:33:13.605 UTC [949276] 014_binary.pl LOG:  statement: CREATE 
TABLE public.test_mismatching_types (
    a int PRIMARY KEY
    );
2024-02-01 14:33:13.605 UTC [942451] DEBUG:  unregistering background worker "logical replication apply worker for 
subscription 16398"
2024-02-01 14:33:13.605 UTC [942451] LOG:  background worker "logical replication apply worker" (PID 949255) exited with 
exit code 1

...
2024-02-01 14:33:13.607 UTC [949276] 014_binary.pl LOG:  statement: ALTER 
SUBSCRIPTION tsub REFRESH PUBLICATION;
...
2024-02-01 14:36:13.642 UTC [942527] DEBUG:  starting logical replication worker for 
subscription "tsub"
(there is no interesting activity between 14:33:13 and 14:36:13)

So the logical replication apply worker exited because CREATE TABLE had not
been executed on the subscriber yet, and a new replication worker started
because a timeout occurred in WaitLatch(..., wait_time, ...) inside
ApplyLauncherMain() (wait_time in this case is DEFAULT_NAPTIME_PER_CYCLE
(180 sec)).

But in a normal (fast) case the same WaitLatch exits due to MyLatch being set
as a result of:
logical replication apply worker| logicalrep_worker_onexit() ->
  ApplyLauncherWakeup() -> kill(LogicalRepCtx->launcher_pid, SIGUSR1) ->
launcher| procsignal_sigusr1_handler() -> SetLatch(MyLatch)).

In a bad case, I see that SetLatch() is called as well, but then the latch
gets reset by the following code in WaitForReplicationWorkerAttach():
    rc = WaitLatch(MyLatch,
   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
   10L, WAIT_EVENT_BGWORKER_STARTUP);

    if (rc & WL_LATCH_SET)
    {
    ResetLatch(MyLatch);
    CHECK_FOR_INTERRUPTS();
    }

With pg_usleep(30); added just before ResetLatch and
 $node_subscriber->safe_psql(
 'postgres', qq(
+SELECT pg_sleep(0.5);
 CREATE TABLE public.test_mismatching_types (
in 014_binary.pl, I can see the anomaly without running tests in parallel,
just when running that test in a loop:
for ((i=1;i<=10;i++)); do echo "iteration $i"; make -s check -C src/test/subscription/ 
PROVE_TESTS="t/014*"; done
...
iteration 2
# +++ tap check in src/test/subscription +++
t/014_binary.pl .. ok
All tests successful.
Files=1, Tests=8,  5 wallclock secs ( 0.00 usr  0.00 sys +  0.24 cusr  0.18 
csys =  0.42 CPU)
Result: PASS
iteration 3
# +++ tap check in src/test/subscription +++
t/014_binary.pl .. ok
All tests successful.
Files=1, Tests=8, 183 wallclock secs ( 0.02 usr  0.00 sys +  0.28 cusr  0.25 
csys =  0.55 CPU)
Result: PASS
...

In other words, the abnormal test execution takes place due to the
following events:
1. logicalrep worker launcher launches replication worker and waits for it
 to attach:
 ApplyLauncherMain() -> logicalrep_worker_launch() -> 
WaitForReplicationWorkerAttach()
2. logicalrep worker exits due to some error (logical replication target
 relation does not exist, in our case) and sends a signal to set the latch
 for launcher
3. launcher sets the latch in procsignal_sigusr1_handler(), but then resets
 it inside WaitForReplicationWorkerAttach()
4. launcher gets back to ApplyLauncherMain() where it waits for the latch
 (not set) or a timeout (which happens after DEFAULT_NAPTIME_PER_CYCLE ms).
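
To make the sequence above concrete, here is a toy, self-contained C sketch
(not PostgreSQL code; the latch is modelled as a plain boolean and the
function names are invented) showing how the wakeup set in step 2 is
consumed in step 3, leaving nothing for step 4:

#include <stdbool.h>
#include <stdio.h>

static bool latch_is_set = false;

static void SetLatchToy(void)   { latch_is_set = true;  }  /* like SetLatch() */
static void ResetLatchToy(void) { latch_is_set = false; }  /* like ResetLatch() */

int
main(void)
{
    /* 1. launcher launches the worker and waits for it to attach
     *    (WaitForReplicationWorkerAttach). */

    /* 2. the worker hits an error, exits, and signals the launcher,
     *    whose SIGUSR1 handler sets the launcher's latch. */
    SetLatchToy();

    /* 3. the attach-wait loop wakes up, sees WL_LATCH_SET, and resets
     *    the latch, consuming the wakeup meant for the main loop. */
    if (latch_is_set)
        ResetLatchToy();

    /* 4. back in ApplyLauncherMain() the latch is no longer set, so the
     *    launcher waits for the full DEFAULT_NAPTIME_PER_CYCLE (180 s)
     *    before noticing that the worker needs to be restarted. */
    printf("latch at main-loop wait: %s\n",
           latch_is_set ? "set (restart immediately)" : "not set (sleep 180 s)");
    return 0;
}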

Moreover, with that sleep in WaitForReplicationWorkerAttach() added, the
test 027_nosuperuser executes for 3+ minutes on each run for me.
And we can find such abnormal execution on the buildfarm too:

Re: SQL:2011 application time

2024-02-01 Thread jian he
On Mon, Jan 29, 2024 at 8:00 AM jian he  wrote:
>
> I fixed your tests; some of them can be simplified (mainly, the
> primary key constraint is unnecessary for the failing tests).
> Also, in your foreign key patch's tests, the table temporal_rng is created at
> line 141 and we use it at around line 320,
> so it's hard to see the definition of temporal_rng.  I drop the table
> and recreate it,
> so people can view the patch with its tests more easily.
>
I've attached a new patch that further simplifies the tests (scoped to the
v24 patch's 0002 and 0003).
Please ignore the previous email's attachments.

I've only applied v24's 0002 and 0003.
It seems doc/src/sgml/ref/create_table.sgml lacks
an explanation of `temporal_interval`.

Since foreign key ON {UPDATE | DELETE} {CASCADE, SET NULL, SET DEFAULT} is
not yet supported,
v24-0003's create_table.sgml should reflect that.

+ /*
+ * For FKs with PERIOD we need an operator and aggregate function
+ * to check whether the referencing row's range is contained
+ * by the aggregated ranges of the referenced row(s).
+ * For rangetypes this is fk.periodatt <@ range_agg(pk.periodatt).
+ * FKs will look these up at "runtime", but we should make sure
+ * the lookup works here.
+ */
+ if (is_temporal)
+ FindFKPeriodOpersAndProcs(opclasses[numpks - 1], &periodoperoid,
+ &periodprocoid);

Within the function ATAddForeignKeyConstraint, you call
FindFKPeriodOpersAndProcs,
but never use the computed outputs: periodoperoid, periodprocoid,
opclasses.
We validate these (periodoperoid, periodprocoid) in
lookupTRIOperAndProc and FindFKPeriodOpersAndProcs.
I'm not sure whether the FindFKPeriodOpersAndProcs call in
ATAddForeignKeyConstraint is necessary.

+ * Check if all key values in OLD and NEW are "equivalent":
+ * For normal FKs we check for equality.
+ * For temporal FKs we check that the PK side is a superset of its old
value,
+ * or the FK side is a subset.
"or the FK side is a subset."  is misleading, should it be something
like "or the FK side is a subset of X"?

+ if (indexStruct->indisexclusion) return i - 1;
+ else return i;

I believe our style should be (with proper indent)
if (indexStruct->indisexclusion)
return i - 1;
else
return i;

in transformFkeyCheckAttrs
+ if (found && is_temporal)
+ {
+ found = false;
+ for (j = 0; j < numattrs + 1; j++)
+ {
+ if (periodattnum == indexStruct->indkey.values[j])
+ {
+ opclasses[numattrs] = indclass->values[j];
+ found = true;
+ break;
+ }
+ }
+ }

can be simplified:
{
found = false;
if (periodattnum == indexStruct->indkey.values[numattrs])
{
opclasses[numattrs] = indclass->values[numattrs];
found = true;
}
}

Also, I'm wondering about the `if (!found)` part at the end of the function
transformFkeyCheckAttrs:
do we need another error message to handle the case where is_temporal is true?


@@ -212,8 +213,11 @@ typedef struct NewConstraint
  ConstrType contype; /* CHECK or FOREIGN */
  Oid refrelid; /* PK rel, if FOREIGN */
  Oid refindid; /* OID of PK's index, if FOREIGN */
+ bool conwithperiod; /* Whether the new FOREIGN KEY uses PERIOD */
  Oid conid; /* OID of pg_constraint entry, if FOREIGN */
  Node   *qual; /* Check expr or CONSTR_FOREIGN Constraint */
+ Oid   *operoids; /* oper oids for FOREIGN KEY with PERIOD */
+ Oid   *procoids; /* proc oids for FOREIGN KEY with PERIOD */
  ExprState  *qualstate; /* Execution state for CHECK expr */
 } NewConstraint;
A primary key can have only one WITHOUT OVERLAPS column,
so *operoids and *procoids
can be replaced with just
`operoids, procoids`.
Also, these two elements in struct NewConstraint are not used in v24's 0002 and 0003.
From 8ba03aa6a442d57bd0f2117e32e703fb211b68fd Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 2 Feb 2024 10:31:16 +0800
Subject: [PATCH v1 1/1] refactor temporal FOREIGN KEYs test

make related tests closer together and remove unnecessary primary key
constraints, improving the tests' readability
---
 .../regress/expected/without_overlaps.out | 76 ++-
 src/test/regress/sql/without_overlaps.sql | 76 ++-
 2 files changed, 82 insertions(+), 70 deletions(-)

diff --git a/src/test/regress/expected/without_overlaps.out b/src/test/regress/expected/without_overlaps.out
index c633738c..2d0f5e21 100644
--- a/src/test/regress/expected/without_overlaps.out
+++ b/src/test/regress/expected/without_overlaps.out
@@ -421,53 +421,58 @@ DROP TABLE temporal3;
 --
 -- test FOREIGN KEY, range references range
 --
+--test table setup.
+DROP TABLE IF EXISTS temporal_rng;
+CREATE TABLE temporal_rng (id int4range, valid_at tsrange);
+ALTER TABLE temporal_rng
+	ADD CONSTRAINT temporal_rng_pk
+	PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
 -- Can't create a FK with a mismatched range type
 CREATE TABLE temporal_fk_rng2rng (
 	id int4range,
 	valid_at int4range,
 	parent_id int4range,
-	CONSTRAINT temporal_fk_rng2rng_pk2 PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
 	CONSTRAINT temporal_fk_rng2rng_fk2 FOREIGN KEY (parent_id, PERIOD valid_at)
 		REFERENCES temporal_rng (id, PERIOD valid_at)
 );
 ERROR:  foreign key constraint "temporal_fk_rng2rng_fk2" cannot be implemented
 DETAIL:  Key 

Re: More new SQL/JSON item methods

2024-02-01 Thread Jeevan Chalke
On Thu, Feb 1, 2024 at 11:25 AM Kyotaro Horiguchi 
wrote:

> At Thu, 1 Feb 2024 09:22:22 +0530, Jeevan Chalke <
> jeevan.cha...@enterprisedb.com> wrote in
> > On Thu, Feb 1, 2024 at 7:24 AM Kyotaro Horiguchi <
> horikyota@gmail.com>
> > wrote:
> >
> > > At Thu, 01 Feb 2024 10:49:57 +0900 (JST), Kyotaro Horiguchi <
> > > horikyota@gmail.com> wrote in
> > > > By the way, while playing with this feature, I noticed the following
> > > > error message:
> > > >
> > > > > select jsonb_path_query('1.1' , '$.boolean()');
> > > > > ERROR:  numeric argument of jsonpath item method .boolean() is out
> of
> > > range for type boolean
> > > >
> > > > The error message seems a bit off to me. For example, "argument '1.1'
> > > > is invalid for type [bB]oolean" seems more appropriate for this
> > > > specific issue. (I'm not certain about our policy on the spelling of
> > > > Boolean..)
> > >
> > > Or, following our general convention, it would be spelled as:
> > >
> > > 'invalid argument for type Boolean: "1.1"'
> > >
> >
> > jsonpath way:
>
> Hmm. I see.
>
> > ERROR:  argument of jsonpath item method .boolean() is invalid for type
> > boolean
> >
> > or, if we add input value, then
> >
> > ERROR:  argument "1.1" of jsonpath item method .boolean() is invalid for
> > type boolean
> >
> > And this should work for all the error types, like out of range, not
> valid,
> > invalid input, etc, etc. Also, we don't need separate error messages for
> > string input as well, which currently has the following form:
> >
> > "string argument of jsonpath item method .%s() is not a valid
> > representation.."
>
> Agreed.
>

Attached are patches based on the discussion.


>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


-- 
Jeevan Chalke

*Principal, Manager, Product Development*



edbpostgres.com


v1-0002-Improve-error-message-for-NaN-or-Infinity-inputs-.patch
Description: Binary data


v1-0001-Merge-error-messages-in-the-same-pattern-in-jsonp.patch
Description: Binary data


v1-0003-Unify-error-messages-for-various-jsonpath-item-me.patch
Description: Binary data


v1-0004-Show-input-value-in-the-error-message-of-various-.patch
Description: Binary data


Re: Commitfest 2024-01 is now closed

2024-02-01 Thread Tom Lane
vignesh C  writes:
> Thanks a lot to all the members who participated in the commitfest.

Thanks for all the work you did running it!  CFM is typically a
thankless exercise in being a nag, but I thought you put in more
than the usual amount of effort to keep things moving.

regards, tom lane




Re: Synchronizing slots from primary to standby

2024-02-01 Thread Bertrand Drouvot
Hi,

On Thu, Feb 01, 2024 at 04:12:43PM +0530, Amit Kapila wrote:
> On Thu, Jan 25, 2024 at 11:26 AM Bertrand Drouvot
>  wrote:
> >
> > On Wed, Jan 24, 2024 at 04:09:15PM +0530, shveta malik wrote:
> > > On Wed, Jan 24, 2024 at 2:38 PM Bertrand Drouvot
> > >  wrote:
> > > >
> > > > I also see Sawada-San's point and I'd vote for 
> > > > "sync_replication_slots". Then for
> > > > the current feature I think "failover" and "on" should be the values to 
> > > > turn the
> > > > feature on (assuming "on" would mean "all kind of supported slots").
> > >
> > > Even if others agree and we change this GUC name to
> > > "sync_replication_slots", I feel we should keep the values as "on" and
> > > "off" currently, where "on" would mean 'sync failover slots' (docs can
> > > state that clearly).
> >
> > I gave more thoughts on it and I think the values should only be "failover" 
> > or
> > "off".
> >
> > The reason is that if we allow "on" and change the "on" behavior in future
> > versions (to support more than failover slots) then that would change the 
> > behavior
> > for the ones that used "on".
> >
> 
> I again thought on this point and feel that even if we start to sync
> say physical slots their purpose would also be to allow
> failover/switchover, otherwise, there is no use of syncing the slots.

Yeah, I think this is a good point.

> So, by that theory, we can just go for naming it as
> sync_failover_slots or simply sync_slots with values 'off' and 'on'.
> Now, if these are used for switchover then there is an argument that
> adding 'failover' in the GUC name could be confusing but I feel
> 'failover' is used widely enough that it shouldn't be a problem for
> users to understand, otherwise, we can go with simple name like
> sync_slots as well.
> 

I agree and "on"/"off" looks enough to me now. As far the GUC name I've the
feeling that "replication" should be part of it, and think that 
sync_replication_slots
is fine. The reason behind is that "sync_slots" could be confusing if in the 
future other kind of "slot" (other than replication ones) are added in the 
engine.

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Commitfest 2024-01 is now closed

2024-02-01 Thread vignesh C
Hi,

Thanks a lot to all the members who participated in the commitfest.

Here are the final numbers at the end of the commitfest:
status              |  w1  |  w2  |  w3  |  w4  |  End
--------------------+------+------+------+------+------
Needs review        |  238 |  213 |  181 |  146 |  N/A
Waiting on Author   |   44 |   46 |   52 |   65 |  N/A
Ready for Committer |   27 |   27 |   26 |   24 |  N/A
Committed           |   36 |   46 |   57 |   71 |   75
Moved to next CF    |    1 |    3 |    4 |    4 |  208
Withdrawn           |    2 |    4 |   12 |   14 |   16
RWF                 |    3 |   12 |   18 |   26 |   51
Rejected            |    1 |    1 |    2 |    2 |    2
Total               |  352 |  352 |  352 |  352 |  352

Committed patches comparison with previous January commitfests:
2024: 75 committed
2023: 70 committed
2022: 58 committed
2021: 56 committed
2020: 49 committed

 A special thanks to the reviewers/committers who spent tireless
effort in moving the patches forward.

Regards,
Vignesh




Re: Improve eviction algorithm in ReorderBuffer

2024-02-01 Thread Shubham Khanna
On Fri, Jan 26, 2024 at 2:07 PM Masahiko Sawada  wrote:
>
> On Wed, Dec 20, 2023 at 12:11 PM Amit Kapila  wrote:
> >
> > On Wed, Dec 20, 2023 at 6:49 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Dec 19, 2023 at 8:02 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Tue, Dec 19, 2023 at 8:31 AM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Sun, Dec 17, 2023 at 11:40 AM Amit Kapila 
> > > > >  wrote:
> > > > > >
> > > > > >
> > > > > > The individual transactions shouldn't cross
> > > > > > 'logical_decoding_work_mem'. I got a bit confused by your proposal 
> > > > > > to
> > > > > > maintain the lists: "...splitting it into two lists: transactions
> > > > > > consuming 5% < and 5% >=  of the memory limit, and checking the 5% 
> > > > > > >=
> > > > > > list preferably.". In the previous sentence, what did you mean by
> > > > > > transactions consuming 5% >= of the memory limit? I got the 
> > > > > > impression
> > > > > > that you are saying to maintain them in a separate transaction list
> > > > > > which doesn't seems to be the case.
> > > > >
> > > > > I wanted to mean that there are three lists in total: the first one
> > > > > maintain the transactions consuming more than 10% of
> > > > > logical_decoding_work_mem,
> > > > >
> > > >
> > > > How can we have multiple transactions in the list consuming more than
> > > > 10% of logical_decoding_work_mem? Shouldn't we perform serialization
> > > > before any xact reaches logical_decoding_work_mem?
> > >
> > > Well, suppose logical_decoding_work_mem is set to 64MB, transactions
> > > consuming more than 6.4MB are added to the list. So for example, it's
> > > possible that the list has three transactions each of which are
> > > consuming 10MB while the total memory usage in the reorderbuffer is
> > > still 30MB (less than logical_decoding_work_mem).
> > >
> >
> > Thanks for the clarification. I misunderstood the list to have
> > transactions greater than 70.4 MB (64 + 6.4) in your example. But one
> > thing to note is that maintaining these lists by default can also have
> > some overhead unless the list of open transactions crosses a certain
> > threshold.
> >
>
> On further analysis, I realized that the approach discussed here might
> not be the way to go. The idea of dividing transactions into several
> subgroups is to divide a large number of entries into multiple
> sub-groups so we can reduce the complexity to search for the
> particular entry. Since we assume that there are no big differences in
> entries' sizes within a sub-group, we can pick the entry to evict in
> O(1). However, what we really need to avoid here is that we end up
> increasing the number of times to evict entries because serializing an
> entry to the disk is more costly than searching an entry on memory in
> general.
>
> I think that it's no problem in a large-entries subgroup but when it
> comes to the smallest-entries subgroup, like for entries consuming
> less than 5% of the limit, it could end up evicting many entries. For
> example, there would be a huge difference between serializing 1 entry
> consuming 5% of the memory limit and serializing 5000 entries
> consuming 0.001% of the memory limit. Even if we can select 5000
> entries quickly, I think the latter would be slower in total. The more
> subgroups we create, the more the algorithm gets complex and the
> overheads could cause. So I think we need to search for the largest
> entry in order to minimize the number of evictions anyway.
>
> Looking for data structures and algorithms, I think binaryheap with
> some improvements could be promising. I mentioned before why we cannot
> use the current binaryheap[1]. The missing pieces are efficient ways
> to remove the arbitrary entry and to update the arbitrary entry's key.
> The current binaryheap provides binaryheap_remove_node(), which is
> O(log n), but it requires the entry's position in the binaryheap. We
> can know the entry's position just after binaryheap_add_unordered()
> but it might be changed after heapify. Searching the node's position
> is O(n). So the improvement idea is to add a hash table to the
> binaryheap so that it can track the positions for each entry so that
> we can remove the arbitrary entry in O(log n) and also update the
> arbitrary entry's key in O(log n). This is known as the indexed
> priority queue. I've attached the patch for that (0001 and 0002).
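
To illustrate the indexed-binaryheap idea described in the quoted text above,
here is a minimal, self-contained C sketch. It is not the attached 0001/0002
patches: all names are invented, and for brevity each entry stores its own
heap position instead of using the separate hash table the patches add; the
point is only that tracking positions makes updating or removing an arbitrary
entry O(log n) while the largest entry stays readable in O(1):

#include <stdio.h>

typedef struct TxnEntry
{
    long        size;           /* key: transaction's memory usage */
    int         pos;            /* current index in Heap.nodes[] */
} TxnEntry;

typedef struct Heap
{
    TxnEntry  **nodes;
    int         nnodes;
} Heap;

static void
swap_nodes(Heap *h, int a, int b)
{
    TxnEntry   *tmp = h->nodes[a];

    h->nodes[a] = h->nodes[b];
    h->nodes[b] = tmp;
    h->nodes[a]->pos = a;
    h->nodes[b]->pos = b;
}

static void
sift_up(Heap *h, int i)
{
    while (i > 0 && h->nodes[(i - 1) / 2]->size < h->nodes[i]->size)
    {
        swap_nodes(h, i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

static void
sift_down(Heap *h, int i)
{
    for (;;)
    {
        int     largest = i, l = 2 * i + 1, r = 2 * i + 2;

        if (l < h->nnodes && h->nodes[l]->size > h->nodes[largest]->size)
            largest = l;
        if (r < h->nnodes && h->nodes[r]->size > h->nodes[largest]->size)
            largest = r;
        if (largest == i)
            break;
        swap_nodes(h, i, largest);
        i = largest;
    }
}

static void
heap_add(Heap *h, TxnEntry *e)          /* O(log n) */
{
    e->pos = h->nnodes;
    h->nodes[h->nnodes++] = e;
    sift_up(h, e->pos);
}

static void
heap_update(Heap *h, TxnEntry *e, long newsize)     /* O(log n) */
{
    e->size = newsize;
    sift_up(h, e->pos);                 /* the entry knows its own position */
    sift_down(h, e->pos);
}

static void
heap_remove(Heap *h, TxnEntry *e)       /* O(log n), arbitrary entry */
{
    int     i = e->pos;

    h->nnodes--;
    if (i < h->nnodes)
    {
        h->nodes[i] = h->nodes[h->nnodes];
        h->nodes[i]->pos = i;
        sift_up(h, i);
        sift_down(h, i);
    }
}

int
main(void)
{
    TxnEntry    txns[4] = {{100, 0}, {400, 0}, {50, 0}, {300, 0}};
    TxnEntry   *slots[4];
    Heap        h = {slots, 0};

    for (int i = 0; i < 4; i++)
        heap_add(&h, &txns[i]);
    heap_update(&h, &txns[2], 500);     /* this txn grew; it bubbles to the top */
    heap_remove(&h, &txns[1]);          /* this txn finished; drop it */
    printf("largest txn: %ld\n", h.nodes[0]->size);     /* prints 500 */
    return 0;
}
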
>
> That way, in terms of reorderbuffer, we can update and remove the
> transaction's memory usage in O(log n) (in worst case and O(1) in
> average) and then pick the largest transaction in O(1). Since we might
> need to call ReorderBufferSerializeTXN() even in non-streaming case,
> we need to maintain the binaryheap anyway. I've attached the patch for
> that (0003).
>
> Here are test script for many sub-transactions case:
>
> create table test (c int);
> create or replace function testfn (cnt int) returns void as $$
> begin
>   for i in 1..cnt loop
> begin
>   insert into test 

Re: Synchronizing slots from primary to standby

2024-02-01 Thread Amit Kapila
On Fri, Feb 2, 2024 at 6:46 AM Masahiko Sawada  wrote:
>
> On Thu, Feb 1, 2024 at 12:51 PM Amit Kapila  wrote:
> >
> >
> > > BTW I've tested the following switch/fail-back scenario but it seems
> > > not to work fine. Am I missing something?
> > >
> > > Setup:
> > > node1 is the primary, node2 is the physical standby for node1, and
> > > node3 is the subscriber connecting to node1.
> > >
> > > Steps:
> > > 1. [node1]: create a table and a publication for the table.
> > > 2. [node2]: set enable_syncslot = on and start (to receive WALs from 
> > > node1).
> > > 3. [node3]: create a subscription with failover = true for the 
> > > publication.
> > > 4. [node2]: promote to the new standby.
> > > 5. [node3]: alter subscription to connect the new primary, node2.
> > > 6. [node1]: stop, set enable_syncslot = on (and other required
> > > parameters), then start as a new standby.
> > >
> > > Then I got the error "exiting from slot synchronization because same
> > > name slot "test_sub" already exists on the standby".
> > >
> > > The logical replication slot that was created on the old primary
> > > (node1) has been synchronized to the old standby (node2). Therefore on
> > > node2, the slot's "synced" field is true. However, once node1 starts
> > > as the new standby with slot synchronization, the slotsync worker
> > > cannot synchronize the slot because the slot's "synced" field on the
> > > primary is false.
> > >
> >
> > Yeah, we avoided doing anything in this case because the user could
> > have manually created another slot with the same name on standby.
> > Unlike WAL slots can be modified on standby as we allow decoding on
> > standby, so we can't allow to overwrite the existing slots. We won't
> > be able to distinguish whether the existing slot was a slot that the
> > user wants to sync with primary or a slot created on standby to
> > perform decoding. I think in this case user first needs to drop the
> > slot on new standby.
>
> Yes, but if we do a further switch-back (i.e. in the above case, node1
> goes back to being the primary and node2 becomes the standby again), the
> user doesn't need to remove failover slots since they are already
> marked as "synced".

But I think in this case node-2's timeline will be ahead of node-1's,
so will we be able to make node-2 follow node-1 again without any
additional steps? One thing that is not clear to me: after promotion the
timeline changes in the WAL, so the locations in the slots will be as per the
new timeline; after that, will it be safe to sync slots from the new
primary to the old primary?

In general, I think after failover, we recommend running pg_rewind if
the old primary has to follow the new primary, to account for
divergence in WAL. So I'm not sure we can safely start syncing slots on the
old primary from the new primary; consider that on the new primary, the
same-name slot may have been dropped/re-created multiple times. We could
probably reset all the fields of the existing slot the first time we
sync an existing slot, or do something like that, but I think it
would be better to just re-create the slot.

>
> I wonder if we could do something automatically to
> reduce the user's operation.

One possibility is that we forcefully drop/re-create the slot or
directly overwrite the slot contents, but that would probably be better
done via some GUC or slot-level parameter. I feel we should leave this
for another day; for the first version, we can document that an error
will occur if a slot with the same name exists on the standby, so users need to
ensure that there isn't an existing slot with the same name on the standby
before the sync.

-- 
With Regards,
Amit Kapila.




Re: POC: GROUP BY optimization

2024-02-01 Thread Andrei Lepikhov

On 2/2/2024 11:06, Richard Guo wrote:


On Fri, Feb 2, 2024 at 11:32 AM Richard Guo  wrote:


On Fri, Feb 2, 2024 at 10:02 AM Tom Lane  wrote:

One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:


https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04
 


and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.


Yeah, it's not a good practice to use pg_class in this place.  While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great.  Such as

* The table 'btg' is inserted with 1 tuples, which seems a bit
expensive for a test.  I don't think we need such a big table to test
what we want.

* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.

* I think we'd better write the tests with the keywords being all upper
or all lower.  A mixed use of upper and lower is not great. Such as in

     explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

* Some comments for the test queries are not easy to read.

* For this statement

     CREATE INDEX idx_y_x_z ON btg(y,x,w);

I think the index name would cause confusion.  It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.

I'd like to write a draft patch for the fixes.


Here is the draft patch that fixes the issues I complained about in
upthread.
I went through the patch. It looks like it doesn't break anything. Why do
you prefer to use count(*) in EXPLAIN instead of a plain targetlist, like
"SELECT x,y,..."?

Also, regarding the test mentioned by Tom:
1. I see that PG uses an IndexScan on (x,y), so column x will already be
sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w)
instead of a full sort?
2. As a note, IMO this test shows the near-future potential of this
feature: it is cheaper to use grouping order (w,z) instead of (z,w).


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Synchronizing slots from primary to standby

2024-02-01 Thread Peter Smith
Here are some review comments for v75-0001.

==
Commit message

1.
This patch provides support for non-replication connection
in libpqrcv_connect().

~

1a.
/connection/connections/

~

1b.
Maybe there needs to be a few more sentences just to describe what you
mean by "non-replication connection".

~

1c.
IIUC although the 'replication' parameter is added, in this patch
AFAICT every call to the connect function is still passing that
argument as true. If that's correct, probably this patch comment
should emphasise that this patch doesn't change any functionality at
all but is just preparation for later patches which *will* pass false
for the replication arg.

~~~

2.
This patch also implements a new API libpqrcv_get_dbname_from_conninfo()
to extract database name from the given connection-info

~

/extract database name/extract the database name/

==
.../libpqwalreceiver/libpqwalreceiver.c

3.
+ * Apart from walreceiver, the libpq-specific routines here are now being used
+ * by logical replication worker as well.

/worker/workers/

~~~

4. libpqrcv_connect

 /*
- * Establish the connection to the primary server for XLOG streaming
+ * Establish the connection to the primary server.
+ *
+ * The connection established could be either a replication one or
+ * a non-replication one based on input argument 'replication'. And further
+ * if it is a replication connection, it could be either logical or physical
+ * based on input argument 'logical'.

That first comment ("could be either a replication one or...") seemed
a bit meaningless (e.g. it is like saying "this boolean argument can be
true or false") because it doesn't describe what is the meaning of a
"replication connection" versus what is a "non-replication
connection".

~~~

5.
/* We can not have logical without replication */
Assert(replication || !logical);

if (replication)
{
keys[++i] = "replication";
vals[i] = logical ? "database" : "true";

if (!logical)
{
/*
* The database name is ignored by the server in replication mode,
* but specify "replication" for .pgpass lookup.
*/
keys[++i] = "dbname";
vals[i] = "replication";
}
}

keys[++i] = "fallback_application_name";
vals[i] = appname;
if (logical)
{
...
}

~

The Assert already says we cannot be 'logical' if not 'replication',
therefore IMO it seemed strange that the code was not refactored to
bring that 2nd "if (logical)" code to within the scope of the "if
(replication)".

e.g. Can't you do something like this:

Assert(replication || !logical);

if (replication)
{
  ...
  if (logical)
  {
...
  }
  else
  {
...
  }
}
keys[++i] = "fallback_application_name";
vals[i] = appname;

~~~

6. libpqrcv_get_dbname_from_conninfo

+ for (PQconninfoOption *opt = opts; opt->keyword != NULL; ++opt)
+ {
+ /*
+ * If multiple dbnames are specified, then the last one will be
+ * returned
+ */
+ if (strcmp(opt->keyword, "dbname") == 0 && opt->val &&
+ opt->val[0] != '\0')
+ dbname = pstrdup(opt->val);
+ }

Should you also pfree the old dbname instead of gathering a bunch of
strdups if there happened to be multiple dbnames specified?

SUGGESTION
if (strcmp(opt->keyword, "dbname") == 0 && opt->val && *opt->val)
{
   if (dbname)
pfree(dbname);
   dbname = pstrdup(opt->val);
}

==
src/include/replication/walreceiver.h

7.
/*
 * walrcv_connect_fn
 *
 * Establish connection to a cluster.  'logical' is true if the
 * connection is logical, and false if the connection is physical.
 * 'appname' is a name associated to the connection, to use for example
 * with fallback_application_name or application_name.  Returns the
 * details about the connection established, as defined by
 * WalReceiverConn for each WAL receiver module.  On error, NULL is
 * returned with 'err' including the error generated.
 */
typedef WalReceiverConn *(*walrcv_connect_fn) (const char *conninfo,
   bool replication,
   bool logical,
   bool must_use_password,
   const char *appname,
   char **err);

~

The comment is missing any description of the new parameter 'replication'.

~~~

8.
+/*
+ * walrcv_get_dbname_from_conninfo_fn
+ *
+ * Returns the dbid from the primary_conninfo
+ */
+typedef char *(*walrcv_get_dbname_from_conninfo_fn) (const char *conninfo);
+

/dbid/database name/

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: POC: GROUP BY optimization

2024-02-01 Thread Richard Guo
On Fri, Feb 2, 2024 at 11:32 AM Richard Guo  wrote:

> On Fri, Feb 2, 2024 at 10:02 AM Tom Lane  wrote:
>
>> One of the test cases added by this commit has not been very
>> stable in the buildfarm.  Latest example is here:
>>
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04
>>
>> and I've seen similar failures intermittently on other machines.
>>
>> I'd suggest building this test atop a table that is more stable
>> than pg_class.  You're just waving a red flag in front of a bull
>> if you expect stable statistics from that during a regression run.
>> Nor do I see any particular reason for pg_class to be especially
>> suited to the test.
>
>
> Yeah, it's not a good practice to use pg_class in this place.  While
> looking through the test cases added by this commit, I noticed some
> other minor issues that are not great.  Such as
>
> * The table 'btg' is inserted with 1 tuples, which seems a bit
> expensive for a test.  I don't think we need such a big table to test
> what we want.
>
> * I don't see why we need to manipulate GUC max_parallel_workers and
> max_parallel_workers_per_gather.
>
> * I think we'd better write the tests with the keywords being all upper
> or all lower.  A mixed use of upper and lower is not great. Such as in
>
> explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
>
> * Some comments for the test queries are not easy to read.
>
> * For this statement
>
> CREATE INDEX idx_y_x_z ON btg(y,x,w);
>
> I think the index name would cause confusion.  It creates an index on
> columns y, x and w, but the name indicates an index on y, x and z.
>
> I'd like to write a draft patch for the fixes.
>

Here is the draft patch that fixes the issues I complained about in
upthread.

Thanks
Richard


v1-0001-Multiple-revises-for-the-GROUP-BY-reordering-tests.patch
Description: Binary data


Re: POC: GROUP BY optimization

2024-02-01 Thread Andrei Lepikhov

On 2/2/2024 09:02, Tom Lane wrote:

Alexander Korotkov  writes:

I'm going to push this if there are no objections.


One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04

and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.

Yeah, it is my fault. Please see the attached patch fixing that.
--
regards,
Andrei Lepikhov
Postgres Professional
From 11a049d95ee48e38ad569aab7663d8de91f946ad Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 2 Feb 2024 10:39:55 +0700
Subject: [PATCH] Replace the GROUP-BY optimization test with the same based on
 something less volatile than the pg_class relation.

---
 src/test/regress/expected/aggregates.out | 32 +++-
 src/test/regress/sql/aggregates.sql  |  9 +++
 2 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/src/test/regress/expected/aggregates.out 
b/src/test/regress/expected/aggregates.out
index 7a73c19314..c2e1b8c9ed 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2873,7 +2873,6 @@ SELECT y,x,array_agg(distinct w) FROM btg WHERE y < 0 
GROUP BY x,y;
 (6 rows)
 
 RESET enable_incremental_sort;
-DROP TABLE btg;
 -- The case, when scanning sort order correspond to aggregate sort order but
 -- can not be found in the group-by list
 CREATE TABLE agg_sort_order (c1 int PRIMARY KEY, c2 int);
@@ -2901,32 +2900,31 @@ DROP TABLE agg_sort_order CASCADE;
 SET enable_hashjoin = off;
 SET enable_nestloop = off;
 explain (COSTS OFF)
-SELECT c1.relname,c1.relpages
-FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
-GROUP BY c1.reltuples,c1.relpages,c1.relname
-ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
- QUERY PLAN
  
--
+SELECT b1.x,b1.w FROM btg b1 JOIN btg b2 ON (b1.z=b2.z AND b1.w=b2.w)
+GROUP BY b1.x,b1.z,b1.w ORDER BY b1.z, b1.w, b1.x*b1.x;
+QUERY PLAN 
+---
  Incremental Sort
-   Sort Key: c1.relpages, c1.relname, ((c1.relpages * c1.relpages))
-   Presorted Key: c1.relpages, c1.relname
+   Sort Key: b1.z, b1.w, ((b1.x * b1.x))
+   Presorted Key: b1.z, b1.w
->  Group
- Group Key: c1.relpages, c1.relname, c1.reltuples
+ Group Key: b1.z, b1.w, b1.x
  ->  Incremental Sort
-   Sort Key: c1.relpages, c1.relname, c1.reltuples
-   Presorted Key: c1.relpages, c1.relname
+   Sort Key: b1.z, b1.w, b1.x
+   Presorted Key: b1.z, b1.w
->  Merge Join
- Merge Cond: ((c1.relpages = c2.relpages) AND (c1.relname 
= c2.relname))
+ Merge Cond: ((b1.z = b2.z) AND (b1.w = b2.w))
  ->  Sort
-   Sort Key: c1.relpages, c1.relname
-   ->  Seq Scan on pg_class c1
+   Sort Key: b1.z, b1.w
+   ->  Seq Scan on btg b1
  ->  Sort
-   Sort Key: c2.relpages, c2.relname
-   ->  Seq Scan on pg_class c2
+   Sort Key: b2.z, b2.w
+   ->  Seq Scan on btg b2
 (16 rows)
 
 RESET enable_hashjoin;
 RESET enable_nestloop;
+DROP TABLE btg;
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;
diff --git a/src/test/regress/sql/aggregates.sql 
b/src/test/regress/sql/aggregates.sql
index 916dbf908f..3548fbb8db 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1229,8 +1229,6 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT y,x,array_agg(distinct w) FROM btg WHERE y < 0 GROUP BY x,y;
 RESET enable_incremental_sort;
 
-DROP TABLE btg;
-
 -- The case, when scanning sort order correspond to aggregate sort order but
 -- can not be found in the group-by list
 CREATE TABLE agg_sort_order (c1 int PRIMARY KEY, c2 int);
@@ -1245,13 +1243,12 @@ DROP TABLE agg_sort_order CASCADE;
 SET enable_hashjoin = off;
 SET enable_nestloop = off;
 explain (COSTS OFF)
-SELECT c1.relname,c1.relpages
-FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
-GROUP BY c1.reltuples,c1.relpages,c1.relname
-ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
+SELECT b1.x,b1.w FROM btg b1 JOIN 

Re: POC: GROUP BY optimization

2024-02-01 Thread Richard Guo
On Fri, Feb 2, 2024 at 10:02 AM Tom Lane  wrote:

> Alexander Korotkov  writes:
> > I'm going to push this if there are no objections.
>
> One of the test cases added by this commit has not been very
> stable in the buildfarm.  Latest example is here:
>
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04
>
> and I've seen similar failures intermittently on other machines.
>
> I'd suggest building this test atop a table that is more stable
> than pg_class.  You're just waving a red flag in front of a bull
> if you expect stable statistics from that during a regression run.
> Nor do I see any particular reason for pg_class to be especially
> suited to the test.


Yeah, it's not a good practice to use pg_class in this place.  While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great.  Such as

* The table 'btg' is inserted with 1 tuples, which seems a bit
expensive for a test.  I don't think we need such a big table to test
what we want.

* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.

* I think we'd better write the tests with the keywords being all upper
or all lower.  A mixed use of upper and lower is not great. Such as in

explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

* Some comments for the test queries are not easy to read.

* For this statement

CREATE INDEX idx_y_x_z ON btg(y,x,w);

I think the index name would cause confusion.  It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.

I'd like to write a draft patch for the fixes.

Thanks
Richard


Re: An improvement on parallel DISTINCT

2024-02-01 Thread David Rowley
On Wed, 27 Dec 2023 at 00:23, Richard Guo  wrote:
> -- on master
> EXPLAIN (costs off)
> SELECT DISTINCT four FROM tenk1;
>  QUERY PLAN
> 
>  Unique
>->  Sort
>  Sort Key: four
>  ->  Gather
>Workers Planned: 2
>->  HashAggregate
>  Group Key: four
>  ->  Parallel Seq Scan on tenk1
> (8 rows)
>
> -- on patched
> EXPLAIN (costs off)
> SELECT DISTINCT four FROM tenk1;
>  QUERY PLAN
> 
>  Unique
>->  Gather Merge
>  Workers Planned: 2
>  ->  Sort
>Sort Key: four
>->  HashAggregate
>  Group Key: four
>  ->  Parallel Seq Scan on tenk1
> (8 rows)
>
> I believe the second plan is better.

I wonder if this change is worthwhile. The sort is only required at
all because the planner opted to HashAggregate in phase1, of which the
rows are output unordered. If phase1 was done by Group Aggregate, then
no sorting would be needed.  The only reason the planner didn't Hash
Aggregate for phase2 is because of the order we generate the distinct
paths and because of STD_FUZZ_FACTOR.

Look at the costs of the above plan:

Unique  (cost=397.24..397.28 rows=4 width=4)

if I enable_sort=0; then I get a cheaper plan:

 HashAggregate  (cost=397.14..397.18 rows=4 width=4)

If we add more rows then the cost of sorting will grow faster than the
cost of hash aggregate due to the O(N log2 N) part of our sort
costing.
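
As a rough standalone illustration of that growth-rate claim (this is not
PostgreSQL's cost model and uses none of its costing constants; it only shows
how an N * log2(N) term pulls away from a linear one as rows are added):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    /* A sort path carries an N * log2(N) component, while a hash
     * aggregate's input cost grows roughly linearly in N, so the gap
     * widens as N grows. */
    for (double n = 1e4; n <= 1e8; n *= 10)
        printf("N=%.0e  N*log2(N)=%.3e  ratio=%.1f\n",
               n, n * log2(n), log2(n));
    return 0;
}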

If I drop the index on tenk1(hundred), I only need to go to the
"hundred" column to have it switch to Hash Aggregate on the 2nd phase.
This is because the number of distinct groups costs the paths for
Group Aggregate and Hash Aggregate more than STD_FUZZ_FACTOR apart.
Adjusting the STD_FUZZ_FACTOR with the following means Hash Aggregate
is used for both phases.

-#define STD_FUZZ_FACTOR 1.01
+#define STD_FUZZ_FACTOR 1.001

In light of this, do you still think it's worthwhile making this change?

For me, I think all it's going to result in is extra planner work
without any performance gains.

> Attached is a patch that includes this change and also eliminates the
> usage of root->upper_targets[] in the core code.  It also makes some
> tweaks for the comment.

We should fix that.  We can consider it independently from the other
change you're proposing.

David




Re: Small fix on COPY ON_ERROR document

2024-02-01 Thread torikoshia

On 2024-02-01 15:16, Yugo NAGATA wrote:

On Mon, 29 Jan 2024 15:47:25 +0900
Yugo NAGATA  wrote:


On Sun, 28 Jan 2024 19:14:58 -0700
"David G. Johnston"  wrote:

> > Also, I think "invalid input syntax" is a bit ambiguous. For example,
> > COPY FROM raises an error when the number of input columns does not match
> > the table schema, but this error is not ignored by ON_ERROR while
> > this seems to fall into the category of "invalid input syntax".
>
>
>
> It is literally the error text that appears if one were not to ignore it.
> It isn’t a category of errors.  But I’m open to ideas here.  But being
> explicit with what one actually sees in the system seemed preferable to
> inventing new classification terms not otherwise used.

Thank you for the explanation! I understood the words were from the error
messages that users actually see. However, as Torikoshi-san said in [1],
errors other than invalid input syntax (e.g. range errors) can also be
ignored, therefore it would be better to describe the errors to be ignored
more specifically.

[1] 
https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com


>
> >
> > So, keeping consistency with the existing description, we can say:
> >
> > "Specifies which how to behave when encountering an error due to
> >  column values unacceptable to the input function of each attribute's
> >  data type."
>
>
> Yeah, I was considering something along those lines as an option as well.
> But I’d rather add that wording to the glossary.

Although I am still not convinced that we have to introduce the words
"soft error" into the documentation, I don't mind if there are no other
opposing opinions.


Attached is an updated patch v3, which uses the above
wording instead of "soft error".


>
> > Currently, ON_ERROR doesn't support other soft errors, so it can explain
> > it more simply without introducing the new concept, "soft error" to users.
> >
> >
> Good point.  Seems we should define what user-facing errors are ignored
> anywhere in the system and if we aren’t consistently leveraging these in
> all areas/commands make the necessary qualifications in those specific
> places.
>

> > I think "left in a deleted state" is also unclear for users because this
> > explains the internal state but not how it looks from the user's view. How about
> > leaving the explanation "These rows will not be visible or accessible" in
> > the existing statement?
> >
>
> Just visible then, I don’t like an “or” there and as tuples at least they
> are accessible to the system, in vacuum especially.  But I expected the
> user to understand “as if you deleted it” as their operational concept more
> readily than visible.  I think this will be read by people who haven’t read
> MVCC to fully understand what visible means but know enough to run vacuum
> to clean up updated and deleted data as a rule.

Ok, I agree we can omit "or accessible". How do you like the
following?

Still redundant?

 "If the command fails, these rows are left in a deleted state;
  these rows will not be visible, but they still occupy disk space. "


Also, the above statement is used in the patch.


Thanks for updating the patch!

I like your description which doesn't use the word soft error.


Here are minor comments:

+  ignore means discard the input row and 
continue with the next one.

+  The default is stop


Is "." required at the end of the line?

 An NOTICE level context message containing the 
ignored row count is


Should 'An' be 'A'?

Also, I wasn't sure about the necessity of 'context'.
It might be possible to just say "A NOTICE message containing the 
ignored row count.."

considering below existing descriptions:

  doc/src/sgml/pltcl.sgml: a NOTICE message each 
time a supported command is

  doc/src/sgml/pltcl.sgml- executed:

  doc/src/sgml/plpgsql.sgml: This example trigger simply raises a 
NOTICE message
  doc/src/sgml/plpgsql.sgml- each time a supported command is 
executed.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Oversight in reparameterize_path_by_child leading to executor crash

2024-02-01 Thread Richard Guo
On Fri, Feb 2, 2024 at 1:45 AM Tom Lane  wrote:

> I pushed v12 (with some cosmetic adjustments) into the back branches,
> since we're getting close to the February release freeze.  We still
> need to deal with the larger fix for HEAD.  Please re-post that
> one so that the cfbot knows which is the patch-of-record.


Thanks for the adjustments and pushing!

Here is the patch for HEAD.  I simply re-posted v10.  Nothing has
changed.

Thanks
Richard


v10-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch
Description: Binary data


Re: set_cheapest without checking pathlist

2024-02-01 Thread James Coleman
On Thu, Feb 1, 2024 at 1:36 AM Richard Guo  wrote:
>
>
> On Thu, Feb 1, 2024 at 11:37 AM David Rowley  wrote:
>>
>> On Thu, 1 Feb 2024 at 16:29, Richard Guo  wrote:
>> > On Thu, Feb 1, 2024 at 10:04 AM James Coleman  wrote:
>> >> I don't see any inherent reason why we must always assume that
>> >> gather_grouping_paths will always result in having at least one entry
>> >> in pathlist. If, for example, we've disabled incremental sort and the
>> >> cheapest partial path happens to already be sorted, then I don't
>> >> believe we'll add any paths. And if that happens then set_cheapest
>> >> will error with the message "could not devise a query plan for the
>> >> given query". So I propose we add a similar guard to this call point.
>> >
>> >
>> > I don't believe that would happen.  It seems to me that we should, at
>> > the very least, have a path which is Gather on top of the cheapest
>> > partial path (see generate_gather_paths), as long as the
>> > partially_grouped_rel has partial paths.
>>
>> It will have partial paths because it's nested inside "if
>> (partially_grouped_rel && partially_grouped_rel->partial_pathlist)"
>
>
> Right.  And that leads to the conclusion that gather_grouping_paths will
> always generate at least one path for partially_grouped_rel.  So it
> seems to me that the check added in the patch is not necessary.  But
> maybe we can add an Assert or a comment clarifying that the pathlist of
> partially_grouped_rel cannot be empty here.

Yes, that's true currently. I don't particularly love that we have the
knowledge of that baked in at this level, but it won't break anything
currently.

I'll put this as a patch in the parallelization patch series
referenced earlier since in that series my changes result in
generate_gather_paths() not necessarily always being able to build a
path.
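
For reference, the two shapes discussed above would look roughly like this at
that call site (a schematic fragment only, not a complete patch;
partially_grouped_rel and the planner helpers come from the surrounding core
code):

/* Option 1: guard the call so an empty pathlist cannot make
 * set_cheapest() fail with "could not devise a query plan". */
if (partially_grouped_rel->pathlist != NIL)
    set_cheapest(partially_grouped_rel);

/* Option 2: keep the call unguarded but document the current invariant
 * that gather_grouping_paths() always added at least one path. */
Assert(partially_grouped_rel->pathlist != NIL);
set_cheapest(partially_grouped_rel);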

Regards,
James Coleman




Re: speed up a logical replica setup

2024-02-01 Thread Euler Taveira
On Thu, Feb 1, 2024, at 9:47 AM, Hayato Kuroda (Fujitsu) wrote:
> I made fix patches to solve the issues reported by Fabrízio.
> 
> * v13-0001: Same as v11-0001 made by Euler.
> * v13-0002: Fixes ERRORs while dropping replication slots [1].
>    If you want to see the code on which we have agreement, please apply
> up to 0002.
> === experimental patches ===
> * v13-0003: Avoids using replication connections. The issue [2] was solved 
> on my env.
> * v13-0004: Removes -P option and use primary_conninfo instead.
> * v13-0005: Refactors data structures

Thanks for rebasing the proposed patches. I'm attaching a new patch.

As I said in the previous email [1] I fixed some issues from your previous 
review:

* use pg_fatal() if possible. There are some cases where it couldn't replace
  pg_log_error() + exit(1) because a hint is required.
* pg_resetwal output. Send standard output to /dev/null to avoid extra message.
* check privileges. Make sure the current role can execute CREATE SUBSCRIPTION
  and pg_replication_origin_advance().
* log directory. Refactor code that setup the log file used as server log.
* run with restricted token (Windows).
* v13-0002. Merged.
* v13-0003. I applied a modified version. I returned only the required
  information for each query.
* group command-line options into a new struct CreateSubscriberOptions. The
  exception is the dry_run option.
* rename functions that obtain system identifier.

WIP

I'm still working on the data structures to group options. I don't like the way
they were grouped in v13-0005. There are too many levels to reach the database
name. The setup_subscriber() function requires the 3 data structures.

The documentation update is almost there. I will include the modifications in
the next patch.

Regarding v13-0004, it seems like a good UI; that's why I wrote a comment about it.
However, it comes with a restriction that requires a similar HBA rule for both
regular and replication connections. Is that an acceptable restriction? We might
paint ourselves into a corner. A reasonable proposal is not to remove this
option; instead, it should be optional. If it is not provided, primary_conninfo
is used.

[1] 
https://www.postgresql.org/message-id/80ce3f65-7ca3-471e-b1a4-24ac529ff4ea%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 2666b5f660c64d699a0be440c3701c7be00ec309 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v14] Creates a new logical replica from a standby server

A new tool called pg_createsubscriber can convert a physical replica into a
logical replica. It runs on the target server and should be able to
connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier as the source data directory. Stop the
target server if it is running as a standby server. Create one
replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent
LSN (This consistent LSN will be used as (a) a stopping point for the
recovery process and (b) a starting point for the subscriptions). Write
recovery parameters into the target data directory and start the target
server (Wait until the target server is promoted). Create one
publication (FOR ALL TABLES) per specified database on the source
server. Create one subscription per specified database on the target
server (Use replication slot and publication created in a previous step.
Don't enable the subscriptions yet). Set the replication progress to
the consistent LSN obtained in a previous step. Enable the
subscription for each specified database on the target server.
Stop the target server. Change the system identifier on the target
server.

Depending on your workload and database size, creating a logical replica
might not be an option due to resource constraints (the WAL backlog should be
available until all table data is synchronized). The initial data copy
and the replication progress tend to be faster on a physical replica.
The purpose of this tool is to speed up a logical replica setup.
---
 doc/src/sgml/ref/allfiles.sgml|1 +
 doc/src/sgml/ref/pg_createsubscriber.sgml |  320 +++
 doc/src/sgml/reference.sgml   |1 +
 src/bin/pg_basebackup/.gitignore  |1 +
 src/bin/pg_basebackup/Makefile|8 +-
 src/bin/pg_basebackup/meson.build |   19 +
 src/bin/pg_basebackup/pg_createsubscriber.c   | 1876 +
 .../t/040_pg_createsubscriber.pl  |   44 +
 .../t/041_pg_createsubscriber_standby.pl  |  139 ++
 src/tools/pgindent/typedefs.list  |2 +
 10 files changed, 2410 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/ref/pg_createsubscriber.sgml
 create mode 100644 src/bin/pg_basebackup/pg_createsubscriber.c
 create mode 100644 

Re: POC: GROUP BY optimization

2024-02-01 Thread Tom Lane
Alexander Korotkov  writes:
> I'm going to push this if there are no objections.

One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04

and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.

regards, tom lane




Re: Synchronizing slots from primary to standby

2024-02-01 Thread Masahiko Sawada
On Thu, Feb 1, 2024 at 12:51 PM Amit Kapila  wrote:
>
> On Wed, Jan 31, 2024 at 9:20 PM Masahiko Sawada  wrote:
> >
> > On Wed, Jan 31, 2024 at 7:42 PM Amit Kapila  wrote:
> > >
> > >
> > > Considering my previous where we don't want to restart for a required
> > > parameter change, isn't it better to avoid repeated restart (say when
> > > the user gave an invalid dbname)? BTW, I think this restart interval
> > > is added based on your previous complaint [1].
> >
> > I think it's useful that the slotsync worker restarts immediately when
> > a required parameter is changed but waits to restart when it exits
> > with an error. IIUC the apply worker does so; if it restarts due to a
> > subscription parameter change, it resets the last-start time so that
> > the launcher will restart it without waiting.
> >
>
> Agreed, this idea sounds good to me.
>
> > >
> > > >
> > > > ---
> > > > When I dropped a database on the primary that has a failover slot, I
> > > > got the following logs on the standby:
> > > >
> > > > 2024-01-31 17:25:21.750 JST [1103933] FATAL:  replication slot "s" is
> > > > active for PID 1103935
> > > > 2024-01-31 17:25:21.750 JST [1103933] CONTEXT:  WAL redo at 0/3020D20
> > > > for Database/DROP: dir 1663/16384
> > > > 2024-01-31 17:25:21.751 JST [1103930] LOG:  startup process (PID
> > > > 1103933) exited with exit code 1
> > > >
> > > > It seems that because the slotsync worker created the slot on the
> > > > standby, the slot's active_pid is still valid.
> > > >
> > >
> > > But we release the slot after sync. And we do take a shared lock on
> > > the database to make the startup process wait for slotsync. There is
> > > one gap which is that we don't reset active_pid for temp slots in
> > > ReplicationSlotRelease(), so for temp slots such an error can occur
> > > but OTOH, we immediately make the slot persistent after sync. As per
> > > my understanding, it is only possible to get this error if the initial
> > > sync doesn't happen and the slot remains temporary. Is that your case?
> > > How did reproduce this?
> >
> > I created a failover slot manually on the primary and dropped the
> > database where the failover slot is created. So this would not happen
> > in normal cases.
> >
>
> Right, it won't happen in normal cases (say for walsender). This can
> happen in some cases even without this patch as noted in comments just
> above active_pid check in ReplicationSlotsDropDBSlots(). Now, we need
> to think whether we should just update the comments above active_pid
> check to explain this case or try to engineer some solution for this
> not-so-common case. I guess if we want a solution we need to stop
> slotsync worker temporarily till the drop database WAL is applied or
> something like that.
>
> > BTW I've tested the following switch/fail-back scenario but it seems
> > not to work fine. Am I missing something?
> >
> > Setup:
> > node1 is the primary, node2 is the physical standby for node1, and
> > node3 is the subscriber connecting to node1.
> >
> > Steps:
> > 1. [node1]: create a table and a publication for the table.
> > 2. [node2]: set enable_syncslot = on and start (to receive WALs from node1).
> > 3. [node3]: create a subscription with failover = true for the publication.
> > 4. [node2]: promote to the new standby.
> > 5. [node3]: alter subscription to connect the new primary, node2.
> > 6. [node1]: stop, set enable_syncslot = on (and other required
> > parameters), then start as a new standby.
> >
> > Then I got the error "exiting from slot synchronization because same
> > name slot "test_sub" already exists on the standby".
> >
> > The logical replication slot that was created on the old primary
> > (node1) has been synchronized to the old standby (node2). Therefore on
> > node2, the slot's "synced" field is true. However, once node1 starts
> > as the new standby with slot synchronization, the slotsync worker
> > cannot synchronize the slot because the slot's "synced" field on the
> > primary is false.
> >
>
> Yeah, we avoided doing anything in this case because the user could
> have manually created another slot with the same name on the standby.
> Unlike WAL, slots can be modified on the standby since we allow
> decoding on standby, so we can't simply overwrite existing slots. We
> won't be able to distinguish whether the existing slot is one that the
> user wants to sync with the primary or a slot created on the standby to
> perform decoding. I think in this case the user first needs to drop the
> slot on the new standby.

Yes, but if we then switch back (i.e. in the above case, node1 becomes
the primary again and node2 becomes the standby again), the user doesn't
need to remove the failover slots since they are already marked as
"synced". I wonder if we could do something automatically to reduce the
operations required of the user. Also, if we support the slot
synchronization feature on a cascading standby in the future, this
procedure will have to change.
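
For reference, the scenario above corresponds roughly to the following
commands (just a sketch; the node, table, publication and subscription
names are made up):

-- node1 (old primary), step 1
CREATE TABLE tbl (id int);
CREATE PUBLICATION test_pub FOR TABLE tbl;

-- node3 (subscriber), step 3: initially pointing at node1
CREATE SUBSCRIPTION test_sub
    CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION test_pub
    WITH (failover = true);

-- node2 (standby of node1, enable_syncslot = on), step 4
SELECT pg_promote();

-- node3, step 5: repoint the subscription at the new primary
ALTER SUBSCRIPTION test_sub CONNECTION 'host=node2 dbname=postgres';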

Regards,

--
Masahiko Sawada
Amazon Web 

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Michael Paquier
On Fri, Feb 02, 2024 at 06:51:02AM +0900, Michael Paquier wrote:
> I am going to try to plug in some rusage() calls in the backend for
> the COPY paths.  I hope that gives more precision about the backend
> activity.  I'll post that with more numbers.

And here they are with log_statement_stats enabled to get rusage() for
these queries:
 test |  user_s  | system_s | elapsed_s 
--+--+--+---
 head_to_bin_1col | 1.639761 | 0.007998 |  1.647762
 v7_to_bin_1col   | 1.645499 | 0.004003 |  1.649498
 v10_to_bin_1col  | 1.639466 | 0.004008 |  1.643488

 head_to_bin_10col| 7.486369 | 0.056007 |  7.542485
 v7_to_bin_10col  | 7.314341 | 0.039990 |  7.354743
 v10_to_bin_10col | 7.329355 | 0.052007 |  7.381408

 head_to_text_1col| 1.581140 | 0.012000 |  1.593166
 v7_to_text_1col  | 1.615441 | 0.003992 |  1.619446
 v10_to_text_1col | 1.613443 | 0.00 |  1.613454

 head_to_text_10col   | 5.897014 | 0.011990 |  5.909063
 v7_to_text_10col | 5.722872 | 0.016014 |  5.738979
 v10_to_text_10col| 5.762286 | 0.011993 |  5.774265

 head_from_bin_1col   | 1.524038 | 0.02 |  1.544046
 v7_from_bin_1col | 1.551367 | 0.016015 |  1.567408
 v10_from_bin_1col| 1.560087 | 0.016001 |  1.576115

 head_from_bin_10col  | 5.238444 | 0.139993 |  5.378595
 v7_from_bin_10col| 5.170503 | 0.076021 |  5.246588
 v10_from_bin_10col   | 5.106496 | 0.112020 |  5.218565

 head_from_text_1col  | 1.664124 | 0.003998 |  1.668172
 v7_from_text_1col| 1.720616 | 0.007990 |  1.728617
 v10_from_text_1col   | 1.683950 | 0.007990 |  1.692098

 head_from_text_10col | 4.859651 | 0.015996 |  4.875747
 v7_from_text_10col   | 4.775975 | 0.032000 |  4.808051
 v10_from_text_10col  | 4.737512 | 0.028012 |  4.765522
(24 rows)

I'm looking at this table, and what I can see is still a lot of
variance in the tests with tables involving 1 attribute.  However, a
second thing stands out to me here: there is a speedup in the
10-attribute case for both COPY FROM and COPY TO, and for both
formats.  The data posted at [1] is showing me the same trend.  In
short, let's move on with this split refactoring with the per-row
callbacks.  That clearly shows benefits.
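
For reference, the numbers above come from log_statement_stats output; a
sketch of the kind of statement being measured (the table and file names
here are made up, not the exact benchmark):

SET log_statement_stats = on;
-- rusage (user/system/elapsed) for each statement then shows up in the log
COPY tab_10col TO '/dev/null' WITH (FORMAT binary);
COPY tab_10col FROM '/tmp/tab_10col.bin' WITH (FORMAT binary);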

[1] https://www.postgresql.org/message-id/zbr6piwuvhdtf...@paquier.xyz
--
Michael


signature.asc
Description: PGP signature


Re: set_cheapest without checking pathlist

2024-02-01 Thread Richard Guo
On Thu, Feb 1, 2024 at 5:03 PM David Rowley  wrote:

> On Thu, 1 Feb 2024 at 19:36, Richard Guo  wrote:
> > maybe we can add an Assert or a comment clarifying that the pathlist of
> > partially_grouped_rel cannot be empty here.
>
> There'd be no need to Assert that as set_cheapest() will raise an
> error when given a rel with an empty pathlist.
>
> I don't think putting an Assert that would fail in the same situation
> that we'd later ERROR out on would be a good idea. It'd make it harder
> to debug the issue.


Fair point.

Thanks
Richard


Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block

2024-02-01 Thread Quan Zongliang


A little tweak to the code.
GetTopTransactionIdIfAny() != InvalidTransactionId
changed to
TransactionIdIsValid(GetTopTransactionIdIfAny())


On 2024/1/12 08:51, jian he wrote:

Hi

...

with patch:
src3=# explain(analyze, costs off) select 1 from pg_sleep(10);
2024-01-12 08:43:14.750 CST [5739] jian@src3/psql XID:0 LOG:
duration: 10010.167 ms  plan:
 Query Text: explain(analyze, costs off) select 1 from pg_sleep(10);
 Function Scan on pg_sleep  (cost=0.00..0.01 rows=1 width=4)
(actual time=10010.155..10010.159 rows=1 loops=1)
2024-01-12 08:43:14.750 CST [5739] jian@src3/psql XID:0 LOG:
statement: explain(analyze, costs off) select 1 from pg_sleep(10);
  QUERY PLAN
-
  Function Scan on pg_sleep (actual time=10010.155..10010.159 rows=1 loops=1)
  Planning Time: 0.115 ms
  Execution Time: 10010.227 ms
(3 rows)

This problem does exist for a statement that takes a long time to run.
The XID is assigned only at the first tuple change. If the user wants to
see it in the single statement log entry, they have to wait until the
statement has finished executing, and we don't know how long that will
take. It is not appropriate to output the log twice just because of the
xid. Besides, without parsing log_line_prefix we don't know whether the
user wants to see the xid at all.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1a34bd3715..bd08b64450 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1064,8 +1064,9 @@ exec_simple_query(const char *query_string)
 */
parsetree_list = pg_parse_query(query_string);
 
-   /* Log immediately if dictated by log_statement */
-   if (check_log_statement(parsetree_list))
+   /* Log immediately if dictated by log_statement and XID assigned. */
+   if (TransactionIdIsValid(GetTopTransactionIdIfAny()) &&
+   check_log_statement(parsetree_list))
{
ereport(LOG,
(errmsg("statement: %s", query_string),
@@ -1282,6 +1283,16 @@ exec_simple_query(const char *query_string)
 
PortalDrop(portal, false);
 
+   /* Log if dictated by log_statement and has not been logged. */
+   if (!was_logged && check_log_statement(parsetree_list))
+   {
+   ereport(LOG,
+   (errmsg("statement: %s", query_string),
+   errhidestmt(true),
+   errdetail_execute(parsetree_list)));
+   was_logged = true;
+   }
+
if (lnext(parsetree_list, parsetree_item) == NULL)
{
/*
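
For illustration, one way to observe the intended effect (a sketch only;
the prefix string and the table are just examples):

-- include the transaction id in the log line prefix
ALTER SYSTEM SET log_line_prefix = '%m [%p] %q%u@%d XID:%x ';
ALTER SYSTEM SET log_statement = 'mod';
SELECT pg_reload_conf();

-- Without the patch the "statement:" line is emitted before an XID is
-- assigned, so %x shows 0; with the patch it should carry the real XID.
UPDATE tbl SET val = val + 1;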


Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 06:51:02 +0900,
  Michael Paquier  wrote:

> On Fri, Feb 02, 2024 at 12:19:51AM +0900, Sutou Kouhei wrote:
>> Here are some numbers on my local machine (Note that my
>> local machine isn't suitable for benchmark as I said
>> before. Each number is median of "\watch 15" results):
>>>
>> I'll measure again on my local machine later. I'll stop
>> other processes such as Web browser, editor and so on as
>> much as possible when I do.
> 
> Thanks for compiling some numbers.  This is showing a lot of variance.
> Especially, these two lines in table 2 are showing surprising results
> for v7:
>   direction  format  n_columns   master        v7      v10
>        from     csv          1  917.973  1695.401  871.991
>        from  binary          1  841.104  1422.012  820.786

Here are more numbers:

1:
 direction  format  n_columns    master        v7       v10
        to    text          1  1053.844   978.998   956.575
        to     csv          1  1091.316  1020.584  1098.314
        to  binary          1  1034.685   969.224   980.458
        to    text         10  4216.264  3886.515  4111.417
        to     csv         10  4649.228  4530.882  4682.988
        to  binary         10  4219.228   4189.99  4211.942
      from    text          1   851.697   896.968   890.458
      from     csv          1   890.229   936.231    887.15
      from  binary          1   784.407    817.07   938.736
      from    text         10  2549.056  2233.899  2630.892
      from     csv         10  2809.441  2868.411  2895.196
      from  binary         10  2985.674  3027.522    3397.5

2:
 direction  format  n_columns    master        v7       v10
        to    text          1  1013.764  1011.968   940.855
        to     csv          1  1060.431  1065.468   1040.68
        to  binary          1  1013.652  1009.956   965.675
        to    text         10  4411.484  4031.571  3896.836
        to     csv         10  4739.625   4715.81  4631.002
        to  binary         10  4374.077  4357.942  4227.215
      from    text          1   955.078   922.346   866.222
      from     csv          1  1040.717   986.524   905.657
      from  binary          1   849.316   864.859   833.152
      from    text         10  2703.209  2361.651  2533.992
      from     csv         10   2990.35  3059.167  2930.632
      from  binary         10  3008.375  3368.714  3055.723

3:
 direction  format  n_columns    master        v7       v10
        to    text          1  1084.756  1003.822   994.409
        to     csv          1    1092.4  1062.536  1079.027
        to  binary          1  1046.774   994.168   993.633
        to    text         10   4363.51  3978.205  4124.359
        to     csv         10  4866.762  4616.001  4715.052
        to  binary         10  4382.412  4363.269  4213.456
      from    text          1   852.976   907.315   860.749
      from     csv          1   925.187   962.632   897.833
      from  binary          1   824.997   897.046   828.231
      from    text         10   2591.07  2358.541  2540.431
      from     csv         10  2907.033  3018.486  2915.997
      from  binary         10  3069.027   3209.21  3119.128

Other processes were stopped while I measured these, but I'm
not sure these numbers are any more reliable than before...
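
(For reference, each cell is the median of a psql "\watch 15" run,
roughly along these lines -- a sketch only, the table name is an
assumption:

\timing on
COPY copy_bench TO '/dev/null' WITH (FORMAT csv) \watch 15
)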

> I am going to try to plug in some rusage() calls in the backend for
> the COPY paths.  I hope that gives more precision about the backend
> activity.  I'll post that with more numbers.

Thanks. It'll help us.


-- 
kou




Re: Where can I find the doxyfile?

2024-02-01 Thread John Morris
Here is the updated patch. It should fix the meson issue when no doxygen is 
present.


doxygen-v3.patch
Description: doxygen-v3.patch


Checking MINIMUM_VERSION_FOR_WAL_SUMMARIES

2024-02-01 Thread Artur Zakirov
Hi hackers,

while reading the source code of the new incremental backup functionality,
I noticed that the following condition may be unintentional:

/*
 * For newer server versions, likewise create pg_wal/summaries
 */
if (PQserverVersion(conn) < MINIMUM_VERSION_FOR_WAL_SUMMARIES)
{
...

if (pg_mkdir_p(summarydir, pg_dir_create_mode) != 0 &&
errno != EEXIST)
pg_fatal("could not create directory \"%s\": %m", summarydir);
}

This is from src/bin/pg_basebackup/pg_basebackup.c.

Is the condition correct? Shouldn't it be ">="? Otherwise the function
will create "/summaries" only for older PostgreSQL versions.

I've attached a patch to fix it in case this is a typo.

-- 
Artur
From 24227fdcad1fbdb67e38537bc1d70f65d9bdab05 Mon Sep 17 00:00:00 2001
From: Artur Zakirov 
Date: Fri, 2 Feb 2024 01:04:27 +0100
Subject: [PATCH v1] Fix checking MINIMUM_VERSION_FOR_WAL_SUMMARIES for
 creating pg_wal/summaries directory

---
 src/bin/pg_basebackup/pg_basebackup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 77489af518..62eba4a962 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -700,7 +700,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
 		/*
 		 * For newer server versions, likewise create pg_wal/summaries
 		 */
-		if (PQserverVersion(conn) < MINIMUM_VERSION_FOR_WAL_SUMMARIES)
+		if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_WAL_SUMMARIES)
 		{
 			char		summarydir[MAXPGPATH];
 
-- 
2.40.1



[Doc] Improve hostssl related descriptions and option presentation

2024-02-01 Thread David G. Johnston
Motivated by a recent complaint [1] I found the hostssl related material in
our docs quite verbose and even repetitive.  Some of that is normal since
we have both an overview/walk-through section as well as a reference
section.  But the overview in particular was self-repetitive.  Here is a
first pass at some improvements.  The commit message contains both the
intent of the patch and some thoughts/questions still being considered.

David J.

[1]
https://www.postgresql.org/message-id/170672433664.663.11895343120533141715%40wrigleys.postgresql.org
From 5bd53027d5f1cbe9021c9b4f7abe3be5792589dd Mon Sep 17 00:00:00 2001
From: "David G. Johnston" 
Date: Thu, 1 Feb 2024 15:24:30 -0700
Subject: [PATCH] docs: improve hostssl related descriptions and option
 presentation

runtime.sgml discussion regarding SSL was repetitive with itself
and presented in a problematic order - with discussion about
setting up the server configuration happening before a broad
overview of what all of the moving parts are.  Put describing
the two approaches first, with links to the reference material,
and then discuss the implementation detail of a trusted root
certificate.

client-auth.sgml treatment of hostssl ssl auth options and
the cert auth method is confusing.  They are related and so
discuss them once, together.  The cert method page provides
a nice place to isolate these, in addition to a largely
repeated discussion on the runtime.sgml page which points
back to this one spot for the extra layer of details for,
in particular, the user mapping.

NOTES:

I left the "comparison is done with the DN in RFC"
paragraph content as-is though I think its presence or
content needs re-evaluation.

I moved the wording regarding "trust+clientcert" to
the runtime page as that material fits better in
overview than reference.  I also changed it from
words to an actual pg_hba.conf line; which seems
a bit out of place but going with it for now.

The method pages are all children of the pg_hba.conf
page so if you land on cert from the runtime page
going from their directly to pg_hba.conf, or
realizing that you are basically in a section
that only pertains to pg_hba.conf, is not that
obvious.  Seems like these pages should have
xrefs added to them giving that context and
linking back to pg_hba.conf directly.
---
 doc/src/sgml/client-auth.sgml | 106 ++
 doc/src/sgml/runtime.sgml |  87 
 2 files changed, 92 insertions(+), 101 deletions(-)

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 56747c0e36..638c8e7057 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -621,8 +621,9 @@ include_dir directory
 cert
 
  
-  Authenticate using SSL client certificates. See
-   for details.
+  Authenticate the hostssl connection attempt using only the SSL certificate.
+  See  for details regarding the auth-options
+  any hostssl connection line can specify.
  
 

@@ -660,45 +661,15 @@ include_dir directory
After the auth-method field, there can be field(s) of
the form name=value that
specify options for the authentication method. Details about which
-   options are available for which authentication methods appear below.
+   options are available for which authentication methods appear on their
+   individual detail pages.
   
 
   
-   In addition to the method-specific options listed below, there is a
-   method-independent authentication option clientcert, which
-   can be specified in any hostssl record.
-   This option can be set to verify-ca or
-   verify-full. Both options require the client
-   to present a valid (trusted) SSL certificate, while
-   verify-full additionally enforces that the
-   cn (Common Name) in the certificate matches
-   the username or an applicable mapping.
-   This behavior is similar to the cert authentication
-   method (see ) but enables pairing
-   the verification of client certificates with any authentication
-   method that supports hostssl entries.
-  
-  
-   On any record using client certificate authentication (i.e. one
-   using the cert authentication method or one
-   using the clientcert option), you can specify
-   which part of the client certificate credentials to match using
-   the clientname option. This option can have one
-   of two values. If you specify clientname=CN, which
-   is the default, the username is matched against the certificate's
-   Common Name (CN). If instead you specify
-   clientname=DN the username is matched against the
-   entire Distinguished Name (DN) of the certificate.
-   This option is probably best used in conjunction with a username map.
-   The comparison is done with the DN in
-   

Re: Adding comments to help understand psql hidden queries

2024-02-01 Thread David Christensen
On Thu, Feb 1, 2024 at 4:34 PM Greg Sabino Mullane  wrote:
>
> The --echo-hidden flag in psql is used to show people the way psql 
> performs its magic for its backslash commands. None of them has more magic 
> than "\d relation", but it suffers from needing a lot of separate queries to 
> gather all of the information it needs. Unfortunately, those queries can get 
> overwhelming, and it is hard to figure out which one does what, especially for those 
> not already very familiar with the system catalogs. Attached is a patch to 
> add a small SQL comment to the top of each SELECT query inside 
> describeOneTableDetail. All other functions use a single query, and thus need 
> no additional context. But "\d mytable" has the potential to run over a dozen 
> SQL queries! The new format looks like this:
>
> / QUERY */
> /* Get information about row-level policies */
> SELECT pol.polname, pol.polpermissive,
>   CASE WHEN pol.polroles = '{0}' THEN NULL ELSE 
> pg_catalog.array_to_string(array(select rolname from pg_catalog.pg_roles 
> where oid = any (pol.polroles) order by 1),',') END,
>   pg_catalog.pg_get_expr(pol.polqual, pol.polrelid),
>   pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid),
>   CASE pol.polcmd
> WHEN 'r' THEN 'SELECT'
> WHEN 'a' THEN 'INSERT'
> WHEN 'w' THEN 'UPDATE'
> WHEN 'd' THEN 'DELETE'
> END AS cmd
> FROM pg_catalog.pg_policy pol
> WHERE pol.polrelid = '134384' ORDER BY 1;
> //
>
> Cheers,
> Greg

Thanks, this looks like some helpful information. In the same vein,
I'm including a patch which adds information about the command that
generates the given query as well (atop your commit).  This will
modify the query line to include the command itself:

/ QUERY (\dRs) */

Best,

David


0001-Add-output-of-the-command-that-got-us-here-to-the-QU.patch
Description: Binary data


Re: Call pqPipelineFlush from PQsendFlushRequest

2024-02-01 Thread Jelte Fennema-Nio
On Thu, 1 Feb 2024 at 21:00, Tom Lane  wrote:
>Note that the request is not itself flushed to the server 
> automatically;
>use PQflush if necessary.
>
> Doesn't that require some rewording?

I agree that the current wording is slightly incorrect, but I think I
prefer we keep it this way. The fact that we actually DO flush when
some internal buffer is filled up seems more like an implementation
detail than behavior that people should actually be depending upon.
And even knowing the actual behavior, still the only way to know that
your data is flushed is by calling PQflush (since a user has no way of
checking if we automatically flushed the internal buffer).




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Michael Paquier
On Fri, Feb 02, 2024 at 12:19:51AM +0900, Sutou Kouhei wrote:
> Here are some numbers on my local machine (Note that my
> local machine isn't suitable for benchmark as I said
> before. Each number is median of "\watch 15" results):
>>
> I'll measure again on my local machine later. I'll stop
> other processes such as Web browser, editor and so on as
> much as possible when I do.

Thanks for compiling some numbers.  This is showing a lot of variance.
Especially, these two lines in table 2 are showing surprising results
for v7:
  direction  format  n_columns   master        v7      v10
       from     csv          1  917.973  1695.401  871.991
       from  binary          1  841.104  1422.012  820.786

I am going to try to plug in some rusage() calls in the backend for
the COPY paths.  I hope that gives more precision about the backend
activity.  I'll post that with more numbers.
--
Michael


signature.asc
Description: PGP signature


Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-02-01 Thread Michael Paquier
On Thu, Feb 01, 2024 at 04:50:49PM +0100, Alvaro Herrera wrote:
> I think this works similarly but not identically to tablespace defaults,
> and the difference could be confusing.  You seem to have made it so that
> the partitioned table _always_ have a table AM, so the partitions can
> always inherit from it.  I think it would be more sensible to _allow_
> partitioned tables to have one, but not mandatory; if they don't have
> it, then a partition created from it would use default_table_access_method.

You mean to allow a value of 0 in pg_class.relam on a partitioned
table, so that any partitions created on it use the default AM from the
GUC at creation time?  Yes, this inconsistency was bothering me as well
in the patch.
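
To make the two behaviours concrete, a sketch of what is being discussed
(the proposal under discussion, not what the current patch does; "my_am"
is a placeholder access method name):

SET default_table_access_method = heap;

-- Case 1: the partitioned table keeps relam = 0; the partition picks up
-- default_table_access_method at creation time.
CREATE TABLE parent (id int) PARTITION BY RANGE (id);
CREATE TABLE part1 PARTITION OF parent FOR VALUES FROM (0) TO (100);

-- Case 2: an explicit AM on the partitioned table; new partitions would
-- inherit it regardless of the GUC.
ALTER TABLE parent SET ACCESS METHOD my_am;
CREATE TABLE part2 PARTITION OF parent FOR VALUES FROM (100) TO (200);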
--
Michael


signature.asc
Description: PGP signature


Fwd: Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-02-01 Thread Kartyshov Ivan

Updated, rebased, fixed CI and added documentation.
We are left with two different solutions. Please help me choose the best one.

1) Classic (wait_classic_v6.patch)
https://www.postgresql.org/message-id/3cc883048264c2e9af022033925ff8db%40postgrespro.ru
==
advantages: multiple wait events, separate WAIT FOR statement
disadvantages: new words in grammar



WAIT FOR  [ANY | ALL] event [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ WAIT FOR [ANY | ALL] event [, ...]]
event:
LSN value
TIMEOUT number_of_milliseconds
timestamp



2) After style: Kyotaro and Freund (wait_after_within_v5.patch)
https://www.postgresql.org/message-id/d3ff2e363af60b345f82396992595a03%40postgrespro.ru
==
advantages: no new words in grammar
disadvantages: a little harder to understand, fewer events to wait for



AFTER lsn_event [ WITHIN delay_milliseconds ] [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ AFTER lsn_event [ WITHIN delay_milliseconds ]]
START [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ AFTER lsn_event [ WITHIN delay_milliseconds ]]
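
For a quick feel of both, example statements (sketches based on the
grammars above; the exact syntax and punctuation may still change):

-- 1) WAIT FOR style
WAIT FOR LSN '0/3F07A611', TIMEOUT 1000;
BEGIN WAIT FOR LSN '0/3F07A611';

-- 2) AFTER ... WITHIN style
AFTER LSN '0/3F07A611' WITHIN 1000;
BEGIN AFTER '0/3F07A611' WITHIN 1000;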




Regards

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com

diff --git a/doc/src/sgml/ref/after.sgml b/doc/src/sgml/ref/after.sgml
new file mode 100644
index 00..def0ec6be3
--- /dev/null
+++ b/doc/src/sgml/ref/after.sgml
@@ -0,0 +1,118 @@
+
+
+
+ 
+  AFTER
+ 
+
+ 
+  AFTER
+  7
+  SQL - Language Statements
+ 
+
+ 
+  AFTER
+  AFTER the target LSN to be replayed and for specified time to timeout
+ 
+
+ 
+
+AFTER LSN [WITHIN timeout_in_milliseconds]
+
+AFTER LSN 'lsn_number'
+AFTER LSN 'lsn_number' WITHIN wait_timeout
+
+ 
+
+ 
+  Description
+
+  
+   AFTER provides a simple interprocess communication
+   mechanism to wait for the target log sequence number 
+   (LSN) on standby in PostgreSQL
+   databases with master-standby asynchronous replication. AFTER
+   command waits for the specified LSN to be replayed.
+   If no timeout was specified, wait time is unlimited.
+   Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+  Specify the target log sequence number to wait for.
+ 
+
+   
+
+   
+TIMEOUT wait_timeout
+
+ 
+  Limit the time interval to AFTER the LSN to be replayed.
+  The specified wait_timeout must be an integer
+  and is measured in milliseconds.
+ 
+
+   
+
+  
+ 
+
+ 
+  Examples
+
+  
+   Run AFTER from psql,
+   limiting wait time to 1 milliseconds:
+
+
+AFTER '0/3F07A6B1' WITHIN 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+AFTER '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds,
+   and then cancel the command if LSN was not reached:
+
+AFTER LSN '0/3F0FF791' WITHIN 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+
+
+
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4a42999b18..ff2e6e0f3f 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -6,6 +6,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
@@ -188,6 +189,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 016b021487..a2794763b1 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index 74ccd7e345..46a3bcf1a8 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index aa94f6adf6..7c94cf9de1 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -34,6 +34,7 @@
   
 
   

Re: [PATCH] LockAcquireExtended improvement

2024-02-01 Thread Robert Haas
On Thu, Feb 1, 2024 at 2:16 AM Jingxian Li  wrote:
> According to what you said, I resubmitted a patch which splits the ProcSleep
> logic into two parts, the former is responsible for inserting self to
> WaitQueue,
> the latter is responsible for deadlock detection and processing, and the
> former part is directly called by LockAcquireExtended before nowait fails.
> In this way the nowait case can also benefit from adjusting the insertion
> order of WaitQueue.

I don't have time for a full review of this patch right now
unfortunately, but just looking at it quickly:

- It will be helpful if you write a clear commit message. If it gets
committed, there is a high chance the committer will rewrite your
message, but in the meantime it will help understanding.

- The comment for InsertSelfIntoWaitQueue needs improvement. It is
only one line. And it says "Insert self into queue if dontWait is
false" but then someone will wonder why the function would ever be
called with dontWait = true.

- Between the comments and the commit message, the division of
responsibilities between InsertSelfIntoWaitQueue and ProcSleep needs
to be clearly explained. Right now I don't think it is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Collation version tracking for macOS

2024-02-01 Thread Robert Haas
On Sun, Jan 21, 2024 at 1:58 PM Jeff Davis  wrote:
> I rendered the docs I wrote as an HTML page and attached it to this
> thread, to make it easier for others to read and comment. It's
> basically a tool for experts who are willing to devote effort to
> managing their collations and ICU libraries. Is that what we want?
>
> At an implementation level, did I get the extension APIs right? I
> considered making the API simpler, but that would require the extension
> to do quite a bit more work (including a lot of redundant work) to use
> ICU properly.

Not that I'm the most qualified person to have an opinion on this
topic, but did you intend to attach this stuff to this email, or is it
somewhere else?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: 003_extrafiles.pl test fails on Windows with the newer Perl versions

2024-02-01 Thread Andrew Dunstan



On 2024-02-01 Th 12:32, Tom Lane wrote:

Seems like the back branches are still not quite there: I'm seeing

Name "PostgreSQL::Test::Utils::windows_os" used only once: possible typo at 
t/003_extrafiles.pl line 74.

in v12 and v13, while it's

Name "PostgreSQL::Test::Utils::windows_os" used only once: possible typo at 
t/003_extrafiles.pl line 85.

in v14.  v15 and up are OK.




*sigh*


Have pushed a fix. Thanks for noticing.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Flushing large data immediately in pqcomm

2024-02-01 Thread Robert Haas
On Thu, Feb 1, 2024 at 10:52 AM Robert Haas  wrote:
> On Wed, Jan 31, 2024 at 10:24 PM Andres Freund  wrote:
> > While not perfect - e.g. because networks might use jumbo packets / large 
> > MTUs
> > and we don't know how many outstanding bytes there are locally, I think a
> > decent heuristic could be to always try to send at least one packet worth of
> > data at once (something like ~1400 bytes), even if that requires copying 
> > some
> > of the input data. It might not be sent on its own, but it should make it
> > reasonably unlikely to end up with tiny tiny packets.
>
> I think that COULD be a decent heuristic but I think it should be
> TESTED, including against the ~3 or so other heuristics proposed on
> this thread, before we make a decision.
>
> I literally mentioned the Ethernet frame size as one of the things
> that we should test whether it's relevant in the exact email to which
> you're replying, and you replied by proposing that as a heuristic, but
> also criticizing me for wanting more research before we settle on
> something. Are we just supposed to assume that your heuristic is
> better than the others proposed here without testing anything, or,
> like, what? I don't think this needs to be a completely exhaustive or
> exhausting process, but I think trying a few different things out and
> seeing what happens is smart.

There was probably a better way to phrase this email ... the sentiment
is sincere, but there was almost certainly a way of writing it that
didn't sound like I'm super-annoyed.

Apologies for that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Call pqPipelineFlush from PQsendFlushRequest

2024-02-01 Thread Tom Lane
Alvaro Herrera  writes:
> On 2023-Nov-07, Michael Paquier wrote:
>> Indeed, it looks a bit strange that there is no flush if the buffer
>> threshold is reached once the message is sent, so your suggestion
>> sounds right.  Alvaro?

> Pushed, thanks.

I observe that this patch did not touch libpq.sgml, which still says

   Note that the request is not itself flushed to the server automatically;
   use PQflush if necessary.

Doesn't that require some rewording?

regards, tom lane




Re: Add connection active, idle time to pg_stat_activity

2024-02-01 Thread Sergey Dudoladov
Hi again,

> It looks like we can check increments in all fields playing with
transactions in tests.

I've added such tests.

Regards,
Sergey
From 66ac1efe5424aa1385744a60047ffd44d42dd244 Mon Sep 17 00:00:00 2001
From: Sergey Dudoladov 
Date: Thu, 1 Feb 2024 16:11:36 +0100
Subject: [PATCH] Add pg_stat_session

Author: Sergey Dudoladov

Adds pg_stat_session view to track statistics accumulated during
 lifetime of a session (client backend).

catversion bump is necessary due to a new view / function

Reviewed-by: Aleksander Alekseev, Bertrand Drouvot, Atsushi
Torikoshi, and Andrei Zubkov

Discussion:
https://www.postgresql.org/message-id/flat/CA%2BFpmFcJF0vwi-SWW0wYO-c-FbhyawLq4tCpRDCJJ8Bq%3Dja-gA%40mail.gmail.com
---
 doc/src/sgml/monitoring.sgml| 134 
 src/backend/catalog/system_views.sql|  13 ++
 src/backend/utils/activity/backend_status.c |  64 --
 src/backend/utils/adt/pgstatfuncs.c |  70 ++
 src/include/catalog/pg_proc.dat |   9 ++
 src/include/utils/backend_status.h  |  32 +
 src/test/regress/expected/rules.out |  10 ++
 src/test/regress/expected/sysviews.out  |  37 ++
 src/test/regress/sql/sysviews.sql   |  17 +++
 9 files changed, 373 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d9b8b37585..b10423428a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -414,6 +414,20 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
See .
   
  
+
+ 
+  
+   pg_stat_session
+   pg_stat_session
+  
+  
+   One row per client backend, showing information related to
+   the currently accumulated activity of that process, such as time spent in
+   a certain state.
+   See 
+   pg_stat_session for details.
+  
+ 
 

   
@@ -4589,6 +4603,110 @@ description | Waiting for a newly initialized WAL file to reach durable storage

   
 
+  
+   pg_stat_session View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   pid integer
+  
+  
+   Process ID of this client backend.
+  
+ 
+
+ 
+  
+   active_time double precision
+  
+  
+   Time in milliseconds this backend spent in the running or fastpath state.
+  
+ 
+
+
+  
+   active_count bigint
+  
+  
+   Number of times this backend switched to the running or fastpath state.
+  
+ 
+
+ 
+  
+   idle_time double precision
+  
+  
+   Time in milliseconds this backend spent in the idle state.
+  
+ 
+
+ 
+  
+   idle_count bigint
+  
+  
+   Number of times this backend switched to the idle state.
+  
+ 
+
+ 
+  
+   idle_in_transaction_time double precision
+  
+  
+   Time in milliseconds this backend spent in the idle in transaction
+   state.
+  
+ 
+
+ 
+  
+   idle_in_transaction_count bigint
+  
+  
+   Number of times this backend switched to the idle in transaction
+   state.
+  
+ 
+
+ 
+  
+   idle_in_transaction_aborted_time double precision
+  
+  
+   Time in milliseconds this backend spent in the idle in transaction (aborted)
+   state.
+  
+ 
+
+ 
+  
+   idle_in_transaction_aborted_count bigint
+  
+  
+   Number of times this backend switched to the idle in transaction (aborted)
+   state.
+  
+ 
+
+
+   
+  
+
  
 
  
@@ -5128,6 +5246,22 @@ FROM pg_stat_get_backend_idset() AS backendid;

   
 
+  
+   
+
+ pg_stat_get_session
+
+pg_stat_get_session ( integer )
+setof record
+   
+   
+Returns a record of information about the client backend with the specified
+process ID, or one record for each active backend in the system
+if NULL is specified.  The fields returned are a
+subset of those in the pg_stat_session view.
+   
+  
+
   

 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6791bff9dd..79db3af28b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -893,6 +893,19 @@ CREATE VIEW pg_stat_activity AS
 LEFT JOIN pg_database AS D ON (S.datid = D.oid)
 LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
+CREATE VIEW pg_stat_session AS
+SELECT
+s.pid,
+s.active_time,
+s.active_count,
+s.idle_time,
+s.idle_count,
+s.idle_in_transaction_time,
+s.idle_in_transaction_count,
+s.idle_in_transaction_aborted_time,
+
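
For completeness, a usage sketch of the proposed view (column names as in
the patch above):

-- time spent and number of state changes for the current backend
SELECT pid, active_time, active_count,
       idle_time, idle_in_transaction_time
  FROM pg_stat_session
 WHERE pid = pg_backend_pid();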

Re: Reducing connection overhead in pg_upgrade compat check phase

2024-02-01 Thread vignesh C
On Sat, 27 Jan 2024 at 09:10, vignesh C  wrote:
>
> On Fri, 27 Oct 2023 at 18:50, Daniel Gustafsson  wrote:
> >
> > Attached is a v10 rebase of this patch which had undergone significant 
> > bitrot
> > due to recent changes in the pg_upgrade check phase.  This brings in the
> > changes into the proposed structure without changes to queries, with no
> > additional changes to the proposed functionality.
> >
> > Testing with a completely empty v11 cluster fresh from initdb as the old
> > cluster shows a significant speedup (averaged over multiple runs, adjusted 
> > for
> > outliers):
> >
> > patched:  53.59ms (52.78ms, 52.49ms, 55.49ms)
> > master : 125.87ms (125.23 ms, 125.67ms, 126.67ms)
> >
> > Using a similarly empty cluster from master as the old cluster shows a 
> > smaller
> > speedup, which is expected since many checks only run for older versions:
> >
> > patched: 33.36ms (32.82ms, 33.78ms, 33.47ms)
> > master : 44.87ms (44.73ms, 44.90ms 44.99ms)
> >
> > The latter case is still pretty interesting IMO since it can speed up 
> > testing
> > where every millisecond gained matters.
>
> CFBot shows that the patch does not apply anymore as in [1]:
> === Applying patches on top of PostgreSQL commit ID
> 55627ba2d334ce98e1f5916354c46472d414bda6 ===
> === applying patch
> ./v10-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch
> patching file src/bin/pg_upgrade/check.c
> Hunk #2 FAILED at 24.
> ...
> 1 out of 7 hunks FAILED -- saving rejects to file 
> src/bin/pg_upgrade/check.c.rej
>
> Please post an updated version for the same.

With no update to the thread and the patch still not applying I'm
marking this as returned with feedback.  Please feel free to resubmit
to the next CF when there is a new version of the patch.

Regards,
Vignesh




Re: PATCH: Using BRIN indexes for sorted output

2024-02-01 Thread vignesh C
On Sun, 21 Jan 2024 at 07:32, vignesh C  wrote:
>
> On Wed, 2 Aug 2023 at 21:34, Tomas Vondra  
> wrote:
> >
> >
> >
> > On 8/2/23 17:25, Sergey Dudoladov wrote:
> > > Hello,
> > >
> > >> Parallel version is not supported, but I think it should be possible.
> > >
> > > @Tomas are you working on this ? If not, I would like to give it a try.
> > >
> >
> > Feel free to try. Just keep it in a separate part/patch, to make it
> > easier to combine the work later.
> >
> > >> static void
> > >> AssertCheckRanges(BrinSortState *node)
> > >> {
> > >> #ifdef USE_ASSERT_CHECKING
> > >>
> > >> #endif
> > >> }
> > >
> > > I guess it should not be empty at the ongoing development stage.
> > >
> > > Attached a small modification of the patch with a draft of the docs.
> > >
> >
> > Thanks. FWIW it's generally better to always post the whole patch
> > series, otherwise the cfbot gets confused as it's unable to combine
> > stuff from different messages.
>
> Are we planning to take this patch forward? It has been nearly 5
> months since the last discussion on this. If the interest has gone
> down and if there are no plans to handle this I'm thinking of
> returning this commitfest entry in this commitfest and can be opened
> when there is more interest.

Since the author or no one else showed interest in taking it forward
and the patch had no activity for more than 5 months, I have changed
the status to RWF. Feel free to add a new CF entry when someone is
planning to resume work more actively by starting off with a rebased
version.

Regards,
Vignesh




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2024-02-01 Thread vignesh C
On Sat, 20 Jan 2024 at 09:03, vignesh C  wrote:
>
> On Mon, 20 Feb 2023 at 16:06, David Geier  wrote:
> >
> > Hi!
> >
> > On 2/14/23 13:48, David Geier wrote:
> > >
> > > It still fails.
> > >
> > > I'll get Cirrus-CI working on my own Github fork so I can make sure it
> > > really compiles on all platforms before I submit a new version.
> >
> > It took some time until Cirrus CI allowed me to run tests against my new
> > GitHub account (there's a 3 days freeze to avoid people from getting
> > Cirrus CI nodes to mine bitcoins :-D). Attached now the latest patch
> > which passes builds, rebased on latest master.
> >
> > I also reviewed the first two patches a while ago in [1]. I hope we can
> > progress with them to further reduce the size of this patch set.
> >
> > Beyond that: I could work on support for more OSs (e.g. starting with
> > Windows). Is there appetite for that or do we rather want to instead
> > start with a smaller patch?
>
> Are we planning to continue on this and take it further?
> I'm seeing that there has been no activity in this thread for nearly 1
> year now, I'm planning to close this in the current commitfest unless
> someone is planning to take it forward.

Since the author or no one else showed interest in taking it forward
and the patch had no activity for more than 1 year, I have changed the
status to RWF. Feel free to add a new CF entry when someone is
planning to resume work more actively.

Regards,
Vignesh




Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)

2024-02-01 Thread vignesh C
On Mon, 4 Sept 2023 at 21:40, Peter Eisentraut  wrote:
>
> On 30.06.23 20:48, Karina Litskevich wrote:
> > So as for me calling LockRelationForExtension() and
> > UnlockRelationForExtension()
> > without testing eb.rel first looks more like a bug here. However, they
> > are never
> > actually called with eb.rel=NULL because of the EB_* flags, so there is
> > no bug
> > here. I believe we should add Assert checking that when eb.rel is NULL,
> > flags
> > are such that we won't use eb.rel. And yes, we can remove unnecessary checks
> > where the flags value guaranty us that eb.rel is not NULL.
> >
> > And another minor observation. It seems to me that we don't need a
> > "could have
> > been closed while waiting for lock" in ExtendBufferedRelShared(), because I
> > believe the comment above already explains why updating eb.smgr:
> >
> >   * Note that another backend might have extended the relation by the time
> >   * we get the lock.
> >
> > I attached the new version of the patch as I see it.
>
> This patch version looks the most sensible to me.  But as commented
> further downthread, some explanation around the added assertions would
> be good.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: Add system identifier to backup manifest

2024-02-01 Thread Robert Haas
On Thu, Feb 1, 2024 at 2:18 AM Amul Sul  wrote:
> I intended to minimize the out param of parse_manifest_file(), which currently
> returns manifest_files_hash and manifest_wal_range, and I need two more --
> manifest versions and the system identifier.

Sure, but you could do context.ht = manifest_data->files instead of
context.manifest = manifest_data. The question isn't whether you
should return the whole manifest_data from parse_manifest_file -- I
agree with that decision -- but rather whether you should feed the
whole thing through into the context, or just the file hash.

> Yeah, we can do that, but I think it is a bit inefficient to have a strcmp()
> check for the pg_control file on each verify_backup_file() call, even though
> we know that path.  Also, I think we need additional handling to ensure that
> the system identifier has been verified in verify_backup_file() -- what if the
> pg_control file itself is missing from the backup?  That might be a rare case,
> but it is possible.
>
> For now, we can do the system identifier validation after
> verify_backup_directory().

Yes, that's another option, but I don't think it's as good.

Suppose you do it that way. Then what will happen when the file is
altogether missing or inaccessible? I think verify_backup_file() will
complain, and then you'll have to do something ugly to avoid having
verify_system_identifier() emit the same complaint all over again.
Remember, unless you find some complicated way of passing data around,
it won't know whether verify_backup_file() emitted a warning or not --
it can redo the stat() and see what happens, but it's not absolutely
guaranteed to be the same as what happened before. Making sure that
you always emit any given complaint once rather than twice or zero
times is going to be tricky.

It seems more natural to me to just accept the (small) cost of a
strcmp() in verify_backup_file(). If the initial stat() fails, it
emits whatever complaint is appropriate and returns and the logic to
check the system identifier is never reached. If it succeeds, you can
proceed to try to open the file and do what you need to do.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgbench - adding pl/pgsql versions of tests

2024-02-01 Thread vignesh C
On Fri, 18 Aug 2023 at 23:04, Hannu Krosing  wrote:
>
> I will address the comments here over this coming weekend.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: Where can I find the doxyfile?

2024-02-01 Thread vignesh C
On Thu, 18 Jan 2024 at 06:39, vignesh C  wrote:
>
> On Tue, 7 Nov 2023 at 12:23, John Morris  wrote:
> >
> > Another update, this time with an abbreviated Doxyfile. Rather than 
> > including the full Doxyfile, this updated version only includes modified 
> > settings. It is more compact and more likely to survive across future 
> > doxygen versions.
>
> Meson build is failing in CFbot at [1] and [2] with:
> Message: Install doxygen if you wish to create Doxygen documentation
> Program dot found: NO
> Configuring Doxyfile using configuration
>
> doc/doxygen/meson.build:52:0: ERROR: Tried to use not-found external
> program in "command"
>

With no update to the thread and the build still failing I'm marking
this as returned with feedback.  Please feel free to resubmit to the
next CF when there is a new version of the patch.

Regards,
Vignesh




Re: Should the archiver process always make sure that the timeline history files exist in the archive?

2024-02-01 Thread vignesh C
On Thu, 11 Jan 2024 at 20:38, vignesh C  wrote:
>
> On Tue, 29 Aug 2023 at 06:29, Jimmy Yih  wrote:
> >
> > Thanks for the insightful response! I have attached an updated patch
> > that moves the proposed logic to the end of StartupXLOG where it seems
> > more correct to do this. It also helps with backporting (if it's
> > needed) since the archiver process only has access to shared memory
> > starting from Postgres 14.
> >
> > Kyotaro Horiguchi  wrote:
> > > A. The OP suggests archiving the timeline history file for the current
> > >  timeline every time the archiver starts. However, I don't think we
> > >  want to keep archiving the same file over and over. (Granted, we're
> > >  not always perfect at avoiding that..)
> >
> > With the updated proposed patch, we'll be checking if the current
> > timeline history file needs to be archived at the end of StartupXLOG
> > if archiving is enabled. If it detects that a .ready or .done file
> > already exists, then it won't do anything (which will be the common
> > case). I agree though that this may be an excessive check since it'll
> > be a no-op the majority of the time. However, it shouldn't execute
> > often and seems like a quick safe preventive measure. Could you give
> > more details on why this would be too cumbersome?
> >
> > Kyotaro Horiguchi  wrote:
> > > B. Given that the steps valid, I concur to what is described in the
> > >  test script provided: standbys don't really need that history file
> > >  for the initial TLI (though I have yet to fully verify this).  If the
> > >  walreceiver just overlooks a fetch error for this file, the standby
> > >  can successfully start. (Just skipping the first history file seems
> > >  to work, but it feels a tad aggressive to me.)
> >
> > This was my initial thought as well but I wasn't sure if it was okay
> > to overlook the fetch error. Initial testing and brainstorming seems
> > to show that it's okay. I think the main bad thing is that these new
> > standbys will not have their initial timeline history files which can
> > be useful for administration. I've attached a patch that attempts this
> > approach if we want to switch to this approach as the solution. The
> > patch contains an updated TAP test as well to better showcase the
> > issue and fix.
> >
> > Kyotaro Horiguchi  wrote:
> > > C. If those steps aren't valid, we might want to add a note stating
> > >  that -X none basebackups do need the timeline history file for the
> > >  initial TLI.
> >
> > The difficult thing about only documenting this is that it forces the
> > user to manually store and track the timeline history files. It can be
> > a bit cumbersome for WAL archiving users to recognize this scenario
> > when they're just trying to optimize their basebackups by using
> > -Xnone. But then again -Xnone does seem like it's designed for
> > advanced users so this might be okay.
> >
> > Kyotaro Horiguchi  wrote:
> > > And don't forget to enable archive mode before the latest timeline
> > > switch if any.
> >
> > This might not be reasonable since a user could've been using
> > streaming replication and doing failover/failbacks as part of general
> > high availability to manage their Postgres without knowing they were
> > going to enable WAL archiving later on. The user would need to
> > configure archiving and force a failover which may not be
> > straightforward.
>
> I have changed the status of the patch to "Waiting on Author" as
> Robert's suggestions have not yet been addressed. Feel free to address
> the suggestions and update the status accordingly.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply

2024-02-01 Thread vignesh C
On Thu, 11 Jan 2024 at 20:00, vignesh C  wrote:
>
> On Thu, 13 Apr 2023 at 23:34, Fujii Masao  wrote:
> >
> >
> >
> > On 2023/04/13 11:00, Kyotaro Horiguchi wrote:
> > > Agreed, it seems to be a leftover when we moved to parse_int_param()
> > > in that area.
> >
> > It looks like there was an oversight in commit e7a2217978. I've attached a 
> > patch (0002) that updates PQconnectPoll() to use parse_int_param() for 
> > parsing the keepalives parameter.
> >
> > As this change is not directly related to the bug fix, it may not be 
> > necessary to back-patch it to the stable versions, I think. Thought?
> >
> >
> > >> To clarify, are you suggesting that PQgetCancel() should
> > >> only parse the parameters for TCP connections
> > >> if cancel->raddr.addr.ss_family != AF_UNIX?
> > >> If so, I think that's a good idea.
> > >
> > > You're right. I used connip in the diff because I thought it provided
> > > the same condition, but in a simpler way.
> >
> > I made a modification to the 0001 patch. It will now allow PQgetCancel() to 
> > parse and interpret TCP connection parameters only when the connection is 
> > not made through a Unix-domain socket.
> >
> >
> > > However, I notcied that PQgetCancel() doesn't set errbuf.. So, I'm
> > > fine with your proposal.
> >
> > Ok.
> >
> >
> > >> I think it is important to inform the user when an error
> > >> occurs and a cancel request cannot be sent, as this information
> > >> can help them identify the cause of the problem (such as
> > >> setting an overly large value for the keepalives parameter).
> > >
> > > Although I view it as an internal error, I agree with emitting some
> > > error messages in that situation.
> >
> > Ok.
>
> I have changed the status of the patch to "Waiting on Author" as all
> the issues are not addressed. Feel free to address them and change the
> status accordingly.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: Assertion failure in SnapBuildInitialSnapshot()

2024-02-01 Thread vignesh C
On Thu, 11 Jan 2024 at 19:55, vignesh C  wrote:
>
> On Thu, 9 Feb 2023 at 12:02, Masahiko Sawada  wrote:
> >
> > On Wed, Feb 8, 2023 at 1:13 PM Amit Kapila  wrote:
> > >
> > > On Wed, Feb 8, 2023 at 1:19 AM Andres Freund  wrote:
> > > >
> > > > On 2023-02-01 11:23:57 +0530, Amit Kapila wrote:
> > > > > On Tue, Jan 31, 2023 at 6:08 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > Attached updated patches.
> > > > > >
> > > > >
> > > > > Thanks, Andres, others, do you see a better way to fix this problem? I
> > > > > have reproduced it manually and the steps are shared at [1] and
> > > > > Sawada-San also reproduced it, see [2].
> > > > >
> > > > > [1] - 
> > > > > https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com
> > > > > [2] - 
> > > > > https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
> > > >
> > > > Hm. It's worrysome to now hold ProcArrayLock exclusively while 
> > > > iterating over
> > > > the slots. ReplicationSlotsComputeRequiredXmin() can be called at a
> > > > non-neglegible frequency.  Callers like CreateInitDecodingContext(), 
> > > > that pass
> > > > already_locked=true worry me a lot less, because obviously that's not a 
> > > > very
> > > > frequent operation.
> > > >
> > > > This is particularly not great because we need to acquire
> > > > ReplicationSlotControlLock while already holding ProcArrayLock.
> > > >
> > > >
> > > > But clearly there's a pretty large hole in the lock protection right 
> > > > now. I'm
> > > > a bit confused about why we (Robert and I, or just I) thought it's ok 
> > > > to do it
> > > > this way.
> > > >
> > > >
> > > > I wonder if we could instead invert the locks, and hold
> > > > ReplicationSlotControlLock until after 
> > > > ProcArraySetReplicationSlotXmin(), and
> > > > acquire ProcArrayLock just for ProcArraySetReplicationSlotXmin().
> > > >
> > >
> > > Along with inverting, doesn't this mean that we need to acquire
> > > ReplicationSlotControlLock in Exclusive mode instead of acquiring it
> > > in shared mode? My understanding of the above locking scheme is that
> > > in CreateInitDecodingContext(), we acquire ReplicationSlotControlLock
> > > in Exclusive mode before acquiring ProcArrayLock in Exclusive mode and
> > > release it after releasing ProcArrayLock. Then,
> > > ReplicationSlotsComputeRequiredXmin() acquires
> > > ReplicationSlotControlLock in Exclusive mode only when already_locked
> > > is false and releases it after a call to
> > > ProcArraySetReplicationSlotXmin(). ProcArraySetReplicationSlotXmin()
> > > won't change.
> >
> > I've attached the patch of this idea for discussion. In
> > GetOldestSafeDecodingTransactionId() called by
> > CreateInitDecodingContext(), we hold ReplicationSlotControlLock,
> > ProcArrayLock, and XidGenLock at a time. So we would need to be
> > careful about the ordering.
>
> I have changed the status of the patch to "Waiting on Author" as
> Robert's issues were not addressed yet. Feel free to change the status
> accordingly after addressing them.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2024-02-01 Thread vignesh C
On Thu, 11 Jan 2024 at 19:50, vignesh C  wrote:
>
> On Tue, 17 Oct 2023 at 04:18, Thomas Munro  wrote:
> >
> > I pushed the retry-loop-in-frontend-executables patch and the
> > missing-locking-in-SQL-functions patch yesterday.  That leaves the
> > backup ones, which I've rebased and attached, no change.  It sounds
> > like we need some more healthy debate about that backup label idea
> > that would mean we don't need these (two birds with one stone), so
> > I'll just leave these here to keep the cfbot happy in the meantime.
>
> I have changed the status of this to "Waiting on Author" as Robert's
> comments have not yet been handled. Feel free to post an updated
> version and change the status accordingly.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: [PATCH] Support % wildcard in extension upgrade filenames

2024-02-01 Thread Sandro Santilli
On Thu, Feb 01, 2024 at 04:31:26PM +0530, vignesh C wrote:
> 
> With no update to the thread and the tests still failing I'm marking
> this as returned with feedback.  Please feel free to resubmit to the
> next CF when there is a new version of the patch.

Ok, no problem. Thank you.

--strk;


signature.asc
Description: PGP signature


Re: Allow parallel plan for referential integrity checks?

2024-02-01 Thread vignesh C
On Sun, 21 Jan 2024 at 07:56, vignesh C  wrote:
>
> On Fri, 18 Aug 2023 at 16:29, Juan José Santamaría Flecha
>  wrote:
> >
> >
> > On Thu, Aug 17, 2023 at 3:51 PM Frédéric Yhuel  
> > wrote:
> >>
> >> On 8/17/23 14:00, Frédéric Yhuel wrote:
> >> > On 8/17/23 09:32, Frédéric Yhuel wrote:
> >> >> On 8/10/23 17:06, Juan José Santamaría Flecha wrote:
> >> >>> Recently I restored a database from a directory format backup and
> >> >>> having this feature would have been quite useful
> >> >>
> >> >> Thanks for resuming work on this patch. I forgot to mention this in my
> >> >> original email, but the motivation was also to speed up the restore
> >> >> process. Parallelizing the FK checks could make a huge difference in
> >> >> certain cases. We should probably provide such a test case (with perf
> >> >> numbers), and maybe this is what Robert asked for.
> >> >
> >> > I have attached two scripts which demonstrate the following problems:
> >
> >
> > Thanks for the scripts, but I think Robert's concerns come from the safety, 
> > and not the performance, of the parallel operation.
> >
> > Proving its vulnerability could be easy with a counter example, but 
> > assuring its safety is trickier. What test would suffice to do that?
>
> I'm seeing that there has been no activity in this thread for more
> than 5 months, I'm planning to close this in the current commitfest
> unless someone is planning to take it forward. It can be opened again
> when there is more interest.

Since neither the author nor anyone else showed interest in taking it
forward and the patch had no activity for more than 5 months, I have changed
the status to RWF. Feel free to add a new CF entry when someone is
planning to resume work more actively.

Regards,
Vignesh




Re: Oversight in reparameterize_path_by_child leading to executor crash

2024-02-01 Thread Tom Lane
Richard Guo  writes:
> How about the attached v12 patch?  But I'm a little concerned about
> omitting PVC_RECURSE_AGGREGATES and PVC_RECURSE_WINDOWFUNCS in this
> case.  Is it possible that aggregates or window functions appear in the
> tablesample expression?

No, they can't appear there; it'd make no sense to allow such things
at the relation scan level, and we don't.

regression=# select * from float8_tbl tablesample system(count(*));
ERROR:  aggregate functions are not allowed in functions in FROM
LINE 1: select * from float8_tbl tablesample system(count(*));
^
regression=# select * from float8_tbl tablesample system(count(*) over ());
ERROR:  window functions are not allowed in functions in FROM
LINE 1: select * from float8_tbl tablesample system(count(*) over ()...
^

I pushed v12 (with some cosmetic adjustments) into the back branches,
since we're getting close to the February release freeze.  We still
need to deal with the larger fix for HEAD.  Please re-post that
one so that the cfbot knows which is the patch-of-record.

regards, tom lane




Re: Asynchronous execution support for Custom Scan

2024-02-01 Thread vignesh C
On Sun, 14 Jan 2024 at 11:21, vignesh C  wrote:
>
> On Fri, 2 Dec 2022 at 05:05, Kohei KaiGai  wrote:
> >
> > > > IIUC, we already can use ctid in the where clause on the latest
> > > > PostgreSQL, can't we?
> > >
> > > Oh, sorry, I missed the TidRangeScan. My apologies for the noise.
> > >
> > I made the ctidscan extension when we developed CustomScan API
> > towards v9.5 or v9.6, IIRC. It would make sense just as an example of
> > the CustomScan API (e.g., PG-Strom's code is too large and complicated
> > to learn this API from); however, it makes no sense from the standpoint
> > of the functionality.
>
> This patch has been moving from one CF to another CF without any
> discussion happening. It has been more than one year since any
> activity in this thread. I don't see there is much interest in this
> patch. I prefer to return this patch in this CF unless someone is
> interested in the patch and takes it forward.

Since neither the author nor anyone else showed interest in taking it forward
and the patch had no activity for more than 1 year, I have changed the
status to RWF. Feel free to add a new CF entry when someone is
planning to work on this.

Regards,
Vignesh




Re: Add connection active, idle time to pg_stat_activity

2024-02-01 Thread Sergey Dudoladov
Hi all,

@Andrei Zubkov
I've modified the patch to address most of your comments.

>  I have a thought about other possible improvements fitting this patch.
> I think the pg_stat_session view is able to add one more dimension of
> monitoring - a dimension of sessions.

I would like to remind here about the initial scope of this patch. The main
goal of it was to ease tracking "idle in transactions" connections, a
feature that would really help in my work. The "pg_stat_session" came into
play only because the "pg_stat_activity" was seen as an unsuitable place
for the relevant counters. With that I still would like to maintaint the
focus on committing the "idle in transactions" part of pg_stat_session
first.

Regards,
Sergey
From 05f5117be52b613bb9d4833eec38e152c6f9 Mon Sep 17 00:00:00 2001
From: Sergey Dudoladov 
Date: Thu, 1 Feb 2024 16:11:36 +0100
Subject: [PATCH] Add pg_stat_session

Author: Sergey Dudoladov

Adds pg_stat_session view to track statistics accumulated during
 lifetime of a session (client backend).

catversion bump is necessary due to a new view / function

Reviewed-by: Aleksander Alekseev, Bertrand Drouvot, Atsushi
Torikoshi, and Andrei Zubkov

Discussion:
https://www.postgresql.org/message-id/flat/CA%2BFpmFcJF0vwi-SWW0wYO-c-FbhyawLq4tCpRDCJJ8Bq%3Dja-gA%40mail.gmail.com
---
 doc/src/sgml/monitoring.sgml| 134 
 src/backend/catalog/system_views.sql|  13 ++
 src/backend/utils/activity/backend_status.c |  64 --
 src/backend/utils/adt/pgstatfuncs.c |  70 ++
 src/include/catalog/pg_proc.dat |   9 ++
 src/include/utils/backend_status.h  |  32 +
 src/test/regress/expected/rules.out |  10 ++
 src/test/regress/expected/sysviews.out  |   7 +
 src/test/regress/sql/sysviews.sql   |   3 +
 9 files changed, 329 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d9b8b37585..b10423428a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -414,6 +414,20 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
See .
   
  
+
+ 
+  
+   pg_stat_session
+   pg_stat_session
+  
+  
+   One row per client backend, showing information related to
+   the currently accumulated activity of that process, such as time spent in
+   a certain state.
+   See 
+   pg_stat_session for details.
+  
+ 
 

   
@@ -4589,6 +4603,110 @@ description | Waiting for a newly initialized WAL file to reach durable storage

   

+  
+   pg_stat_session View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   pid integer
+  
+  
+   Process ID of this client backend.
+  
+ 
+
+ 
+  
+   active_time double precision
+  
+  
+   Time in milliseconds this backend spent in the running or fastpath state.
+  
+ 
+
+
+  
+   active_count bigint
+  
+  
+   Number of times this backend switched to the running or fastpath state.
+  
+ 
+
+ 
+  
+   idle_time double precision
+  
+  
+   Time in milliseconds this backend spent in the idle state.
+  
+ 
+
+ 
+  
+   idle_count bigint
+  
+  
+   Number of times this backend switched to the idle state.
+  
+ 
+
+ 
+  
+   idle_in_transaction_time double precision
+  
+  
+   Time in milliseconds this backend spent in the idle in transaction
+   state.
+  
+ 
+
+ 
+  
+   idle_in_transaction_count bigint
+  
+  
+   Number of times this backend switched to the idle in transaction
+   state.
+  
+ 
+
+ 
+  
+   idle_in_transaction_aborted_time double precision
+  
+  
+   Time in milliseconds this backend spent in the idle in transaction (aborted)
+   state.
+  
+ 
+
+ 
+  
+   idle_in_transaction_aborted_count bigint
+  
+  
+   Number of times this backend switched to the idle in transaction (aborted)
+   state.
+  
+ 
+
+
+   
+  
+
  

  
@@ -5128,6 +5246,22 @@ FROM pg_stat_get_backend_idset() AS backendid;

   

+  
+   
+
+ pg_stat_get_session
+
+pg_stat_get_session ( integer )
+setof record
+   
+   
+Returns a record of information about the client backend with the specified
+process ID, or one record for each active backend in the system
+if NULL is specified.  The fields returned are a
+subset of those in the pg_stat_session view.
+   
+  
+
   

 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6791bff9dd..79db3af28b 100644
--- 

Re: 003_extrafiles.pl test fails on Windows with the newer Perl versions

2024-02-01 Thread Tom Lane
Seems like the back branches are still not quite there: I'm seeing

Name "PostgreSQL::Test::Utils::windows_os" used only once: possible typo at 
t/003_extrafiles.pl line 74.

in v12 and v13, while it's

Name "PostgreSQL::Test::Utils::windows_os" used only once: possible typo at 
t/003_extrafiles.pl line 85.

in v14.  v15 and up are OK.

regards, tom lane




Re: GenBKI emits useless open;close for catalogs without rows

2024-02-01 Thread vignesh C
On Wed, 8 Nov 2023 at 12:50, Peter Eisentraut  wrote:
>
> On 08.11.23 08:16, Peter Eisentraut wrote:
> > On 19.09.23 20:05, Heikki Linnakangas wrote:
> >> One thing caught my eye though: We currently have an "open" command
> >> after every "create". Except for bootstrap relations; creating a
> >> bootstrap relation opens it implicitly. That seems like a weird
> >> inconsistency. If we make "create" to always open the relation, we can
> >> both make it more consistent and save a few lines. That's maybe worth
> >> doing, per the attached. It removes the "open" command altogether, as
> >> it's not needed anymore.
> >
> > This seems like a good improvement to me.
> >
> > It would restrict the bootstrap language so that you can only manipulate
> > a table right after creating it, but I don't see why that wouldn't be
> > sufficient.
>
> Then again, this sort of achieves the opposite of what Matthias was
> aiming for: You are now forcing some relations to be opened even though
> we will end up closing it right away.
>
> (In any case, documentation in bki.sgml would need to be updated for
> this patch.)

I have changed the status of the patch to WOA, feel free to update the
status once Peter's documentation comments are addressed.

Regards,
Vignesh




Re: Using AF_UNIX sockets always for tests on Windows

2024-02-01 Thread vignesh C
On Sun, 21 Jan 2024 at 18:01, vignesh C  wrote:
>
> On Wed, 22 Mar 2023 at 09:59, Thomas Munro  wrote:
> >
> > Thanks Tom, Andres, Juan José, Andrew for your feedback.  I agree that
> > it's a better "OS harmonisation" outcome if we can choose both ways on
> > both OSes.  I will leave the 0003 patch aside for now due to lack of
> > time, but here's a rebase of the first two patches.  Since this is
> > really just more cleanup-obsolete-stuff background work, I'm going to
> > move it to the next CF.
> >
> > 0001 -- Teach mkdtemp() to care about permissions on Windows.
> > Reviewed by Juan José, who confirmed my blind-coded understanding and
> > showed an alternative version but didn't suggest doing it that way.
> > It's a little confusing that NULL "attributes" means something
> > different from NULL "descriptor", so I figured I should highlight that
> > difference more clearly with some new comments.  I guess one question
> > is "why should we expect the calling process to have the default
> > access token?"
> >
> > 0002 -- Use AF_UNIX for pg_regress.  This one removes a couple of
> > hundred Windows-only lines that set up SSPI stuff, and some comments
> > about a signal handling hazard that -- as far as I can tell by reading
> > manuals -- was never really true.
> >
> > 0003 -- TAP testing adjustments needs some more work based on feedback
> > already given, not included in this revision.
>
> The patch does not apply anymore:
> patch -p1 < v3-0002-Always-use-AF_UNIX-sockets-in-pg_regress-on-Windo.patch
> patching file src/test/regress/pg_regress.c
> ...
> Hunk #6 FAILED at 781.
> ...
> 1 out of 10 hunks FAILED -- saving rejects to file
> src/test/regress/pg_regress.c.rej

With no update to the thread and the patch not applying I'm marking
this as returned with feedback. Please feel free to resubmit to the
next CF when there is a new version of the patch.

Regards,
Vignesh




Re: Unified File API

2024-02-01 Thread vignesh C
On Sat, 6 Jan 2024 at 22:58, vignesh C  wrote:
>
> On Thu, 29 Jun 2023 at 13:20, John Morris  wrote:
> >
> > Background
> >
> > ==
> >
> > PostgreSQL has an amazing variety of routines for accessing files. Consider 
> > just the “open file” routines.
> >PathNameOpenFile, OpenTemporaryFile, BasicOpenFile, open, fopen, 
> > BufFileCreateFileSet,
> >
> >BufFileOpenFileSet, AllocateFile, OpenTransientFile,  FileSetCreate, 
> > FileSetOpen, mdcreate, mdopen,
> >
> >Smgr_open,
> >
> >
> >
> > On the downside, “amazing variety” also means it is somewhat confusing and 
> > difficult to add new features.
> > Someday, we’d like to add encryption or compression to the various 
> > PostgreSQL files.
> > To do that, we need to bring all the relevant files into a common file API 
> > where we can implement
> > the new features.
> >
> >
> >
> > Goals of Patch
> >
> > =
> >
> > 1) Unify file access so most of “the other” files can go through a common 
> > interface, allowing new features
> > like checksums, encryption or compression to be added transparently. 2) Do 
> > it in a way which doesn’t
> > change the logic of current code. 3) Convert a reasonable set of callers to 
> > use the new interface.
> >
> >
> >
> > Note the focus is on the “other” files.  The buffer cache and the WAL have 
> > similar needs,
> > but they are being done in a separate project.  (yes, the two projects are 
> > coordinating)
> >
> > Patch 0001. Create a common file API.
> >
> > ===
> >
> > Currently, PostgreSQL files feed into three funnels.  1) system file 
> > descriptors (read/write/open),
> > 2) C library buffered files (fread/fwrite/fopen), and 3) virtual file 
> > descriptors (FileRead/FileWrite/PathNameOpenFile).
> > Of these three, virtual file descriptors (VFDs) are the most common. They 
> > are also the
> > only funnel which is implemented by PostgreSQL.
> >
> >
> >
> > Decision: Choose VFDs as the common interface.
> >
> >
> >
> > Problem: VFDs are random access only.
> >
> > Solution: Add sequential read/write code on top of VFDs.  (FileReadSeq, 
> > FileWriteSeq, FileSeek, FileTell, O_APPEND)
> >
> >
> >
> > Problem: VFDs have minimal error handling (based on errno.)
> >
> > Solution: Add an “ferror” style interface (FileError, FileEof, 
> > FileErrorCode, FileErrorMsg)
> >
> >
> >
> > Problem: Must maintain compatibility with existing error handling code.
> >
> > Solution: save and restore errno to minimize changes to existing code.
> >
> >
> >
> > Patch 0002. Update code to use the common file API
> >
> > ===
> >
> > The second patch alters callers so they use VFDs rather than system or C 
> > library files.
> > It doesn’t modify all callers, but it does capture many of the files which 
> > need
> > to be encrypted or compressed. This is definitely WIP.
> >
> >
> >
> > Future (not too far away)
> >
> > =
> >
> > Looking ahead, there will be another set of patches which inject buffering 
> > and encryption into
> > the VFD interface. The future patches will build on the current work and 
> > introduce new “oflags”
> >
> > to enable encryption and buffering.
> >
> >
> > Compression is also a possibility, but currently lower priority and a bit 
> > tricky for random access files.
> > Let us know if you have a use case.
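
For a sense of how the proposed interface might be used, here is a hypothetical
sketch built only from the function names listed above (FileReadSeq,
FileWriteSeq, FileError, FileErrorMsg); their exact signatures are assumptions:

/* Hypothetical usage of the proposed sequential VFD API. */
static void
copy_file_sketch(const char *srcpath, const char *dstpath)
{
    File    src = PathNameOpenFile(srcpath, O_RDONLY | PG_BINARY);
    File    dst = PathNameOpenFile(dstpath, O_WRONLY | O_CREAT | O_APPEND | PG_BINARY);
    char    buf[BLCKSZ];
    int     nread;

    /* FileReadSeq/FileWriteSeq maintain a per-VFD position internally. */
    while ((nread = FileReadSeq(src, buf, sizeof(buf))) > 0)
        FileWriteSeq(dst, buf, nread);

    /* "ferror"-style error reporting instead of checking errno directly. */
    if (FileError(src) || FileError(dst))
        elog(ERROR, "file copy failed: %s",
             FileErrorMsg(FileError(src) ? src : dst));

    FileClose(src);
    FileClose(dst);
}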
>
> CFbot shows few compilation warnings/error at [1]:
> [15:54:06.825] ../src/backend/storage/file/fd.c:2420:11: warning:
> unused variable 'save_errno' [-Wunused-variable]
> [15:54:06.825] int ret, save_errno;
> [15:54:06.825] ^
> [15:54:06.825] ../src/backend/storage/file/fd.c:4026:29: error: use of
> undeclared identifier 'MAXIMUM_VFD'
> [15:54:06.825] Assert(file >= 0 && file < MAXIMUM_VFD);
> [15:54:06.825] ^
> [15:54:06.825] 1 warning and 1 error generated.


With no update to the thread and the compilation still failing I'm
marking this as returned with feedback.  Please feel free to resubmit
to the next CF when there is a new version of the patch.

Regards,
Vignesh




Re: Move bki file pre-processing from initdb to bootstrap

2024-02-01 Thread vignesh C
On Sat, 11 Nov 2023 at 00:03, Krishnakumar R  wrote:
>
> Thank you for review, Peter.
>
> Makes sense on the split part. Was starting to think along the same lines at 
> the end of the last iteration. Will come back shortly.
>
> On Fri, Nov 10, 2023 at 12:48 AM Peter Eisentraut  
> wrote:
>>
>> On 17.10.23 03:32, Krishnakumar R wrote:
>> >> The version comparison has been moved from initdb to bootstrap. This
>> >> created some compatibility problems with windows tests. For now I kept
>> >> the version check to not have \n added, which worked fine and serves
>> >> the purpose. However hoping to have something better in v3 in addition
>> >> to addressing any other comments.
>> >
>> > With help from Thomas, figured out that on windows fopen uses binary
>> > mode in the backend which causes issues with EOL. Please find the
>> > attached patch updated with a fix for this.
>>
>> I suggest that this patch set be split up into three incremental parts:
>>
>> 1. Move some build-time settings from initdb to postgres.bki.
>> 2. The database locale handling.
>> 3. The bki file handling.
>>
>> Each of these topics really needs a separate detailed consideration.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: Forbid the use of invalidated physical slots in streaming replication.

2024-02-01 Thread vignesh C
On Mon, 8 Jan 2024 at 10:25, vignesh C  wrote:
>
> On Fri, 8 Dec 2023 at 19:15, Ashutosh Bapat
>  wrote:
> >
> > > > > pg_replication_slot could be set back to null.
> > > >
> > > > In this case, since the basebackup was taken after the slot was 
> > > > invalidated, it
> > > > does not require the WAL that was removed. But it seems that once the
> > > > streaming starts, the slot sprints to life again and gets validated 
> > > > again. Here's
> > > > pg_replication_slot output after the standby starts.
> > >
> > > Actually, it doesn't bring the invalidated slot back to life completely.
> > > The slot's view data looks valid while the 'invalidated' flag of this 
> > > slot is still
> > > RS_INVAL_WAL_REMOVED (users are not aware of it).
> > >
> >
> > I was misled by the code in pg_get_replication_slots(). I did not
> > read it till the following
> >
> > --- code 
> > case WALAVAIL_REMOVED:
> >
> > /*
> > * If we read the restart_lsn long enough ago, maybe that file
> > * has been removed by now. However, the walsender could have
> > * moved forward enough that it jumped to another file after
> > * we looked. If checkpointer signalled the process to
> > * termination, then it's definitely lost; but if a process is
> > * still alive, then "unreserved" seems more appropriate.
> > *
> > * If we do change it, save the state for safe_wal_size below.
> > */
> > --- code ---
> >
> > I see now how an invalid slot's wal status can be reported as
> > unreserved. So I think it's a bug.
> >
> > >
> > > >
> > > > >
> > > > > So, I think it's a bug and one idea to fix is to check the validity of
> > > > > the physical slot in
> > > > > StartReplication() after acquiring the slot like what the attachment
> > > > > does, what do you think ?
> > > >
> > > > I am not sure whether that's necessarily a bug. Of course, we don't 
> > > > expect
> > > > invalid slots to be used but in this case I don't see any harm.
> > > > The invalid slot has been revived and has all the properties set just 
> > > > like a valid
> > > > slot. So it must be considered in
> > > > ReplicationSlotsComputeRequiredXmin() as well. I haven't verified it 
> > > > myself
> > > > though. In case the WAL is really lost and is requested by the standby 
> > > > it will
> > > > throw an error "requested WAL segment [0-9A-F]+ has already been
> > > > removed". So no harm there as well.
> > >
> > > Since the 'invalidated' field of the slot is still 
> > > (RS_INVAL_WAL_REMOVED), even
> > > if the walsender advances the restart_lsn, the slot will not be 
> > > considered in the
> > > ReplicationSlotsComputeRequiredXmin(), so the WALs and Rows are not safe
> > > and that's why I think it's a bug.
> > >
> > > After looking closer, it seems this behavior started from 15f8203 which 
> > > introduced the
> > > ReplicationSlotInvalidationCause 'invalidated', after that we check the 
> > > invalidated enum
> > > in Xmin/Lsn computation function.
> > >
> > > If we want to go back to previous behavior, we need to revert/adjust the 
> > > check
> > > for invalidated in ReplicationSlotsComputeRequiredXmin(), but since the 
> > > logical
> > > decoding(walsender) disallow using invalidated slot, so I feel it's 
> > > consistent
> > > to do similar check for physical one. Besides, 
> > > pg_replication_slot_advance()
> > > also disallow passing invalidated slot to it as well. What do you think ?
> >
> > What happens if you run your script on a build prior to 15f8203?
> > Purely from reading the code, it looks like the physical slot would
> > sprint back to life since its restart_lsn would be updated. But I am
> > not able to see what happens to invalidated_at. It probably remains a
> > valid LSN and the slot would still not be considered in xmin
> > calculation. It will be good to be compatible to pre-15f8203
> > behaviour.
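
For reference, a minimal sketch of the fix idea mentioned earlier in the thread
(rejecting an invalidated physical slot when streaming starts); the message
wording is an assumption and this is not the attached patch:

/*
 * Hypothetical check in StartReplication(), after the physical slot has
 * been acquired: refuse to stream from a slot that was invalidated, e.g.
 * because its required WAL has been removed.
 */
if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("cannot read from replication slot \"%s\"",
                    NameStr(MyReplicationSlot->data.name)),
             errdetail("This replication slot has been invalidated and can no longer be used.")));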
>
> I have changed the patch status to "Waiting on Author", as some of the
> queries requested by Ashutosh are not yet addressed.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-02-01 Thread Michail Nikolaev
> > > I just realised there is one issue with this design: We can't cheaply
> > > reset the snapshot during the second table scan:
> > > It is critically important that the second scan of R/CIC uses an index
> > > contents summary (made with index_bulk_delete) that was created while
> > > the current snapshot was already registered.
> >
> > > So, the "reset the snapshot every so often" trick cannot be applied in
> > > phase 3 (the rescan), or we'd have to do an index_bulk_delete call
> > > every time we reset the snapshot. Rescanning might be worth the cost
> > > (e.g. when using BRIN), but that is very unlikely.
> >
> > Hm, I think it is still possible. We could just manually recheck the
> > tuples we see
> > against the snapshot currently used for the scan. If the "old" snapshot can
> > also see the tuple (HeapTupleSatisfiesHistoricMVCC), then search for it in
> > the index summary.
> That's an interesting method.
>
> How would this deal with tuples not visible to the old snapshot?
> Presumably we can assume they're newer than that snapshot (the old
> snapshot didn't have it, but the new one does, so it's committed after
> the old snapshot, making them newer), so that backend must have
> inserted it into the index already, right?

I made a draft of the patch and this idea is not working.

The problem is generally the same:

* reference snapshot sees tuple X
* reference snapshot is used to create index summary (but there is no
tuple X in the index summary)
* tuple X is updated to Y creating a HOT-chain
* we started scan with new temporary snapshot (it sees Y, X is too old for it)
* tuple X is pruned from HOT-chain because it is not protected by any snapshot
* we see tuple Y in the scan with temporary snapshot
* it is not in the index summary - so, we need to check if
reference snapshot can see it
* there is no way to understand if the reference snapshot was able
to see tuple X - because we need the full HOT chain (with X tuple) for
that

Best regards,
Michail.




Re: Flushing large data immediately in pqcomm

2024-02-01 Thread Robert Haas
On Wed, Jan 31, 2024 at 10:24 PM Andres Freund  wrote:
> While not perfect - e.g. because networks might use jumbo packets / large MTUs
> and we don't know how many outstanding bytes there are locally, I think a
> decent heuristic could be to always try to send at least one packet worth of
> data at once (something like ~1400 bytes), even if that requires copying some
> of the input data. It might not be sent on its own, but it should make it
> reasonably unlikely to end up with tiny tiny packets.

I think that COULD be a decent heuristic but I think it should be
TESTED, including against the ~3 or so other heuristics proposed on
this thread, before we make a decision.

I literally mentioned the Ethernet frame size as one of the things
that we should test whether it's relevant in the exact email to which
you're replying, and you replied by proposing that as a heuristic, but
also criticizing me for wanting more research before we settle on
something. Are we just supposed to assume that your heuristic is
better than the others proposed here without testing anything, or,
like, what? I don't think this needs to be a completely exhaustive or
exhausting process, but I think trying a few different things out and
seeing what happens is smart.
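
To make the idea concrete, here is a rough sketch of the kind of heuristic
Andres describes, for the path that flushes large payloads directly: top the
send buffer up to roughly one packet's worth of data before flushing, so the
previously buffered bytes never go out as a tiny packet on their own. The
1400-byte threshold and the secure_write_all() helper are assumptions, and
this is untested:

#define PQ_MIN_SEND_HINT    1400    /* assumed ~payload of one Ethernet frame */

static int
putbytes_large_sketch(const char *data, size_t len)
{
    size_t      buffered = PqSendPointer - PqSendStart;

    if (buffered > 0 && buffered < PQ_MIN_SEND_HINT)
    {
        /* Copy just enough of the input to reach the hint (or all of it). */
        size_t      top_up = Min(len, PQ_MIN_SEND_HINT - buffered);

        memcpy(PqSendBuffer + PqSendPointer, data, top_up);
        PqSendPointer += top_up;
        data += top_up;
        len -= top_up;
    }

    if (internal_flush() != 0)  /* send the buffered bytes */
        return EOF;
    if (len > 0 && secure_write_all(data, len) != 0)    /* send the rest directly */
        return EOF;
    return 0;
}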

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-02-01 Thread Alvaro Herrera
> @@ -947,20 +947,22 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid 
> ownerId,
>* a type of relation that needs one, use the default.
>*/
>   if (stmt->accessMethod != NULL)
> + accessMethodId = get_table_am_oid(stmt->accessMethod, false);
> + else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == 
> RELKIND_PARTITIONED_TABLE)
>   {
> - accessMethod = stmt->accessMethod;
> -
> - if (partitioned)
> - ereport(ERROR,
> - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> -  errmsg("specifying a table access 
> method is not supported on a partitioned table")));
> + if (stmt->partbound)
> + {
> + /*
> +  * For partitions, if no access method is specified, 
> use the AM of
> +  * the parent table.
> +  */
> + Assert(list_length(inheritOids) == 1);
> + accessMethodId = 
> get_rel_relam(linitial_oid(inheritOids));
> + Assert(OidIsValid(accessMethodId));
> + }
> + else
> + accessMethodId = 
> get_table_am_oid(default_table_access_method, false);
>   }

I think this works similarly but not identically to tablespace defaults,
and the difference could be confusing.  You seem to have made it so that
the partitioned table _always_ has a table AM, so the partitions can
always inherit from it.  I think it would be more sensible to _allow_
partitioned tables to have one, but not to require it; if they don't have
one, then a partition created from them would use default_table_access_method.
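
To make the suggestion concrete, here is a rough sketch of how the quoted
DefineRelation() hunk could look if a partitioned table's AM were optional,
with partitions falling back to default_table_access_method when the parent
has none (an illustration only, not a reviewed patch):

if (stmt->accessMethod != NULL)
    accessMethodId = get_table_am_oid(stmt->accessMethod, false);
else if (stmt->partbound && RELKIND_HAS_TABLE_AM(relkind))
{
    /*
     * Partition with no explicit AM: inherit the parent's AM if the parent
     * has one, else fall back to the default.
     */
    Assert(list_length(inheritOids) == 1);
    accessMethodId = get_rel_relam(linitial_oid(inheritOids));
    if (!OidIsValid(accessMethodId))
        accessMethodId = get_table_am_oid(default_table_access_method, false);
}
else if (RELKIND_HAS_TABLE_AM(relkind))
    accessMethodId = get_table_am_oid(default_table_access_method, false);
else
    accessMethodId = InvalidOid;    /* e.g. partitioned table without an AM */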

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree.  (Don Knuth)




Re: Skip collecting decoded changes of already-aborted transactions

2024-02-01 Thread vignesh C
On Tue, 3 Oct 2023 at 15:54, vignesh C  wrote:
>
> On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada  wrote:
> >
> > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar  wrote:
> > >
> > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > In logical decoding, we don't need to collect decoded changes of
> > > > aborted transactions. While streaming changes, we can detect
> > > > concurrent abort of the (sub)transaction but there is no mechanism to
> > > > skip decoding changes of transactions that are known to already be
> > > > aborted. With the attached WIP patch, we check CLOG when decoding the
> > > > transaction for the first time. If it's already known to be aborted,
> > > > we skip collecting decoded changes of such transactions. That way,
> > > > when the logical replication is behind or restarts, we don't need to
> > > > decode large transactions that already aborted, which helps improve
> > > > the decoding performance.
> > > >
> > > +1 for the idea of checking the transaction status only when we need
> > > to flush it to the disk or send it downstream (if streaming in
> > > progress is enabled).  Although this check is costly, since we are
> > > planning it only for large transactions it is worth it if we can
> > > occasionally avoid disk or network I/O for aborted transactions.
> > >
> >
> > Thanks.
> >
> > I've attached the updated patch. With this patch, we check the
> > transaction status only for large transactions at eviction time. For
> > regression test purposes, I disable this transaction status check when
> > logical_replication_mode is set to 'immediate'.
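
For orientation, a minimal sketch of the check being discussed; the name
ReorderBufferCheckTXNAbort appears in the compile error below, but the body
here is an assumption rather than the attached patch:

/*
 * Hypothetical sketch: before evicting (serializing or streaming) a large
 * transaction, check whether it is already known to have aborted, and if so
 * discard its decoded changes instead of spilling them.
 */
static bool
ReorderBufferCheckTXNAbort_sketch(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
    if (TransactionIdIsValid(txn->xid) &&
        TransactionIdDidAbort(txn->xid))
    {
        /* Discard all collected changes; no need to spill or stream them. */
        ReorderBufferTruncateTXN(rb, txn, true);
        return true;
    }

    return false;
}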
>
> Maybe some changes are missing in the patch, which is causing the
> following errors:
> reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
> reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared
> (first use in this function)
>  3584 | if (unlikely(logical_replication_mode ==
> LOGICAL_REP_MODE_IMMEDIATE))
>   |  ^~~~

With no update to the thread and the compilation still failing I'm
marking this as returned with feedback.  Please feel free to resubmit
to the next CF when there is a new version of the patch.

Regards,
Vignesh




Re: Volatile write caches on macOS and Windows, redux

2024-02-01 Thread vignesh C
On Mon, 22 Jan 2024 at 07:46, Peter Smith  wrote:
>
> 2024-01 Commitfest.
>
> Hi, this patch was marked in CF as "Needs Review" [1], but there has
> been no activity on this thread for 6+ months.
>
> Is anything else planned, or can you post something to elicit more
> interest in the patch? Otherwise, if nothing happens then the CF entry
> will be closed ("Returned with feedback") at the end of this CF.

With no update to the thread and the patch not applying I'm marking
this as returned with feedback.  Please feel free to resubmit to the
next CF when there is a new version of the patch.

Regards,
Vignesh




Re: Request for comment on setting binary format output per session

2024-02-01 Thread vignesh C
On Sat, 27 Jan 2024 at 07:45, vignesh C  wrote:
>
> On Mon, 31 Jul 2023 at 21:58, Dave Cramer  wrote:
> >
> >
> > Dave Cramer
> >
> >
> > On Mon, 10 Jul 2023 at 03:56, Daniel Gustafsson  wrote:
> >>
> >> > On 25 Apr 2023, at 16:47, Dave Cramer  wrote:
> >>
> >> > Patch attached with comments removed
> >>
> >> This patch no longer applies, please submit a rebased version on top of 
> >> HEAD.
> >
> >
> > Rebased see attached
>
> CFBot shows that the patch does not apply anymore as in [1]:
> === Applying patches on top of PostgreSQL commit ID
> fba2112b1569fd001a9e54dfdd73fd3cb8f16140 ===
> === applying patch ./0001-Created-protocol.h.patch
> patching file src/backend/access/common/printsimple.c
> Hunk #1 succeeded at 22 with fuzz 2 (offset 1 line).
> Hunk #2 FAILED at 33.
> Hunk #3 FAILED at 66.
> 2 out of 3 hunks FAILED -- saving rejects to file
> src/backend/access/common/printsimple.c.rej
> patching file src/backend/access/transam/parallel.c
> Hunk #1 succeeded at 34 (offset 1 line).
> Hunk #2 FAILED at 1128.
> Hunk #3 FAILED at 1138.
> Hunk #4 FAILED at 1184.
> Hunk #5 succeeded at 1205 (offset 4 lines).
> Hunk #6 FAILED at 1218.
> Hunk #7 FAILED at 1373.
> Hunk #8 FAILED at 1551.
> 6 out of 8 hunks FAILED -- saving rejects to file
> src/backend/access/transam/parallel.c.rej
>
> Please post an updated version for the same.
>
> [1] - http://cfbot.cputube.org/patch_46_3777.log

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: Possibility to disable `ALTER SYSTEM`

2024-02-01 Thread Robert Haas
On Thu, Feb 1, 2024 at 7:33 AM Bruce Momjian  wrote:
> On Tue, Jan 30, 2024 at 04:25:12PM -0500, Robert Haas wrote:
> > I don't think we should pretend like one of the two paragraphs above
> > is valid and the other is hot garbage. That's not solving anything. We
> > can't resolve the tension between those two things in either direction
> > by somebody hammering on the side of the argument that they believe to
> > be correct and ignoring the other one.
>
> What if we generate log messages when certain commands are used, like
> ALTER TABLE?  We could have GUC which controls which commands are
> logged.

Well, as I understand it, that doesn't solve the problem here. The
problem some people want to solve here seems to be:

On my system, the PostgreSQL configuration parameters are being
managed by $EXTERNAL_TOOL. Therefore, they should not be managed by
PostgreSQL itself. Therefore, if someone uses ALTER SYSTEM, they've
made a mistake, so we should give them an ERROR telling them that,
like:

ERROR: you're supposed to update the configuration via k8s, not ALTER
SYSTEM, you dummy!
DETAIL: Stop being an idiot.

The exact message text might need some wordsmithing. :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Collation version tracking for macOS

2024-02-01 Thread vignesh C
On Mon, 22 Jan 2024 at 00:28, Jeff Davis  wrote:
>
> On Sat, 2024-01-20 at 07:40 +0530, vignesh C wrote:
> > This thread has been idle for a year now, It has stalled after a lot
> > of discussion.
> > @Jeff Davis: Do you want to try to restart the discussion by posting
> > an updated version and see what happens?
>
> Thank you for following up. Yes, I'd like to find a path forward here,
> but I need some validation from others on my approach.

Let's start by posting a rebased version to fix the CFBot patch apply
issue as in [1]:

=== Applying patches on top of PostgreSQL commit ID
402388946fb3ac54f0fd5944d7e177ef7737eab2 ===
=== applying patch
./v8-0001-Support-multiple-ICU-collation-provider-libraries.patch
patching file src/backend/commands/collationcmds.c
Hunk #1 FAILED at 566.

1 out of 4 hunks FAILED -- saving rejects to file
src/backend/commands/collationcmds.c.rej
patching file src/backend/utils/adt/formatting.c
Hunk #1 succeeded at 1575 (offset 9 lines).
Hunk #2 succeeded at 1587 (offset 9 lines).
Hunk #3 succeeded at 1605 (offset 9 lines).
Hunk #4 succeeded at 1700 (offset 3 lines).
Hunk #5 succeeded at 1819 (offset -1 lines).
Hunk #6 succeeded at 1939 (offset -5 lines).
patching file src/backend/utils/adt/pg_locale.c
Hunk #1 FAILED at 70.
...
Hunk #31 FAILED at 2886.
Hunk #32 FAILED at 2902.
22 out of 32 hunks FAILED -- saving rejects to file
src/backend/utils/adt/pg_locale.c.rej

[1] - http://cfbot.cputube.org/patch_46_3956.log

Regards,
Vignesh




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei
Hi,

Thanks for preparing the benchmarks.

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 1 Feb 2024 12:49:59 +0900,
  Michael Paquier  wrote:

> On Thu, Feb 01, 2024 at 10:57:58AM +0900, Michael Paquier wrote:
>> And here are the results I get for text and binary (ms, average of 15
>> queries after discarding the three highest and three lowest values):
>>   test   | master |  v7  | v10  
>> -++--+--
>>  from_bin_1col   | 1575   | 1546 | 1575
>>  from_bin_10col  | 5364   | 5208 | 5230
>>  from_text_1col  | 1690   | 1715 | 1684
>>  from_text_10col | 4875   | 4793 | 4757
>>  to_bin_1col | 1717   | 1730 | 1731
>>  to_bin_10col| 7728   | 7707 | 7513
>>  to_text_1col| 1710   | 1730 | 1698
>>  to_text_10col   | 5998   | 5960 | 5987
> 
> Here are some numbers from a second local machine:
>   test   | master |  v7  | v10  
> -++--+--
>  from_bin_1col   | 508| 467  | 461
>  from_bin_10col  | 2192   | 2083 | 2098
>  from_text_1col  | 510| 499  | 517
>  from_text_10col | 1970   | 1678 | 1654
>  to_bin_1col | 575| 577  | 573
>  to_bin_10col| 2680   | 2678 | 2722
>  to_text_1col| 516| 506  | 527
>  to_text_10col   | 2250   | 2245 | 2235
> 
> This is confirming a speedup with COPY FROM for both text and binary,
> with more impact with a larger number of attributes.  That's harder to
> conclude about COPY TO in both cases, but at least I'm not seeing any
> regression even with some variance caused by what looks like noise.
> We need more numbers from more people.  Sutou-san or Sawada-san, or
> any volunteers?

Here are some numbers on my local machine (note that my
local machine isn't suitable for benchmarking, as I said
before; each number is the median of "\watch 15" results):

1:
 direction  format  n_columns    master        v7       v10
 to         text            1  1077.254  1016.953  1028.434
 to         csv             1   1079.88  1055.545   1053.95
 to         binary          1  1051.247   1033.93   1003.44
 to         text           10  4373.168  3980.442   3955.94
 to         csv            10  4753.842    4719.2  4677.643
 to         binary         10  4598.374  4431.238  4285.757
 from       text            1   875.729   916.526   869.283
 from       csv             1   909.355  1001.277   918.655
 from       binary          1   872.943   907.778   859.433
 from       text           10  2594.429  2345.292  2587.603
 from       csv            10  2968.972  3039.544  2964.468
 from       binary         10   3072.01  3109.267  3093.983

2:
 direction  format  n_columns    master        v7       v10
 to         text            1  1061.908   988.768   978.291
 to         csv             1  1095.109  1037.015  1041.613
 to         binary          1  1076.992  1000.212   983.318
 to         text           10  4336.517  3901.833  3841.789
 to         csv            10  4679.411  4640.975  4570.774
 to         binary         10   4465.04  4508.063  4261.749
 from       text            1   866.689    917.54   830.417
 from       csv             1   917.973  1695.401   871.991
 from       binary          1   841.104  1422.012   820.786
 from       text           10  2523.607  3147.738  2517.505
 from       csv            10  2917.018  3042.685  2950.338
 from       binary         10  2998.051  3128.542  3018.954

3:
 direction  format  n_columns    master        v7       v10
 to         text            1  1021.168  1031.183   962.945
 to         csv             1  1076.549  1069.661  1060.258
 to         binary          1  1024.611  1022.143   975.768
 to         text           10   4327.24  3936.703  4049.893
 to         csv            10  4620.436  4531.676  4685.672
 to         binary         10  4457.165  4390.992  4301.463
 from       text            1   887.532   907.365   888.892
 from       csv             1   945.167   1012.29   895.921
 from       binary          1    853.06   854.652   849.661
 from       text           10  2660.509  2304.256  2527.071
 from       csv            10  2913.644  2968.204  2935.081
 from       binary         10  3020.812  3081.162  3090.803

I'll measure again on my local machine later. I'll stop
other processes such as Web browser, editor and so on as
much as possible when I do.


Thanks,
-- 
kou




Re: Moving forward with TDE [PATCH v3]

2024-02-01 Thread vignesh C
On Mon, 22 Jan 2024 at 11:47, Peter Smith  wrote:
>
> 2024-01 Commitfest.
>
> Hi, This patch has a CF status of "Needs Review" [1], but it seems
> there were CFbot test failures last time it was run [2]. Please have a
> look and post an updated version if necessary.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: [PATCH] ltree hash functions

2024-02-01 Thread vignesh C
On Wed, 6 Dec 2023 at 04:08, Tommy Pavlicek  wrote:
>
> Thanks.
>
> I've attached the latest version that updates the naming in line with
> the convention.
>
> On Mon, Dec 4, 2023 at 12:46 AM jian he  wrote:
> >
> > On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek  wrote:
> > >
> > >
> > > Patch updated for those comments (and a touch of cleanup in the tests) 
> > > attached.
> >
> > hash_ltree would be a better name than ltree_hash, and similar logic
> > applies to ltree_hash_extended.
> > That would be the convention. see: 
> > https://stackoverflow.com/a/69650940/15603477
> >
> >
> > Other than that, it looks good.

CFBot shows one of the tests is failing as in [1]:
diff -U3 /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out
/tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
--- /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out 2024-01-31
15:18:42.893039599 +
+++ /tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
2024-01-31 15:23:25.309028749 +
@@ -1442,9 +1442,14 @@
('0.1.2'::ltree), ('0'::ltree), ('0_asd.1_ASD'::ltree)) x(v)
 WHERE  hash_ltree(v)::bit(32) != hash_ltree_extended(v, 0)::bit(32)
OR hash_ltree(v)::bit(32) = hash_ltree_extended(v, 1)::bit(32);
- value | standard | extended0 | extended1
+--+---+---
-(0 rows)
+value| standard |
extended0 |extended1
+-+--+--+--
+ 0   | 10001010100010011011 |
0101100001000100011001011011 | 010110000100010001101001
+ 0.1 | 101010001010110001001110 |
0000100010001100110111010101 | 0000100010001101100011010101
+ 0.1.2   | 0111010101110100 |
1010111001110101100011010111 | 101011100111011100100011
+ 0   | 10001010100010011011 |
0101100001000100011001011011 | 010110000100010001101001
+ 0_asd.1_ASD | 011000101000101001001101 |
0000100010001100110111010101 | 0000100010001101100011010101
+(5 rows)

Please post an updated version for the same.

[1] - 
https://api.cirrus-ci.com/v1/artifact/task/5572544858685440/testrun/build-32/testrun/ltree/regress/regression.diffs

Regards,
Vignesh




Re: Refactoring backend fork+exec code

2024-02-01 Thread Heikki Linnakangas

On 30/01/2024 02:08, Heikki Linnakangas wrote:

On 29/01/2024 17:54, reid.thomp...@crunchydata.com wrote:

On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:


And here we go. BackendID is now a 1-based index directly into the
PGPROC array.


Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the
enum table?


Yeah, that might be in order. Although looking closer, it's only used in
IsAuxProcess, which is only used in one sanity check in
AuxProcessMain(). And even that gets refactored away by the later
patches in this thread. So on second thoughts, I'll just remove it
altogether.

I spent some more time on the 'lastBackend' optimization in sinvaladt.c.
I realized that it became very useless with these patches, because aux
processes are allocated pgprocno's after all the slots for regular
backends. There are always aux processes running, so lastBackend would
always have a value close to the max anyway. I replaced that with a
dense 'pgprocnos' array that keeps track of the exact slots that are in
use. I'm not 100% sure this is worth the effort; does any real world
workload send shared invalidations so frequently that this matters? In
any case, this should avoid the regression if such a workload exists.

New patch set attached. I think these are ready to be committed, but
would appreciate a final review.


contrib/amcheck 003_cic_2pc.pl test failures revealed a bug that 
required some reworking:


In a PGPROC entry for a prepared xact, the PGPROC's backendID needs to 
be the original backend's ID, because the prepared xact is holding the 
lock on the original virtual transaction id. When a transaction's 
ownership is moved from the original backend's PGPROC entry to the 
prepared xact PGPROC entry, the backendID needs to be copied over. My 
patch removed the field altogether, so it was not copied over, which 
made it look like the original VXID lock was released at prepare.


I fixed that by adding back the backendID field. For regular backends, 
it's always equal to pgprocno + 1, but for prepared xacts, it's the 
original backend's ID. To make that less confusing, I moved the 
backendID and lxid fields together under a 'vxid' struct. The two fields 
together form the virtual transaction ID, and that's the only context 
where the 'backendID' field should now be looked at.
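
A minimal sketch of the regrouping described above (field names follow the
description; the exact layout is an assumption, not the committed struct):

struct PGPROC
{
    /* ... */
    struct
    {
        BackendId   backendId;  /* equal to pgprocno + 1 for regular
                                 * backends; the original backend's ID for
                                 * prepared xacts */
        LocalTransactionId lxid;    /* local id of top-level transaction */
    }           vxid;           /* the two fields form the virtual xid */
    /* ... */
};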


I also squashed the 'lastBackend' changes in sinvaladt.c to the main patch.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 96c583b32db843fb07d38fd78f1e205882a78b01 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 24 Jan 2024 23:15:55 +0200
Subject: [PATCH v9 1/4] Remove superfluous 'pgprocno' field from PGPROC

It was always just the index of the PGPROC entry from the beginning of
the proc array. Introduce a macro to compute it from the pointer
instead.

Reviewed-by: XXX
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407%40iki.fi
---
 src/backend/access/transam/clog.c |  4 ++--
 src/backend/access/transam/twophase.c | 11 +--
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/postmaster/bgwriter.c |  2 +-
 src/backend/postmaster/pgarch.c   |  2 +-
 src/backend/postmaster/walsummarizer.c|  2 +-
 src/backend/storage/buffer/bufmgr.c   |  6 +++---
 src/backend/storage/ipc/procarray.c   |  6 +++---
 src/backend/storage/lmgr/condition_variable.c | 12 ++--
 src/backend/storage/lmgr/lwlock.c |  6 +++---
 src/backend/storage/lmgr/predicate.c  |  2 +-
 src/backend/storage/lmgr/proc.c   |  1 -
 src/include/storage/lock.h|  2 +-
 src/include/storage/proc.h|  6 +-
 14 files changed, 29 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc9..7550309c25a 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -458,7 +458,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		 * less efficiently.
 		 */
 		if (nextidx != INVALID_PGPROCNO &&
-			ProcGlobal->allProcs[nextidx].clogGroupMemberPage != proc->clogGroupMemberPage)
+			GetPGProcByNumber(nextidx)->clogGroupMemberPage != proc->clogGroupMemberPage)
 		{
 			/*
 			 * Ensure that this proc is not a member of any clog group that
@@ -473,7 +473,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 
 		if (pg_atomic_compare_exchange_u32(>clogGroupFirst,
 		   ,
-		   (uint32) proc->pgprocno))
+		   (uint32) GetNumberFromPGProc(proc)))
 			break;
 	}
 
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 8426458f7f5..234c8d08ebc 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -284,7 +284,7 @@ 

Re: Possibility to disable `ALTER SYSTEM`

2024-02-01 Thread Bruce Momjian
On Tue, Jan 30, 2024 at 04:25:12PM -0500, Robert Haas wrote:
> I don't think we should pretend like one of the two paragraphs above
> is valid and the other is hot garbage. That's not solving anything. We
> can't resolve the tension between those two things in either direction
> by somebody hammering on the side of the argument that they believe to
> be correct and ignoring the other one.

What if we generate log messages when certain commands are used, like
ALTER TABLE?  We could have GUC which controls which commands are
logged.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Fix bugs not to discard statistics when changing stats_fetch_consistency

2024-02-01 Thread Shinya Kato

On 2024-02-01 17:33, Michael Paquier wrote:

On Thu, Jan 11, 2024 at 06:18:38PM +0900, Shinya Kato wrote:

Hi, hackers


(Sorry for the delay, this thread was on my TODO list for some time.)

There is below description in docs for stats_fetch_consistency.
"Changing this parameter in a transaction discards the statistics 
snapshot."


However, if stats_fetch_consistency is changed in a transaction, the
statistics snapshot is not discarded in some cases.


Yep, you're right.  This is inconsistent with the documentation where
we need to clean up the cached data when changing this GUC.  I was
considering a few options regarding the location of the extra
pgstat_clear_snapshot(), but at the end I see the point in doing it in
a path even if it creates a duplicate with pgstat_build_snapshot()
when pgstat_fetch_consistency is using the snapshot mode.  A location
better than your patch is pgstat_snapshot_fixed(), though, so as new
stats kinds will be able to get the call.

I have been banging my head on my desk for a bit when thinking about a
way to test that in a predictable way, until I remembered that these
stats are only flushed at commit, so this requires at least two
sessions, with one of them having a transaction opened while
manipulating stats_fetch_consistency.  TAP would be one option, but
I'm not really tempted about spending more cycles with a
background_psql just for this case.  If I'm missing something, feel
free.
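
For illustration, a rough sketch of the idea: discard the cached statistics
whenever the fetch-consistency mode no longer matches the one the cache was
built with. The helper below is an assumption, not the committed fix:

static void
pgstat_discard_stale_snapshot_sketch(void)
{
    static int  snapshot_mode = -1; /* mode used to build the cached data */

    if (snapshot_mode != pgstat_fetch_consistency)
    {
        /* Matches the documented behavior: changing the GUC discards it. */
        pgstat_clear_snapshot();
        snapshot_mode = pgstat_fetch_consistency;
    }
}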


Thank you for the review and pushing!
I think it is better and more concise than my implementation.

--
Regards,
Shinya Kato
NTT DATA GROUP CORPORATION




Re: Synchronizing slots from primary to standby

2024-02-01 Thread shveta malik
On Wed, Jan 31, 2024 at 2:02 PM Masahiko Sawada  wrote:
>
> ---
> +static char *
> +wait_for_valid_params_and_get_dbname(void)
> +{
> +   char   *dbname;
> +   int rc;
> +
> +   /* Sanity check. */
> +   Assert(enable_syncslot);
> +
> +   for (;;)
> +   {
> +   if (validate_parameters_and_get_dbname())
> +   break;
> +   ereport(LOG, errmsg("skipping slot synchronization"));
> +
> +   ProcessSlotSyncInterrupts(NULL);
>
> When reading this function, I expected that the slotsync worker would
> resume working once the parameters became valid, but it was not
> correct. For example, if I changed hot_standby_feedback from off to
> on, the slotsync worker reads the config file, exits, and then
> restarts. Given that the slotsync worker ends up exiting on parameter
> changes anyway, why do we want to have it wait for parameters to
> become valid? IIUC even if the slotsync worker exits when a parameter
> is not valid, it restarts at some intervals.

Thanks for the feedback. Changed this functionality in v75. Now we do
not exit from wait_for_valid_params_and_get_dbname() on a GUC change. We
re-validate the new values and, if they are valid, carry on with
slot syncing; otherwise we continue waiting.
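
A rough sketch of that revised loop, as described above (the wait/retry
details are assumptions; this is not the v75 patch itself):

for (;;)
{
    if (validate_parameters_and_get_dbname())
        break;              /* parameters became valid; start syncing */

    ereport(LOG, errmsg("skipping slot synchronization"));

    /* Handles SIGHUP so the next validation sees the new GUC values. */
    ProcessSlotSyncInterrupts(NULL);

    /* Wait and re-check instead of exiting the worker. */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     10 * 1000L, 0 /* wait_event_info */);
    ResetLatch(MyLatch);
}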

> ---
> +bool
> +SlotSyncWorkerCanRestart(void)
> +{
> +#define SLOTSYNC_RESTART_INTERVAL_SEC 10
> +
>
> IIUC depending on how busy the postmaster is and the timing, the user
> could wait for 1 min to re-launch the slotsync worker. But I think the
> user might want to re-launch the slotsync worker more quickly for
> example when the slotsync worker restarts due to parameter changes.
> IIUC SlotSyncWorkerCanRestart() doesn't consider the fact that the
> slotsync worker previously exited with 0 or 1.

Modified this in v75. As you suggested in [1], we reset
last_start_time on GUC change before proc_exit, so that the postmaster
restarts the worker immediately without waiting.

> ---
> +   /* We are a normal standby */
> +   valid = DatumGetBool(slot_getattr(tupslot, 2, ));
> +   Assert(!isnull);
>
> What do you mean by "normal standby"?
>
> ---
> +   appendStringInfo(,
> +"SELECT pg_is_in_recovery(), count(*) = 1"
> +" FROM pg_replication_slots"
> +" WHERE slot_type='physical' AND slot_name=%s",
> +quote_literal_cstr(PrimarySlotName));
>
> I think we need to make "pg_replication_slots" schema-qualified.

Modified.

> ---
> +   errdetail("The primary server slot \"%s\" specified by"
> + " \"%s\" is not valid.",
> + PrimarySlotName, "primary_slot_name"));
>
> and
>
> +   errmsg("slot sync worker will shutdown because"
> +  " %s is disabled", "enable_syncslot"));
>
> It's better to write it in one line for better greppability.

Modified.

[1]: 
https://www.postgresql.org/message-id/CAD21AoAv6FwZ6UPNTj6%3D7A%2B3O2m4utzfL8ZGS6X1EGexikG66A%40mail.gmail.com

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-01 Thread shveta malik
On Thu, Feb 1, 2024 at 11:21 AM Peter Smith  wrote:
>
> Here are some review comments for v740001.

Thanks Peter for the feedback.

> ==
> src/sgml/logicaldecoding.sgml
>
> 1.
> +   
> +Replication Slot Synchronization
> +
> + A logical replication slot on the primary can be synchronized to the hot
> + standby by enabling the failover option during slot
> + creation and setting
> +  linkend="guc-enable-syncslot">enable_syncslot
> + on the standby. For the synchronization
> + to work, it is mandatory to have a physical replication slot between the
> + primary and the standby, and
> +  linkend="guc-hot-standby-feedback">hot_standby_feedback
> + must be enabled on the standby. It is also necessary to specify a valid
> + dbname in the
> +  linkend="guc-primary-conninfo">primary_conninfo
> + string, which is used for slot synchronization and is ignored
> for streaming.
> +
>
> IMO we don't need to repeat that last part ", which is used for slot
> synchronization and is ignored for streaming." because that is a
> detail about the primary_conninfo GUC, and the same information is
> already described in that GUC section.

Modified in v75.

> ==
>
> 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) #
>
>   
> -  If true, the slot is enabled to be synced to the standbys.
> +  If true, the slot is enabled to be synced to the standbys
> +  so that logical replication can be resumed after failover.
>   
>
> This also should have the sentence "The default is false.", e.g. the
> same as the same option in CREATE_REPLICATION_SLOT says.

I have not added this. I feel the default-value details should be present
in the 'CREATE' part; they are not meaningful for the 'ALTER' part. ALTER
does not have any defaults; it just modifies the options given by the user.

> ==
> synchronize_one_slot
>
> 3.
> + /*
> + * Make sure that concerned WAL is received and flushed before syncing
> + * slot to target lsn received from the primary server.
> + *
> + * This check will never pass if on the primary server, user has
> + * configured standby_slot_names GUC correctly, otherwise this can hit
> + * frequently.
> + */
> + latestFlushPtr = GetStandbyFlushRecPtr(NULL);
> + if (remote_slot->confirmed_lsn > latestFlushPtr)
>
> BEFORE
> This check will never pass if on the primary server, user has
> configured standby_slot_names GUC correctly, otherwise this can hit
> frequently.
>
> SUGGESTION (simpler way to say the same thing?)
> This will always be the case unless the standby_slot_names GUC is not
> correctly configured on the primary server.

That is not quite true. It will not hit this condition "always", but it has
a higher chance of hitting it when standby_slot_names is not configured. I
think you meant 'unless the standby_slot_names GUC is correctly configured'.
I feel the current comment gives clearer (less confusing) information, so I
have not changed it for the time being. I can reconsider if I get more
comments there.

> 4.
> + /* User created slot with the same name exists, raise ERROR. */
>
> /User created/User-created/

Modified.

> ~~~
>
> 5. synchronize_slots, and also drop_obsolete_slots
>
> + /*
> + * Use shared lock to prevent a conflict with
> + * ReplicationSlotsDropDBSlots(), trying to drop the same slot while
> + * drop-database operation.
> + */
>
> (same code comment is in a couple of places)
>
> SUGGESTION (while -> during, etc.)
>
> Use a shared lock to prevent conflicting with
> ReplicationSlotsDropDBSlots() trying to drop the same slot during a
> drop-database operation.

Modified.

> ~~~
>
> 6. validate_parameters_and_get_dbname
>
> strcmp() just for the empty string "" might be overkill.
>
> 6a.
> + if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0)
>
> SUGGESTION
> if (PrimarySlotName == NULL || *PrimarySlotName == '\0')
>
> ~~
>
> 6b.
> + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0)
>
> SUGGESTION
> if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0')

Modified.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-01 Thread Amit Kapila
On Wed, Jan 31, 2024 at 10:40 AM Peter Smith  wrote:
>
> On Wed, Jan 31, 2024 at 2:18 PM Amit Kapila  wrote:
> >
> > >
> > > I think it is correct to say all those *other* properties (create_slot,
> > > enabled, copy_data) are forced to false because those otherwise have
> > > default true values.
> > >
> >
> > So, when connect=false, won't the user have to explicitly provide such
> > values (create_slot, enabled, etc.) as false? If so, is using 'force'
> > strictly correct?
>
> Perhaps the original docs text could be worded differently; I think
> the word "force" here just meant setting connection=false
> forces/causes/makes those other options behave "as if" they had been
> set to false without the user explicitly doing anything to them.
>

Okay, I see your point. Let's remove the 'failover' from this part of
the sentence.
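
For context, this is the existing CREATE SUBSCRIPTION behaviour being
described (the subscription, publication, and connection string below
are just example names):

    -- No connection is made to the publisher when connect = false, so the
    -- options that would need a connection default to false (and cannot be
    -- combined with connect = false when set to true):
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION mypub
        WITH (connect = false);
    -- in effect: enabled = false, create_slot = false, copy_data = false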

-- 
With Regards,
Amit Kapila.




Re: When extended query protocol ends?

2024-02-01 Thread Tatsuo Ishii
>> Hello Dave,
>>
>> > Tatsuo Ishii  writes:
>> >> Below is outputs from "pgproto" command coming with Pgpool-II.
>> >> (Lines starting "FE" represents a message from frontend to backend.
>> >> Lines starting "BE" represents a message from backend to frontend.)
>> >
>> >> FE=> Parse(stmt="", query="SET extra_float_digits = 3")
>> >> FE=> Bind(stmt="", portal="")
>> >> FE=> Execute(portal="")
>> >> FE=> Parse(stmt="", query="SET extra_float_digits = 3")
>> >> FE=> Bind(stmt="", portal="")
>> >> FE=> Execute(portal="")
>> >> FE=> Query (query="SET extra_float_digits = 3")
>> >> <= BE ParseComplete
>> >> <= BE BindComplete
>> >> <= BE CommandComplete(SET)
>> >> <= BE ParseComplete
>> >> <= BE BindComplete
>> >> <= BE CommandComplete(SET)
>> >> <= BE CommandComplete(SET)
>> >> <= BE ReadyForQuery(I)
>> >> FE=> Terminate
>> >
>> >> As you can see, two "SET extra_float_digits = 3" commands are sent in the
>> >> extended query protocol, then one "SET extra_float_digits = 3" follows
>> >> in the simple query protocol. No sync message is sent. However, I get
>> >> ReadyForQuery at the end. It seems the extended query protocol is
>> >> ended by a simple query protocol message instead of a sync message.
>> >
>> >> My question is, is this legal in terms of frontend/backend protocol?
>> >
>> > I think it's poor practice, at best.  You should end the
>> > extended-protocol query cycle before invoking simple query.
>>
>> From [1] I think the JDBC driver sends something like below if
>> autosave=always option is specified.
>>
>> "BEGIN READ ONLY" Parse/Bind/Eexecute (in the extended query protocol)
>> "SAVEPOINT PGJDBC_AUTOSAVE" (in the simple query protocol)
>>
>> It seems the SAVEPOINT is sent without finishing the extended query
>> protocol (i.e. without a Sync message). Is it possible for the JDBC
>> driver to issue a Sync message before sending SAVEPOINT in the simple
>> query protocol? Alternatively, SAVEPOINT could be sent using the
>> extended query protocol.
>>
>> [1]
>> https://www.pgpool.net/pipermail/pgpool-general/2023-December/009051.html
>>
>>
> Hi Tatsuo,
> 
> Yes, it would be possible.
> 
> Can you create an issue on github? Issues · pgjdbc/pgjdbc (github.com)
> 

Sure.

https://github.com/pgjdbc/pgjdbc/issues/3107
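
For reference, a corrected sequence would look something like the
following hypothetical pgproto-style trace, where a Sync ends the
extended-protocol cycle before the simple-protocol SAVEPOINT:

FE=> Parse(stmt="", query="BEGIN READ ONLY")
FE=> Bind(stmt="", portal="")
FE=> Execute(portal="")
FE=> Sync
<= BE ParseComplete
<= BE BindComplete
<= BE CommandComplete(BEGIN)
<= BE ReadyForQuery(T)
FE=> Query (query="SAVEPOINT PGJDBC_AUTOSAVE")
<= BE CommandComplete(SAVEPOINT)
<= BE ReadyForQuery(T)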

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp




Re: [PoC/RFC] Multiple passwords, interval expirations

2024-02-01 Thread vignesh C
On Sat, 27 Jan 2024 at 07:18, vignesh C  wrote:
>
> On Tue, 10 Oct 2023 at 16:37, Gurjeet Singh  wrote:
> >
> > > On Mon, Oct 9, 2023 at 2:31 AM Gurjeet Singh  wrote:
> > > >
> > > > Next steps:
> > > > - Break the patch into a series of smaller patches.
> > > > - Add TAP tests (test the ability to actually login with these 
> > > > passwords)
> > > > - Add/update documentation
> > > > - Add more regression tests
> >
> > Please see attached the v4 of the patchset that introduces the notion
> > of named passwords slots, namely 'first' and 'second' passwords, and
> > allows users to address each of these passwords separately for the
> > purposes of adding, dropping, or assigning expiration times.
> >
> > Apart from the changes described by each patch's commit title, one
> > significant change since v3 is that now (included in v4-0002...patch)
> > it is not allowed for a role to have a mix of types of passwords.
> > When adding a password, the patch ensures that the password being
> > added uses the same hashing algorithm (md5 or scram-sha-256) as the
> > existing password, if any.  Having all passwords of the same type
> > helps the server pick the corresponding authentication method during
> > connection attempt.
> >
> > The v3 patch also had a few bugs that were exposed by cfbot's
> > automatic run. All those bugs have now been fixed, and the latest run
> > on the v4 branch [1] on my private Git repo shows a clean run [1].
> >
> > The list of patches, and their commit titles are as follows:
> >
> > > v4-0001-...patch Add new columns to pg_authid
> > > v4-0002-...patch Update password verification infrastructure to handle 
> > > two passwords
> > > v4-0003-...patch Added SQL support for ALTER ROLE to manage two passwords
> > > v4-0004-...patch Updated pg_dumpall to support exporting a role's second 
> > > password
> > > v4-0005-...patch Update system views pg_roles and pg_shadow
> > > v4-0006-...patch Updated pg_authid catalog documentation
> > > v4-0007-...patch Updated psql's describe-roles meta-command
> > > v4-0008-...patch Added documentation for ALTER ROLE command
> > > v4-0009-...patch Added TAP tests to prove that a role can use two 
> > > passwords to login
> > > v4-0010-...patch pgindent run
> > > v4-0011-...patch Run pgperltidy on files changed by this patchset
> >
> > Running pgperltidy updated many perl files unrelated to this patch, so
> > in the last patch I chose to include only the one perl file that is
> > affected by this patchset.
>
> CFBot shows that the patch does not apply anymore as in [1]:
> === Applying patches on top of PostgreSQL commit ID
> 4d969b2f85e1fd00e860366f101fd3e3160aab41 ===
> === applying patch
> ./v4-0002-Update-password-verification-infrastructure-to-ha.patch
> ...
> patching file src/backend/libpq/auth.c
> Hunk #4 FAILED at 828.
> Hunk #5 succeeded at 886 (offset -2 lines).
> Hunk #6 succeeded at 907 (offset -2 lines).
> 1 out of 6 hunks FAILED -- saving rejects to file src/backend/libpq/auth.c.rej
>
> Please post an updated version for the same.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh



