Re: [HACKERS] Error while creating subscription when server is running in single user mode

2017-05-31 Thread Dilip Kumar
On Wed, May 31, 2017 at 2:20 PM, Michael Paquier
 wrote:
> Yeah, see 0e0f43d6 for example. A simple fix is to look at
> IsUnderPostmaster when creating, altering or dropping a subscription
> in subscriptioncmds.c.

Yeah, the attached patch fixes that.
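
For the archives, the guard Michael describes could look something like
the following sketch in subscriptioncmds.c (illustration only, not the
attached patch; the error wording here is invented):

/*
 * Sketch: CREATE/ALTER/DROP SUBSCRIPTION may need a libpq connection
 * to the publisher and/or background workers, neither of which is
 * available in single-user mode (i.e. when not under the postmaster).
 */
if (!IsUnderPostmaster)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot manipulate subscriptions in single-user mode")));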

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


subscription_error.patch
Description: Binary data



Re: [HACKERS] GSoC 2017 : Proposal for predicate locking in gist index

2017-05-31 Thread Andrew Borodin
Hi, hackers!

2017-06-01 1:50 GMT+05:00 Kevin Grittner :
> > The main difference between b-tree and gist index while searching for a
> > target tuple is that in gist index, we can determine if there is a match or
> > not at any level of the index.
>
> Agreed.  A gist scan can fail at any level, but for that scan to
> have produced a different result due to insertion of an index entry,
> *some* page that the scan looked at must be modified.

Two things seem non-obvious to me.
First, I simply do not know: can VACUUM erase a page that holds a
predicate lock?  Right now GiST never deletes pages, as far as I know,
so this question is only a matter of future compatibility.

Second, when doing GiST inserts we can encounter an unfinished split.
That's not a frequent situation, but still: should we skip finishing
the split, or should we add it to the collision check too?

Best regards, Andrey Borodin, Octonica.




[HACKERS] Tweaking tab completion for some backslash commands

2017-05-31 Thread Masahiko Sawada
Hi,

While using psql, I found that tab completion for the subscription and
publication backslash commands is not supported. After investigating
other backslash commands that are not supported, I found \if, \elif,
\else, \endif and \?.
The attached patch tweaks tab completion for some backslash commands.
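
For context, psql completes backslash commands from a static list in
src/bin/psql/tab-complete.c, so the change is essentially adding the
missing entries to that list. A heavily abbreviated sketch (the exact
entries are in the attached patch):

/* Abbreviated sketch of backslash_commands[] in tab-complete.c */
static const char *const backslash_commands[] = {
    "\\a", "\\connect", "\\conninfo", "\\copy", "\\copyright",
    /* ... existing entries ... */
    "\\elif", "\\else", "\\endif", "\\if",    /* sketched additions */
    "\\?",
    NULL
};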

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


tweak_tab_completion_for_backslash_command.patch
Description: Binary data



Re: [HACKERS] Perfomance bug in v10

2017-05-31 Thread David Rowley
On 1 June 2017 at 04:16, Teodor Sigaev  wrote:
> I found an example where v10 chooses an extremely non-optimal plan:
> select
> i::int as a,
> i::int + 1 as b,
> 0 as c
> into t
> from
> generate_series(1,32) as i;
>
> create unique index i on t (c, a);
>
> explain analyze
> SELECT
> t1.a, t1.b,
> t2.a, t2.b,
> t3.a, t3.b,
> t4.a, t4.b,
> t5.a, t5.b,
> t6.a, t6.b
> /*
> ,
> t7.a, t7.b,
> t8.a, t8.b,
> t9.a, t9.b,
> t10.a, t10.b
> */
> FROM t T1
> LEFT OUTER JOIN t T2
> ON T1.b = T2.a AND T2.c = 0
> LEFT OUTER JOIN t T3
> ON T2.b = T3.a AND T3.c = 0
> LEFT OUTER JOIN t T4
> ON T3.b = T4.a AND T4.c = 0
> LEFT OUTER JOIN t T5
> ON T4.b = T5.a AND T5.c = 0
> LEFT OUTER JOIN t T6
> ON T5.b = T6.a AND T6.c = 0
> LEFT OUTER JOIN t T7
> ON T6.b = T7.a AND T7.c = 0
> LEFT OUTER JOIN t T8
> ON T7.b = T8.a AND T8.c = 0
> LEFT OUTER JOIN t T9
> ON T8.b = T9.a AND T9.c = 0
> LEFT OUTER JOIN t T10
> ON T9.b = T10.a AND T10.c = 0
> WHERE T1.c = 0 AND T1.a = 5
> ;

That's pretty unfortunate. It only chooses this plan due to the lack of
any useful stats on the table. The planner thinks that a seqscan on t6
with Filter (c = 0) will return 1 row, which is not correct. In the
good plan, t1 is the outer rel of the innermost loop. Here the filter
is c = 0 and a = 5, which *does* actually return only 1 row, which
means that all of the other nested loops execute only once, as
predicted.

This is all caused by get_variable_numdistinct() deciding that all
values are distinct because ntuples < DEFAULT_NUM_DISTINCT. I see that
if the example is increased to use 300 tuples instead of 32, then
that's enough for the planner to estimate 2 rows instead of clamping
to 1, and the bad plan does not look so good anymore since the planner
predicts that those nested loops need to be executed more than once.
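
For anyone following along, the fallback in question sits at the end of
get_variable_numdistinct() in selfuncs.c; from memory it is roughly the
following (paraphrased, not a verbatim excerpt):

/*
 * With no stats, a table with fewer than DEFAULT_NUM_DISTINCT (200)
 * tuples is assumed to have all-distinct values.  So the 32-row table
 * above gets ndistinct = 32 for column c even though c holds a single
 * value, and the c = 0 selectivity estimate clamps to one row.
 */
if (ntuples < DEFAULT_NUM_DISTINCT)
    return clamp_row_est(ntuples);

*isdefault = true;
return DEFAULT_NUM_DISTINCT;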

I really think the planner is too inclined to take risks by nesting
nested loops like this, but I'm not at all sure of the best way to fix
that, and certainly not for beta1.

So, I'm a bit unsure exactly how best to deal with this.  It seems
like we'd better make some effort, as this could well be a case that
occurs when temp tables are used and not ANALYZEd, but the only way I
can think of to deal with it is to stop favouring unique inner nested
loops in the costing model.  The unfortunate thing about doing that is
that the planner will no longer swap the join order of a 2-way join to
put the unique rel on the inner side.  This is evident from the
regression test failures caused by patching with the attached, which
changes the cost model for nested loops back to what it was before
unique joins.

My other line of thought is just not to bother doing anything about
this.  There are plenty more queries you could handcraft to trick the
planner into generating a plan that'll blow up like this.  Is this one
realistic enough to bother accounting for?  Did it come from a
real-world case?  If not, how did you stumble upon it?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


dont_reduce_cost_of_inner_unique_nested_loops.patch
Description: Binary data



Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-05-31 Thread Dilip Kumar
On Wed, May 31, 2017 at 7:19 PM, Andres Freund  wrote:
> At the moment $subject doesn't allow parallelism, because copy.c's
> pg_plan_query() invocation doesn't set the CURSOR_OPT_PARALLEL_OK
> flag.
>
> To me that appears to be an oversight rather than intentional.  A
> somewhat annoying one at that, because it's not uncommon to use COPY to
> execute reports to a CSV file and such.
>
> Robert, am I missing a complication?
>
> I personally think we should fix this in 9.6 and 10, but I have to admit
> I'm not entirely impartial, because Citus hit this...

IMHO, for COPY TO I don't see any problem in parallelizing it.  I just
tested by changing the cursor options, and it works in parallel.
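
For reference, what I tested is essentially a one-line change in
BeginCopy() in copy.c, sketched from memory here (the attached patch is
the authoritative version):

/* Let the planner consider a parallel plan for COPY (query) TO by
 * passing CURSOR_OPT_PARALLEL_OK instead of 0 for cursorOptions. */
plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, NULL);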

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


copy_to_parallel.patch
Description: Binary data



Re: [HACKERS] walsender & parallelism

2017-05-31 Thread Andres Freund
On 2017-05-31 23:51:08 -0400, Peter Eisentraut wrote:
> I think the easiest and safest thing to do now is to just prevent
> parallel plans in the walsender.  See attached patch.  This prevents the
> hang in the select_parallel tests run under your new test setup.

I'm not quite sure I can buy this.  The lack of wired-up signals has
more problems than just hurting parallelism.  In fact, the USR1 thing
seems like something that we should actually backpatch, rather than
defer to v11.  I think there are some fair arguments to be made that we
shouldn't do the refactoring right now - although I'm not sure about
that - but just not fixing the bugs seems like a bad plan.


> @@ -272,6 +273,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
>  	 */
>  	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>  		IsUnderPostmaster &&
> +		!am_walsender &&
>  		dynamic_shared_memory_type != DSM_IMPL_NONE &&
>  		parse->commandType == CMD_SELECT &&
>  		!parse->hasModifyingCTE &&

If we were to go for this, surely this'd need a comment.

- Andres




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-05-31 Thread Peter Eisentraut
On 5/31/17 02:51, Noah Misch wrote:
> On Tue, May 30, 2017 at 01:30:35AM +, Noah Misch wrote:
>> On Thu, May 18, 2017 at 10:27:51PM -0400, Peter Eisentraut wrote:
>>> On 5/18/17 11:11, Noah Misch wrote:
>>>> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is past due for
>>>> your status update.  Please reacquaint yourself with the policy on open item
>>>> ownership[1] and then reply immediately.  If I do not hear from you by
>>>> 2017-05-19 16:00 UTC, I will transfer this item to release management team
>>>> ownership without further notice.
>>>
>>> There is no progress on this issue at the moment.  I will report again
>>> next Wednesday.
>>
>> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
>> for your status update.  If I do not hear one from you by 2017-05-31 02:00
>> UTC, I will transfer this item to release management team ownership without
>> further notice.
> 
> This PostgreSQL 10 open item now needs a permanent owner.  Would any other
> committer like to take ownership?  If this role interests you, please read
> this thread and the policy linked above, then send an initial status update
> bearing a date for your subsequent status update.  If the item does not have a
> permanent owner by 2017-06-03 07:00 UTC, I will investigate feature removals
> that would resolve the item.

It seems I lost track of this item among all the other ones.  I will
continue to work on it.  We have patches proposed, and I will work on
committing them by Friday.

I think I now have updates posted on all my items.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] walsender & parallelism

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 11:51, Peter Eisentraut
 wrote:

> Unifying the signal handling and query processing further seems like a
> good idea, but the patches are pretty involved, so I suggest putting
> them into the next commit fest.

I had a quick look at the idea of just getting rid of walsenders as a
specific entity, making them normal backends wherever possible.  But it
starts running into trouble because walsenders must stay alive well
into postmaster shutdown, after normal backends are terminated, so
there's probably going to be a need for more ongoing separation than
I'd really like, to stop walsenders doing things they can't safely do
during shutdown.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] walsender & parallelism

2017-05-31 Thread Peter Eisentraut
On 5/29/17 22:01, Noah Misch wrote:
> On Tue, May 23, 2017 at 01:45:59PM -0400, Andres Freund wrote:
>> On May 23, 2017 1:42:41 PM EDT, Petr Jelinek  
>> wrote:
>>> Hi,
>>>
>>> so this didn't really move anywhere AFAICS, do we think the approach
>>> I've chosen is good or do we want to do something else here?
>>
>> Can you add it to the open items list?
> 
> [Action required within three days.  This is a generic notification.]

I have posted an update.  The next update will be Friday.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] walsender & parallelism

2017-05-31 Thread Peter Eisentraut
On 5/23/17 13:57, Petr Jelinek wrote:
> On 23/05/17 19:45, Andres Freund wrote:
>>
>>
>> On May 23, 2017 1:42:41 PM EDT, Petr Jelinek  
>> wrote:
>>> Hi,
>>>
>>> so this didn't really move anywhere AFAICS, do we think the approach
>>> I've chosen is good or do we want to do something else here?
>>
>> Can you add it to the open items list?
>>
> 
> Done

I think the easiest and safest thing to do now is to just prevent
parallel plans in the walsender.  See attached patch.  This prevents the
hang in the select_parallel tests run under your new test setup.

Unifying the signal handling and query processing further seems like a
good idea, but the patches are pretty involved, so I suggest putting
them into the next commit fest.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 26002c4e58647c67cce97e66b34b489bb838e39a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 31 May 2017 23:27:13 -0400
Subject: [PATCH] Prevent parallel query in walsender

Parallel query in walsender currently hangs, because walsender does not
have the full SIGUSR1 signal handler to handle this.  It's safer and
easier to prevent using a parallel plan for now.  Eventually, the signal
handling in walsender should be unified with regular backends.

Reported-by: Andres Freund 
---
 src/backend/optimizer/plan/planner.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 40cb79d4cd..bb921ba29c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -50,6 +50,7 @@
 #include "parser/analyze.h"
 #include "parser/parsetree.h"
 #include "parser/parse_agg.h"
+#include "replication/walsender.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/dsm_impl.h"
 #include "utils/rel.h"
@@ -272,6 +273,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
+		!am_walsender &&
 		dynamic_shared_memory_type != DSM_IMPL_NONE &&
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
-- 
2.13.0




Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 08:15, Andres Freund  wrote:
> Hi,
>
> when using
> $ cat ~/.proverc
> -j9
>
> some tests fail for me in 9.4 and 9.5.  E.g. src/bin/script's tests
> yields a lot of fun like:
> $ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
> ...
> # LOG:  received immediate shutdown request
> # WARNING:  terminating connection because of crash of another server process
> # DETAIL:  The postmaster has commanded this server process to roll back the 
> current transaction and exit, because another server process exited 
> abnormally and possibly corrupted shared memory.
> # HINT:  In a moment you should be able to reconnect to the database and 
> repeat your command.
> ...
>
> it appears as if various tests are trampling over each other.

None of those scripts use PostgresNode, which I thought was added in
9.5 but was apparently actually introduced in 9.6. They do all their
own setup/teardown using TestLib.pm routines. TestLib uses a unique
tempdir for each test run, sets it as the unix socket directory, and
disables listening on TCP, so the most obvious conflict is hidden.

The immediate problem appears to be that they all use
tmp_check/postmaster.log, so anything that examines the logs gets
confused by seeing some other postgres instance's logs, or a mixture of
several instances' logs trampled together.

I'll be surprised if there aren't other problems though. Rather than
trying to fix it all up, this seems like a good argument for
backporting the updated suite from 9.6 or pg10, with PostgresNode etc.
I already have a working tree where that's done so that 9.5 can use
src/test/recovery, but I haven't updated src/bin/scripts etc. yet.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-05-31 Thread Masahiko Sawada
On Thu, May 25, 2017 at 4:14 AM, Petr Jelinek
 wrote:
> Hi,
>
> I finally had time to properly analyze this, and it turns out we've
> all just been trying to fix symptoms instead of the actual problems.
>
> All the locking works just fine the way it is in master. The issue with
> deadlock with apply comes from the wrong handling of the SIGTERM in the
> apply (we didn't set InterruptPending). I changed the SIGTERM handler in
> patch 0001 just to die which is actually the correct behavior for apply
> workers. I also moved the connection cleanup code to the
> before_shmem_exit callback (similar to walreceiver) and now that part
> works correctly.
>
> The issue with orphaned sync workers is actually two separate issues.
> First, due to a thinko we always searched for the sync worker in
> wait_for_sync_status_change instead of searching for the opposite
> worker, as was the intention (i.e. the sync worker should search for
> the apply worker, and apply should search for sync).  That's fixed by
> 0002.  And second, we didn't accept any invalidation messages until
> the whole sync process finished (because it flattens all the remote
> transactions into a single one), so the sync worker didn't learn about
> subscription changes/drops until it had finished, which I have now
> fixed in 0003.
>
> There is still an outstanding issue that the sync worker will keep
> running inside a long COPY, because invalidation messages are likewise
> not processed until it finishes, but all the original issues reported
> here disappear for me with the attached patches applied.
>

These patches conflict with current HEAD, so I have attached updated versions.

Also, the issue that the sync worker keeps running inside a long COPY
can lead to another problem: the user cannot create a new subscription
that needs more workers, because there are not enough free logical
replication worker slots until the long COPY finishes. The attached
0004 patch is an updated version of the patch I submitted before.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


0001-Fix-signal-handling-in-logical-workers.patch
Description: Binary data


0002-Make-tablesync-worker-exit-when-apply-dies-while-it-.patch
Description: Binary data


0003-Receive-invalidation-messages-correctly-in-tablesync.patch
Description: Binary data


0004-Wait-for-table-sync-worker-to-finish-when-apply-work.patch
Description: Binary data



Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Amit Langote
On 2017/06/01 11:13, Mark Dilger wrote:
>> On May 31, 2017, at 6:32 PM, Amit Langote  
>> wrote:
>>
>> On 2017/06/01 4:50, Robert Haas wrote:
>>> On Wed, May 31, 2017 at 3:40 PM, Mark Dilger  
>>> wrote:
>>>> recent changes have introduced the :location field to the partboundspec
>>>> in pg_catalog.pg_class.  This means that if two identical tables with
>>>> identical partitioning scheme are created, but one is done before a change
>>>> to gram.y, and the other after a change to gram.y, the relpartbound fields
>>>> for those two tables could show up as different.
>>>>
>>>> Can we perhaps remove the :location field here?  If not, could somebody
>>>> please defend why this belongs in the catalog entries for the table?  Sorry
>>>> if I am missing something obvious...
>>
>> I now think it's kind of annoying too, although not exactly unprecedented,
>> as others have pointed out.  As you might well know, the location field in
>> the parse node is there to position the error cursor at the correct point
>> in the erring command text.
> 
> I knew about the location field already, but not that it was already occurring
> elsewhere in the catalogs.  I just never paid attention to any of the columns
> that are doing so.  Alvaro's criticisms of my complaint were quite 
> informative.
> (Thanks, Alvaro.)  I think standardizing the location field to -1 at some 
> point
> in all such places would be a good idea, though I am not sufficiently 
> qualified
> to say if that should be in 10 or 11, nor whether doing so might cause
> backwards compatibility issues, nor whether those would be too much cost
> to justify the changes.

Setting the location field of parse nodes to -1 before writing them
into the catalog for *all* node types might defeat the purpose of
writing the actual value, which, as the header comment of readfuncs.c
puts it, is:

 *	Parse location fields are written out by outfuncs.c, but only for
 *	possible debugging use.  When reading a location field, we discard
 *	the stored value and set the location field to -1 (ie, "unknown").

So apparently, seeing the actual location value might be useful to
some, though there may not be many users who *do* use it for anything.
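
For reference, the discarding is done by the READ_LOCATION_FIELD macro
in readfuncs.c, which from memory looks roughly like this:

#define READ_LOCATION_FIELD(fldname) \
	token = pg_strtok(&length);		/* skip :fldname */ \
	token = pg_strtok(&length);		/* get field value */ \
	(void) token;					/* in case not used elsewhere */ \
	local_node->fldname = -1		/* set field to "unknown" */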

Thanks,
Amit





Re: [HACKERS] Create subscription with `create_slot=false` and incorrect slot name

2017-05-31 Thread Peter Eisentraut
On 5/31/17 09:40, Robert Haas wrote:
> On Tue, May 30, 2017 at 3:01 PM, Peter Eisentraut
>  wrote:
>> On 5/25/17 17:26, Peter Eisentraut wrote:
>>> Another way to fix this particular issue is to not verify the
>>> replication slot name before doing the drop.  After all, if the name is
>>> not valid, then you can also just report that it doesn't exist.
>>
>> Here is a possible patch along these lines.
> 
> I don't see how this solves the problem.  Don't you still end up with
> an error message telling you that you can't drop the subscription, and
> no guidance as to how to fix it?

Well, the idea was to make the error message less cryptic.

But I notice that there is really little documentation about this.  So
how about the attached documentation patch as well?

As mentioned earlier, if we want to do HINT messages, that will be a bit
more involved and probably PG11 material.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 3d89b959794abe1bd3addeb9c7c1340187a3cef2 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 31 May 2017 22:35:33 -0400
Subject: [PATCH] doc: Add note that DROP SUBSCRIPTION drops replication slot

Add some information about what to do when this fails.
---
 doc/src/sgml/ref/drop_subscription.sgml | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index 4f34a35eef..42068d617b 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -74,6 +74,28 @@
  </refsect1>
 
  <refsect1>
+  <title>Notes</title>
+
+  <para>
+   When dropping a subscription that is associated with a replication slot on
+   the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
+   will connect to the remote host and try to drop the replication slot as
+   part of its operation.  This is necessary so that the resources allocated
+   for the subscription on the remote host are released.  If this fails,
+   either because the remote host is not reachable or because the remote
+   replication slot cannot be dropped or does not exist or never existed,
+   the <command>DROP SUBSCRIPTION</command> command will fail.  To proceed in
+   this situation, disassociate the subscription from the replication slot by
+   executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
+   After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
+   actions on a remote host.  Note that if the remote replication slot still
+   exists, it should then be dropped manually; otherwise it will continue to
+   reserve WAL and might eventually cause the disk to fill up.  See
+   also <xref linkend="logical-replication-subscription-slot">.
+  </para>
+ </refsect1>
+
+ <refsect1>
   <title>Examples</title>
 
   <para>
-- 
2.13.0




Re: [HACKERS] TAP: allow overriding PostgresNode in get_new_node

2017-05-31 Thread Craig Ringer
On 31 May 2017 at 08:43, Craig Ringer  wrote:
> Hi all
>
> More and more I'm finding it useful to extend PostgresNode for project
> specific helper classes. But PostgresNode::get_new_node is a factory
> that doesn't provide any mechanism for overriding, so you have to
> create a PostgresNode then re-bless it as your desired subclass. Ugly.
>
> The attached patch allows an optional second argument, a class name,
> to be passed to PostgresNode::get_new_node . It's instantiated instead
> of PostgresNode if supplied. Its 'new' constructor must take the same
> arguments.

Note that you can achieve the same effect w/o patching
PostgresNode.pm, albeit in a somewhat ugly manner, by re-blessing the
returned object:

sub get_new_mywhatever_node {
    my ($name) = @_;
    my $self = PostgresNode::get_new_node($name);
    return bless $self, 'MyWhateverNode';
}

so this would be cosmetically nice, but far from functionally vital.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: Alter subscription..SET - NOTICE message is coming for table which is already removed

2017-05-31 Thread Peter Eisentraut
On 5/30/17 13:25, Masahiko Sawada wrote:
> I think the cause is that the relation status entry could be deleted
> by ALTER SUBSCRIPTION REFRESH before the corresponding table sync
> worker starts.  Attached patch fixes the issues reported on this
> thread so far.

This looks like a reasonable change, but it conflicts with the change
discussed on "logical replication - still unstable after all these
months".  I think I'll deal with that one first.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] logical replication - still unstable after all these months

2017-05-31 Thread Peter Eisentraut
On 5/31/17 05:16, Petr Jelinek wrote:
> I've been running tests on this overnight on another machine where I was
> able to reproduce the original issue within a few runs (once I found what
> causes it), and so far it looks good.

I'll give people another day or so to test this before committing.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Mark Dilger

> On May 31, 2017, at 6:32 PM, Amit Langote  
> wrote:
> 
> On 2017/06/01 4:50, Robert Haas wrote:
>> On Wed, May 31, 2017 at 3:40 PM, Mark Dilger  wrote:
>>> recent changes have introduced the :location field to the partboundspec
>>> in pg_catalog.pg_class.  This means that if two identical tables with
>>> identical partitioning scheme are created, but one is done before a change
>>> to gram.y, and the other after a change to gram.y, the relpartbound fields
>>> for those two tables could show up as different.
>>> 
>>> Can we perhaps remove the :location field here?  If not, could somebody
>>> please defend why this belongs in the catalog entries for the table?  Sorry
>>> if I am missing something obvious...
> 
> I now think it's kind of annoying too, although not exactly unprecedented,
> as others have pointed out.  As you might well know, the location field in
> the parse node is there to position the error cursor at the correct point
> in the erring command text.

I knew about the location field already, but not that it was already occurring
elsewhere in the catalogs.  I just never paid attention to any of the columns
that are doing so.  Alvaro's criticisms of my complaint were quite informative.
(Thanks, Alvaro.)  I think standardizing the location field to -1 at some point
in all such places would be a good idea, though I am not sufficiently qualified
to say if that should be in 10 or 11, nor whether doing so might cause
backwards compatibility issues, nor whether those would be too much cost
to justify the changes.

> By the way, didn't you already have to run into that?  The commit
> (80f583ffe93) that introduced the location field to partboundspec also
> bumped the catalog version, so you would have had to reinitialize the
> database directory anyway.

I don't have my work in production yet.  Each time I merge changes into
my repository, I run "make check-world", and that of course re-initializes
the data directories for the test databases.

Mark Dilger



Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 09:23, Andres Freund  wrote:

> Even if we decide this is necessary, I *strongly* suggest trying to get
> the existing standby decoding etc wrapped up before starting something
> nontrival afresh.

Speaking of such, I had a thought about how to sync logical slot state
to physical replicas without requiring all logical downstreams to know
about and be able to connect to all physical replicas. Interested in
your initial reaction. Basically, enlist the walreceiver's help.

Extend the walsender so in physical rep mode it periodically writes
the upstream's logical slot state into the stream as a message with
special lsn 0/0. Then the walreceiver uses that to make decoding calls
on the downstream to advance the downstream logical slots to the new
confirmed_flush_lsn, or hands the info off to a helper proc that does
it. It could set up a decoding context and do it via a proper decoding
session, discarding output, and later we could probably optimise that
decoding session to do even less work than ReorderBufferSkip()ing
xacts.

The alternative at this point since we nixed writing logical slot
state to WAL seems to be a bgworker on the upstream that periodically
writes logical slot state into generic WAL messages in a special
table, then another on the downstream that processes the table and
makes appropriate decoding calls to advance the downstream slot state.
(Safely, not via directly setting catalog_xmin etc). Which is pretty
damn ugly, but has the advantage that it'd work for PITR restores,
unlike the walsender/walreceiver based approach. Failover slots in
extension-space, basically.

I'm really, really not sold on all logical downstreams having to know
about and be able to connect to all physical standbys of the upstream
to maintain slots on them. Some kind of solution that runs entirely on
the standby will be needed. It's more a question of whether it's
something built-in, easy, and nice, or some out of tree extension.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] "create publication..all tables" ignore 'partition not supported' error

2017-05-31 Thread Amit Langote
On 2017/06/01 10:26, Peter Eisentraut wrote:
> On 5/31/17 02:17, Kuntal Ghosh wrote:
>> On Wed, May 31, 2017 at 12:58 AM, Masahiko Sawada  
>> wrote:
>>>
>>> I'd say we can fix this issue by just changing the query. Attached
>>> patch changes the query so that it can handle publication name
>>> correctly, the query gets complex, though.
>>>
>> In is_publishable_class function, there are four conditions to decide
>> whether this is a publishable class or not.
>>
>> 1. relkind == RELKIND_RELATION
>> 2. IsCatalogClass()
>> 3. relpersistence == 'p'
>> 4. relid >= FirstNormalObjectId /* Skip tables created during initdb */
>>
>> I think the modified query should have a check for the fourth condition as 
>> well.
> 
> The query should be fixed like in the attached patch.
> pg_get_publication_tables() ends up calling is_publishable_class()
> internally.

Looks good to me.

Thanks,
Amit





Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Andres Freund
On 2017-05-31 21:27:56 -0400, Stephen Frost wrote:
> Craig,
> 
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
> > TL;DR: replication origins track LSN without timeline. This is
> > ambiguous when physical failover is present, since an (origin, LSN) pair
> > can now represent more than one state due to timeline forks with
> > promotions. Replication origins should track timelines so we can tell
> > the difference, I propose to patch them accordingly for pg11.
> 
> Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
> before PG10 can be released?

Huh?  Slots aren't present on replicas, ergo there's no way for the whole
issue to occur in v10.


> The further comments in your email seem to state that logical
> replication will just fail if a replica is promoted.  While not ideal,
> that might barely reach the point of it being releasable, but turns it
> into a feature that I'd have a really hard time recommending to
> anyone,

Meh^10


> and are we absolutely sure that there aren't any cases where there might
> be an issue of undetected promotion, leading to the complications which
> you describe?

Yes, unless you manipulate things by hand, copying files around or such.


Greetings,

Andres Freund




Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 09:36, Andres Freund  wrote:
> On 2017-05-31 21:33:26 -0400, Stephen Frost wrote:
>> > This only starts becoming an issue once logical replication slots can
>> > exist on replicas and be maintained to follow the master's slot state.
>> > Which is incomplete in Pg10 (not exposed to users) but I plan to
>> > finish getting in for pg11, making this a possible issue to be
>> > addressed.
>>
>> Fair enough.  I'm disappointed that we ended up with that as the
>> solution for PG10
>
> This has widely been debated, and it's not exactly new that development
> happens incrementally, so I don't have particularly much sympathy for
> that POV.

Yeah. Even if we'd completed support for decoding on standby,
there's no way we could've implemented the required gymnastics for
getting logical replication to actually use it to support physical
failover in pg10, so it was always going to land in pg11.

This is very much a "how do we do it right when we do do it" topic,
not a pg10 issue.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Stephen Frost
Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 1 June 2017 at 09:27, Stephen Frost  wrote:
> > * Craig Ringer (cr...@2ndquadrant.com) wrote:
> >> TL;DR: replication origins track LSN without timeline. This is
> >> ambiguous when physical failover is present, since an (origin, LSN) pair
> >> can now represent more than one state due to timeline forks with
> >> promotions. Replication origins should track timelines so we can tell
> >> the difference, I propose to patch them accordingly for pg11.
> >
> > Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
> > before PG10 can be released?  I hope I'm missing something that makes
> > the current approach work in PG10, or that there's some reason that this
> > isn't a big deal for PG10, but I'd like a bit of info as to why that's
> > the case, if it is.
> 
> In Pg 10, if you promote a physical replica then logical replication
> falls apart entirely and stops working. So there's no corruption
> hazard because it just ... stops.

I see.

> This only starts becoming an issue once logical replication slots can
> exist on replicas and be maintained to follow the master's slot state.
> Which is incomplete in Pg10 (not exposed to users) but I plan to
> finish getting in for pg11, making this a possible issue to be
> addressed.

Fair enough.  I'm disappointed that we ended up with that as the
solution for PG10, but so be it, the main thing is that we avoid any
corruption risk.

Thanks!

Stephen




Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-05-31 21:33:26 -0400, Stephen Frost wrote:
> > > This only starts becoming an issue once logical replication slots can
> > > exist on replicas and be maintained to follow the master's slot state.
> > > Which is incomplete in Pg10 (not exposed to users) but I plan to
> > > finish getting in for pg11, making this a possible issue to be
> > > addressed.
> > 
> > Fair enough.  I'm disappointed that we ended up with that as the
> > solution for PG10
> 
> This has widely been debated, and it's not exactly new that development
> happens incrementally, so I don't have particularly much sympathy for
> that POV.

I do understand that, of course, but hadn't quite realized yet that
we're talking only about replication slots on replicas.  Apologies for
the noise.

Thanks!

Stephen




Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Andres Freund
On 2017-05-31 21:33:26 -0400, Stephen Frost wrote:
> > This only starts becoming an issue once logical replication slots can
> > exist on replicas and be maintained to follow the master's slot state.
> > Which is incomplete in Pg10 (not exposed to users) but I plan to
> > finish getting in for pg11, making this a possible issue to be
> > addressed.
> 
> Fair enough.  I'm disappointed that we ended up with that as the
> solution for PG10

This has widely been debated, and it's not exactly new that development
happens incrementally, so I don't have particularly much sympathy for
that POV.




Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 09:23, Andres Freund  wrote:
> Hi,
>
> On 2017-06-01 09:12:04 +0800, Craig Ringer wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present, since an (origin, LSN) pair
>> can now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference, I propose to patch them accordingly for pg11.
>
> I'm not quite convinced that this should be tracked at the origin level.
> If you fail over physically, shouldn't we also reconfigure logical
> replication?
>
> Even if we decide this is necessary, I *strongly* suggest trying to get
> the existing standby decoding etc wrapped up before starting something
> nontrival afresh.

Yeah, I'm not thinking of leaping straight to a patch before we've got
the rep on standby stuff nailed down. I just wanted to raise early
discussion to make sure it's not entirely the wrong path and/or
totally hopeless for core.

Logical decoding output plugins would need to keep track of the
timeline and send an extra message informing the downstream of a
timeline change whenever they see a new timeline, or include it in all
messages (at the cost of extra overhead), since we don't stop a
decoding session and force a re-connection when we hit a timeline
boundary.  (Nor can we, since at some point our restart_lsn will be on
the old timeline while the first commits are on the new timeline.)
I'll need to think about whether and how the decoding plugin can
reliably do that.

>> Take master A, its physical replica B, and logical decoding client C
>> streaming changes from A. B is lagging: A is at LSN 1/1000, B is only
>> at 1/500. C has replicated from A up to 1/1000 when A fails. We
>> promote B to replace A. Now C connects to B, and requests to resume at
>> LSN 1/1000.
>
> Wouldn't it be better to solve this by querying the new master's
> timeline history, and checking whether the current replay point is
> pre/post fork?

That could work.

The decoding client would need to track the last-commit timeline in
its own metadata if we're not letting it put it in the replication
origin. Manageable, if awkward.

Clients would need to know how to fetch and parse timeline history
files, which is an irritating thing for every decoding client that
wants to support failover to have to implement. But I guess it's
manageable, if not friendly. And non-Pg-based downstreams would have
to do it anyway.
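
(For what it's worth, the fetching part already exists in the streaming
replication protocol; a rough sketch of what a libpq client on a
replication connection could do, with the history-file parsing left as
the irritating part:)

/* Sketch: fetch the history file for timeline 2 over an existing
 * replication connection.  The single result row holds the filename
 * and the file content, from which the client can work out where the
 * timeline forked. */
PGresult *res = PQexec(conn, "TIMELINE_HISTORY 2");
if (PQresultStatus(res) == PGRES_TUPLES_OK)
    parse_history_file(PQgetvalue(res, 0, 1));  /* hypothetical helper */
PQclear(res);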

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-05-31 21:27:56 -0400, Stephen Frost wrote:
> > Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
> > before PG10 can be released?
> 
> Huh?  Slots aren't present on replicas, ergo there's no way for the whole
> issue to occur in v10.

Oh, ok, now that makes more sense; not sure how I missed that.

Thanks!

Stephen




Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 09:27, Stephen Frost  wrote:
> Craig,
>
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present, since an (origin, LSN) pair
>> can now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference, I propose to patch them accordingly for pg11.
>
> Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
> before PG10 can be released?  I hope I'm missing something that makes
> the current approach work in PG10, or that there's some reason that this
> isn't a big deal for PG10, but I'd like a bit of info as to why that's
> the case, if it is.

In Pg 10, if you promote a physical replica then logical replication
falls apart entirely and stops working. So there's no corruption
hazard because it just ... stops.

This only starts becoming an issue once logical replication slots can
exist on replicas and be maintained to follow the master's slot state.
Which is incomplete in Pg10 (not exposed to users) but I plan to
finish getting in for pg11, making this a possible issue to be
addressed.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Amit Langote
On 2017/06/01 4:50, Robert Haas wrote:
> On Wed, May 31, 2017 at 3:40 PM, Mark Dilger  wrote:
>> recent changes have introduced the :location field to the partboundspec
>> in pg_catalog.pg_class.  This means that if two identical tables with
>> identical partitioning scheme are created, but one is done before a change
>> to gram.y, and the other after a change to gram.y, the relpartbound fields
>> for those two tables could show up as different.
>>
>> Can we perhaps remove the :location field here?  If not, could somebody
>> please defend why this belongs in the catalog entries for the table?  Sorry
>> if I am missing something obvious...

I now think it's kind of annoying too, although not exactly unprecedented,
as others have pointed out.  As you might well know, the location field in
the parse node is there to position the error cursor at the correct point
in the erring command text.  That said, the node structure that helps do
that and the one that we need to store into the catalog do not have to be
the same, as Robert suggests in his reply.  Making that division would,
however, break the catalog again.

By the way, didn't you already have to run into that?  The commit
(80f583ffe93) that introduced the location field to partboundspec also
bumped the catalog version, so you would have had to reinitialize the
database directory anyway.

> Yeah, that probably wasn't a good decision, although calling a
> decision might be giving it too much credit.  I think the easiest
> thing to do here would be to change transformPartitionBound() to set
> result_spec->location to -1, although maybe a better idea would be to
> have two different structures -- one that represents the partition
> bound specification *before* parse analysis and another that
> represents it *after* parse analysis, rather than reusing the same
> structure for both.  Then again, maybe making two different node types
> for this is overkill.  Not sure.  Opinions?

I find the two node structures idea interesting, though it would require
breaking the catalog again.

Thanks,
Amit





Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Stephen Frost
Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> TL;DR: replication origins track LSN without timeline. This is
> ambiguous when physical failover is present, since an (origin, LSN) pair
> can now represent more than one state due to timeline forks with
> promotions. Replication origins should track timelines so we can tell
> the difference, I propose to patch them accordingly for pg11.

Uh, TL;DR, wow?  Why isn't this something which needs to be addressed
before PG10 can be released?  I hope I'm missing something that makes
the current approach work in PG10, or that there's some reason that this
isn't a big deal for PG10, but I'd like a bit of info as to why that's
the case, if it is.

The further comments in your email seem to state that logical
replication will just fail if a replica is promoted.  While not ideal,
that might barely reach the point of being releasable, but it turns it
into a feature that I'd have a really hard time recommending to anyone.
And are we absolutely sure that there aren't any cases where there might
be an issue of undetected promotion, leading to the complications which
you describe?
Thanks!

Stephen




Re: [HACKERS] "create publication..all tables" ignore 'partition not supported' error

2017-05-31 Thread Peter Eisentraut
On 5/31/17 02:17, Kuntal Ghosh wrote:
> On Wed, May 31, 2017 at 12:58 AM, Masahiko Sawada  
> wrote:
>>
>> I'd say we can fix this issue by just changing the query. Attached
>> patch changes the query so that it can handle publication name
>> correctly, the query gets complex, though.
>>
> In is_publishable_class function, there are four conditions to decide
> whether this is a publishable class or not.
> 
> 1. relkind == RELKIND_RELATION
> 2. IsCatalogClass()
> 3. relpersistence == 'p'
> 4. relid >= FirstNormalObjectId /* Skip tables created during initdb */
> 
> I think the modified query should have a check for the fourth condition as 
> well.

The query should be fixed like in the attached patch.
pg_get_publication_tables() ends up calling is_publishable_class()
internally.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 4a93e9b8432661724e9fd400c8f267dd09d3b513 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 31 May 2017 21:23:06 -0400
Subject: [PATCH] psql: Fix display of whether table is part of publication

If a FOR ALL TABLES publication was present, \d of a table would claim
for each table that it was part of the publication, even for tables that
are ignored, such as system tables and unlogged tables.  Fix the query
by using the function pg_get_publication_tables(), which was intended for
this purpose.

Reported-by: tushar 
---
 src/bin/psql/describe.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 3e542f7b1d..f1c3d9b7e0 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2537,10 +2537,9 @@ describeOneTableDetails(const char *schemaname,
 	{
 		printfPQExpBuffer(&buf,
 						  "SELECT pub.pubname\n"
-						  " FROM pg_catalog.pg_publication pub\n"
-						  " LEFT JOIN pg_catalog.pg_publication_rel pr\n"
-						  "  ON (pr.prpubid = pub.oid)\n"
-						  "WHERE pr.prrelid = '%s' OR pub.puballtables\n"
+						  " FROM pg_catalog.pg_publication pub,\n"
+						  "  pg_catalog.pg_get_publication_tables(pub.pubname)\n"
+						  "WHERE relid = '%s'\n"
 						  "ORDER BY 1;",
 						  oid);
 
-- 
2.13.0




Re: [HACKERS] Replication origins and timelines

2017-05-31 Thread Andres Freund
Hi,

On 2017-06-01 09:12:04 +0800, Craig Ringer wrote:
> TL;DR: replication origins track LSN without timeline. This is
> ambiguous when physical failover is present, since an (origin, LSN) pair
> can now represent more than one state due to timeline forks with
> promotions. Replication origins should track timelines so we can tell
> the difference, I propose to patch them accordingly for pg11.

I'm not quite convinced that this should be tracked at the origin level.
If you fail over physically, shouldn't we also reconfigure logical
replication?

Even if we decide this is necessary, I *strongly* suggest trying to get
the existing standby decoding etc wrapped up before starting something
nontrival afresh.


> Why?
> 
> Take master A, its physical replica B, and logical decoding client C
> streaming changes from A. B is lagging: A is at LSN 1/1000, B is only
> at 1/500. C has replicated from A up to 1/1000 when A fails. We
> promote B to replace A. Now C connects to B, and requests to resume at
> LSN 1/1000.

Wouldn't it be better to solve this by querying the new master's
timeline history, and checking whether the current replay point is
pre/post fork?

I'm more than a bit doubtful that adding more overhead to every relevant
record is worth it here.

- Andres




[HACKERS] Replication origins and timelines

2017-05-31 Thread Craig Ringer
Hi all

TL;DR: replication origins track LSN without timeline. This is
ambiguous when physical failover is present, since an (origin, LSN) pair
can now represent more than one state due to timeline forks with
promotions. Replication origins should track timelines so we can tell
the difference, I propose to patch them accordingly for pg11.

-

When replication origins were introduced, they deliberately left out
tracking of the upstream node's timeline. Logical decoding couldn't
follow a timeline switch anyway, and replicas (still) have no facility
for logical decoding so everything completely breaks on promotion of a
physical replica.

I'm working on fixing that so that logical decoding and logical
replication integrate properly with physical replication and
failover. But when that works we'll face the same problem in logical
rep that timelines were introduced to solve for physical rep.

To prevent undetected misreplication we'll need to keep track of the
timeline of the last-replicated LSN in our downstream replication
origin. So I propose to add a timeline field to replication origins
for pg11.

Why?

Take master A, its physical replica B, and logical decoding client C
streaming changes from A. B is lagging: A is at LSN 1/1000, B is only
at 1/500. C has replicated from A up to 1/1000 when A fails. We
promote B to replace A. Now C connects to B, and requests to resume at
LSN 1/1000.

If B has since done enough work for its insert position to pass
1/1000, C will completely skip whatever B did between 1/500 and
1/1000, thinking (incorrectly) that it already replayed it. And it
will have *extra data* from A from the 1/500 to 1/1000 range that B
lost. It'll pick up from B's 1/1000 and try to apply that on top of
A's 1/1000 state, potentially leading to a mangled mess.

In physical rep this would lead to serious data corruption and
crashes. In logical rep it'll most likely lead to conflicts, apply
errors, inconsistent data, broken FKs, etc. It could be drastic, or
quite subtle, depending on app and workload.

But we really should still detect it. To do that, we need to remember
that our last replay position was (1/1000, 1) . And when we request to
start replay from 1/1000 at timeline 1 on B, it'll ERROR, telling us
that its timeline 1 ends at 1/500.

We could still *choose* to continue as if all was well, but by default
we'll detect the error.

But we can't do that unless replication origins on the downstream can
track the timeline.
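
Concretely, what I have in mind is just carrying a TimeLineID alongside
the LSN in origin progress tracking. A purely hypothetical sketch of
the shape of it (field and struct names invented, this is not a patch):

/* Hypothetical sketch: origin progress becomes (timeline, lsn), so a
 * downstream can detect that the upstream's timeline 1 ended at 1/500
 * before the remembered position 1/1000. */
typedef struct ReplicationOriginProgress
{
    RepOriginId origin;        /* existing origin identifier */
    XLogRecPtr  remote_lsn;    /* what we track today */
    TimeLineID  remote_tli;    /* proposed: timeline of remote_lsn */
} ReplicationOriginProgress;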



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-05-31 Thread Stephen Frost
David,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Tue, May 30, 2017 at 8:41 PM, Stephen Frost  wrote:
> > * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > > On Fri, May 26, 2017 at 7:47 AM, Stephen Frost 
> > wrote:
> > > > * Robins Tharakan (thara...@gmail.com) wrote:
> > > > Attached is a patch that adds a --no-comments argument to pg_dump to skip
> > > > > generation of COMMENT statements when generating a backup. This is
> > > > crucial
> > > > > for non-superusers to restore a database backup in a Single
> > Transaction.
> > > > > Currently, this requires one to remove COMMENTs via scripts, which is
> > > > > inelegant at best.
> > > >
> > > > Having --no-comments seems generally useful to me, in any case.
> > >
> > > It smacks of being excessive to me.
> > >
> >
> > I have a hard time having an issue with an option that exists to exclude
> > a particular type of object from being in the dump.
> 
> Excessive with respect to the problem at hand.

Fair enough.  I tend to agree with that sentiment.

> A single comment in the
> dump is unable to be restored.  Because of that we are going to require
> people to omit every comment in their database.  The loss of all that
> information is in excess of what is required to solve the stated problem
> which is how I was thinking of excessive.

That would be most unfortunate, I agree.

> I agree that the general idea of allowing users to choose to include or
> exclude comments is worth discussion along the same lines of large objects
> and privileges.

Good, then we can at least consider this particular feature as being one
we're generally interested in and move forward with it, even if we also,
perhaps, come up with a better solution to the specific issue at hand.

> > > > CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;
> > >
> > > A less verbose way to add comments to objects would be nice but we have
> > > an immediate problem that we either need to solve or document a best
> > > practice for.
> >
> > The above would be a solution to the immediate problem in as much as
> > adding COMMENT IF NOT EXISTS would be.
> 
> I believe it would take a lot longer, possibly even until 12, to get the
> linked comment feature committed compared to committing COMMENT IF NOT
> EXISTS or some variation (or putting in a hack for that matter).

Perhaps, but I'm not convinced that such speculation really helps move
us forward.

> > > COMMENT IF NOT EXISTS has been brought up but it doesn't actually map to
> > > what seems to me to be the underlying problem...that people don't want a
> > > non-functional (usually...) aspect preventing successful restoration.
> >
> > I tend to disagree with this characterization- I'm of the opinion that
> > people are, rightly, confused as to why we bother to try and add a
> > COMMENT to an object which we didn't actually end up creating (as it
> > already existed), and then throw an error on it to boot.
> 
> My characterization does actually describe the current system though.

I'm not entirely sure what I was getting at above, to be honest, but I
believe we're generally on the same page here.  I certainly agree that
users don't expect a pg_dump to not be able to be successfully restored.
What I may have been getting at is simply that it's not about a lack of
COMMENT IF NOT EXISTS, in an ideal world, but perhaps that's what we
need to solve this particular issue, for now.

> While I won't doubt that people do hold your belief, it is an underlying
> misunderstanding as to how PostgreSQL works, since comments aren't, as you
> say below, actual attributes but rather objects in their own right.  I
> would love to have someone solve the systemic problem here.  But the actual
> complaint can probably be addressed without it.

Right, I tend to follow your line of thought here.

> >   Were pg_dump a
> > bit more intelligent of an application, it would realize that once the
> > CREATE ... IF NOT EXISTS returned a notice saying "this thing already
> > existed" that it would realize that it shouldn't try to adjust the
> > attributes of that object, as it was already existing.
> 
> pg_dump isn't in play during the restore, which is where the error occurs.

Ah, but pg_dump could potentially dump out more complicated logic than
it does today.  We currently presume that there is never any need to
reason about the results of a prior command before executing the next in
pg_dump's output.  In some ways, it's rather impressive that we've
gotten this far with that assumption, but ensuring that is the case
means that our users are also able to rely on that and write simple
scripts which can be rerun to reset the database to a specific state.

> During restore you have pg_restore+psql - and having cross-statement logic
> in those applications is likely a non-starter. 

It would certainly be a very large shift from what we generate today. :)

> That is ultimately the
> problem here, 

Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 08:15, Andres Freund  wrote:
> Hi,
>
> when using
> $ cat ~/.proverc
> -j9
>
> some tests fail for me in 9.4 and 9.5.  E.g. src/bin/scripts' tests
> yields a lot of fun like:
> $ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
> ...
> # LOG:  received immediate shutdown request
> # WARNING:  terminating connection because of crash of another server process
> # DETAIL:  The postmaster has commanded this server process to roll back the 
> current transaction and exit, because another server process exited 
> abnormally and possibly corrupted shared memory.
> # HINT:  In a moment you should be able to reconnect to the database and 
> repeat your command.
> ...
>
> it appears as if various tests are trampling over each other.  If needed
> I can provide detailed logs, but it appears to readily reproduce on
> several machines...

I'll take a look at what's changed and why it's happening and get back to you.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP backpatching policy

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 01:13, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> My main concern is how widely is the buildfarm going to test the new
>> test frameworks.  If we backpatch PostgresNode-based tests to 9.2, are
>> buildfarm animals going to need to be reconfigured to use
>> --enable-tap-tests?
>
> Yes.  The animals that are doing it at all are using code more or less
> like this:
>
> if ($branch eq 'HEAD' or $branch ge 'REL9_4')
> {
> push(@{$conf{config_opts}},"--enable-tap-tests");
> }
>
> (verbatim from longfin's config script)
>
> So maybe that's an argument for not going back before 9.4.  OTOH,
> you may be giving the buildfarm owners too little credit for
> willingness to update their configurations.

Honestly, it didn't occur to me to back-patch past the introduction of
TAP in 9.4. I was more thinking of bringing that up to current
functionality, and trying to maintain that in future so that TAP tests
in extensions, test scripts for bugs, etc could be easily used on all
back branches.

I don't have a particular objection to doing so, but initially I was
really aiming for bringing 9.5 and 9.6 up to pg10 level, since
PostgresNode is already present in 9.5 so it's a much simpler target
for backporting the pg10 stuff. Then maintaining from there going
forward, so by the time pg12 is out, everything has solid and pretty
consistent TAP infrastructure.

I'm not too fussed if everyone decides it's all too hard / not worth
it. I'll just extract src/test/modules into a separate github repo.
For use in extensions I'll teach it how to overwrite the stock
PostgresNode etc in a Pg install tree. For use for in-tree testing
it'd have a Makefile that finds and clobbers the in-tree
PostgresNode.pm etc. So it's a hassle, but not the end of the world.

I just suspect we'll all benefit from making it easier to write tests
that work across more releases, and that updating the test modules in
back branches isn't an unduly invasive thing to do.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] tap tests on older branches fail if concurrency is used

2017-05-31 Thread Andres Freund
Hi,

when using
$ cat ~/.proverc
-j9

some tests fail for me in 9.4 and 9.5.  E.g. src/bin/scripts' tests
yields a lot of fun like:
$ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
...
# LOG:  received immediate shutdown request
# WARNING:  terminating connection because of crash of another server process
# DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited abnormally 
and possibly corrupted shared memory.
# HINT:  In a moment you should be able to reconnect to the database and repeat 
your command.
...

it appears as if various tests are trampling over each other.  If needed
I can provide detailed logs, but it appears to readily reproduce on
several machines...

See Michael, I'll provide the details and a reproducer ;)

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-31 Thread Andres Freund
On 2017-05-31 14:24:38 -0700, Andres Freund wrote:
> On 2017-05-24 10:52:37 -0400, Robert Haas wrote:
> > On Wed, May 24, 2017 at 10:32 AM, Andres Freund  wrote:
> > > Well, but then we should just remove minval/maxval if we can't rely on
> > > it.
> >
> > That seems like a drastic overreaction to me.
> 
> Well, either they work, or they don't.  But since it turns out to be
> easy enough to fix anyway...
> 
> 
> > > I wonder if that's not actually very little new code, and I think we
> > > might end up regretting having yet another inconsistent set of semantics
> > > in v10, which we'll then end up changing again in v11.
> >
> > I'm not exercised enough about it to spend time on it or to demand
> > that Peter do so, but feel free to propose something.
> 
> This turns out to be a fairly simple patch.  We already do precisely what
> I describe when resetting a sequence for TRUNCATE, so it's an already
> tested codepath.  It also follows a lot more established practice around
> transactional schema changes, so I think that's architecturally better
> too.  Peter, to my understanding, agreed with that reasoning at pgcon.
> 
> I just have two questions:
> 1) We've introduced, in 3d092fe540, infrastructure to avoid unnecessary
>    catalog updates, in an attempt to fix the "concurrently updated"
>error. But that turned out to not be sufficient anyway, and it bulks
>up the code.  I'd vote for just removing that piece of logic, it
>doesn't buy us anything meaningful, and it's bulky.
> 
> 2) There's currently logic that takes a lower level lock for ALTER
>SEQUENCE RESET without other options.  I think that's a bit confusing
>with the proposed change, because then it's unclear when ALTER
>SEQUENCE is transactional and when not.  I think it's actually a lot
>easier to understand if we keep nextval()/setval() as
>non-transactional, and ALTER SEQUENCE as transactional.
> 
> Comments?

Here's a patch doing what I suggested above.  The second patch addresses
an independent oversight where the post alter hook was invoked before
doing the catalog update.
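
For reference, the core mechanism (what ResetSequence() already does for
TRUNCATE ... RESTART IDENTITY) is roughly the following; a sketch, not
an excerpt of the patch:

    /* give the sequence fresh storage; rolling back restores the old data */
    RelationSetNewRelfilenode(seqrel, seqrel->rd_rel->relpersistence,
                              InvalidTransactionId, InvalidMultiXactId);
    fill_seq_with_data(seqrel, newdatatuple);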

- Andres
>From 2ebbb5dc7449e5707949686ce2a59b2c5f76783b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 31 May 2017 16:39:27 -0700
Subject: [PATCH 1/2] Make ALTER SEQUENCE, including RESTART, fully
 transactional.

Previously the changes to the "data" part of the sequence, i.e. the
one containing the current value, were not transactional, whereas the
definition, including minimum and maximum value were.  That leads to
odd behaviour if a schema change is rolled back, with the potential
that out-of-bound sequence values can be returned.

To avoid the issue, create a new relfilenode whenever ALTER
SEQUENCE is executed, similar to how TRUNCATE ... RESTART IDENTITY
is already handled.

This also makes ALTER SEQUENCE RESTART transactional, as it seems to
be too confusing to have some forms of ALTER SEQUENCE behave
transactionally, some forms not.  This way setval() and nextval() are
not transactional, but DDL is, which seems to make sense.

This commit also rolls back parts of the changes made in 3d092fe540
and f8dc1985f as they're no longer needed.

Author: Andres Freund
Discussion: https://postgr.es/m/20170522154227.nvafbsm62sjpb...@alap3.anarazel.de
---
 doc/src/sgml/ref/alter_sequence.sgml |  15 ++--
 src/backend/commands/sequence.c  | 123 +++
 src/test/isolation/expected/sequence-ddl.out |  92 ++--
 src/test/isolation/specs/sequence-ddl.spec   |  19 +++--
 4 files changed, 96 insertions(+), 153 deletions(-)

diff --git a/doc/src/sgml/ref/alter_sequence.sgml b/doc/src/sgml/ref/alter_sequence.sgml
index 30e5316b8c..3a04d07ecc 100644
--- a/doc/src/sgml/ref/alter_sequence.sgml
+++ b/doc/src/sgml/ref/alter_sequence.sgml
@@ -171,7 +171,7 @@ ALTER SEQUENCE [ IF EXISTS ] name S

 The optional clause RESTART [ WITH restart ] changes the
-current value of the sequence.  This is equivalent to calling the
+current value of the sequence.  This is similar to calling the
 setval function with is_called =
 false: the specified value will be returned by the
 next call of nextval.
@@ -182,11 +182,11 @@ ALTER SEQUENCE [ IF EXISTS ] name S

 

-Like a setval call, a RESTART
-operation on a sequence is never rolled back, to avoid blocking of
-concurrent transactions that obtain numbers from the same sequence.
-(The other clauses cause ordinary catalog updates that can be rolled
-back.)
+In contrast to a setval call,
+a RESTART operation on a sequence is transactional
+and blocks concurrent transactions from obtaining numbers from the
+same sequence. If that's not the desired mode of
+operation, setval should be used.

   
  
@@ -307,8 +307,7 @@ ALTER SEQUENCE [ IF EXISTS ] 

[HACKERS] Typo in xlogfuncs.c [WAS Re: Incorrect mention of pg_xlog_switch() in xlogfuncs.c]

2017-05-31 Thread Neha Khatri
Simplifying $subject. There are typos in xlogfuncs.c. So either

s/pg_xlog_switch/pg_switch_wal

Or

Remove "pg_xlog_switch" from the comments.

Attached patches both ways.

Regards,
Neha

On Sat, May 20, 2017 at 1:08 AM, Neha Khatri  wrote:

> While reading some code, I noticed that the headers of the functions
> pg_walfile_name_offset() and pg_walfile_name() have incorrectly referred
> to pg_xlog_switch() since the inception of the code in commit 704ddaaa.
>
> In the PG10 implementation, the actual name of the referred-to function is
> pg_switch_wal(). So either use the correct name in the function header or
> remove the reference from it.
>
>


remove_incorrect_function_referral.patch
Description: Binary data


correctly_refer_pg_switch_wal.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-05-31 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, May 30, 2017 at 11:49 PM, Stephen Frost  wrote:
> > ... and I don't believe that we should be asking the
> > implementors of channel binding to also implement support for multiple
> > TLS libraries in PostgreSQL in order to test that their RFC-following
> > (at least, as far as they can tell) implementation actually works.
> 
> You're of course free to believe what you wish, but that sounds
> short-sighted to me.  If we implement channel binding and it turns out
> not to be interoperable with other SSL implementations, then what?  We
> can't change it later without breaking compatibility with our own
> prior implementation.  Note that Álvaro Hernández Tortosa said about
> two hours before you sent this email that it doesn't seem possible to
> implement something comparable in Java's standard SSL stack.  If
> that's the case, adopting this implementation is dooming everyone who
> connects to the database server using JDBC to be unable to use channel
> binding.  And that's a large percentage of our user base.

I'm hopeful that we're closer to agreement here than we are the
opposite.

It was absolutely not my intent that the ongoing discussion between
Alvaro and Michael be stopped or changed, quite the opposite, in fact.

If your comments regarding the requirement that we have interoperability
testing of this feature before accepting it were intended to mean that
we need to make sure that the JDBC driver is able to work with the
chosen solution (or, perhaps, that we support multiple options, one of
which works with the JDBC changes), and that we review the other TLS
libraries with the goal of making sure that what we agree on in core
should work with those also, then that's great and exactly what I'd like
to see also.

If your comments regarding the requirement that we have interoperability
testing of this feature before accepting it were intended to mean that
we must have a complete alternative TLS solution, while that would
actually play a great deal to a goal I've had for a while to have PG
support multiple TLS implementations (LibNSS being top-of-mind, at least
for me, as I've mentioned a couple times), I can't agree that we should
require that before accepting channel binding as a feature.  To do so
would be akin to requiring that we support multiple TLS implementations
before we agreed to support client-side certificates, in my opinion.

I would be rather surprised if the solution which Michael and Alvaro
come to results in a solution which requires us to break compatibility
down the road, though it's of course a risk we need to consider.  The
ongoing discussion around finding a way to support both methods outlined
in the RFC sounds exactly correct to me and I encourage further
discussion there.

> And then what happens when we do add support for Windows SSL or macOS
> SSL?  It sounds like you're equally willing to throw people using
> those implementations under the bus.  So we'll end up with channel
> binding that works only when everybody's running the same operating
> system and nobody's using Java.  Uggh.  At that point you might as
> well just go back to using SSL certificates to solve this problem.  If
> you use SSL certificates, then (1) it should work with any SSL
> implementation and (2) the client can force SSL certificate
> verification, whereas currently the client cannot force even the use
> of SCRAM, let alone channel binding.

I hope it's clear that it's not my intent to throw anyone "under the
bus."  The only reason that "SSL certificates should work with any SSL
implementation" is because those SSL implementations are based on RFCs
which define how they should work and what I was encouraging up-thread
is exactly that we should be looking at RFCs to determine the right path
forward.  There are cases, of course, where the RFCs have alternatives
and the ideal approach is to find a way to work with all of those
alternatives, or at least implement a solution which is flexible, such
that later changes could add support without breaking existing users.

I'm certainly on-board with the general idea of having a way for the
client to require both SCRAM and channel binding and I agree that's a
hole in our current system, but that is really an orthogonal concern.

> So what is being proposed here does not (yet, anyway) provide any
> actual security and seems to have poor prospects for interoperability,
> whereas we have an existing feature (SSL certificates) that has
> neither of those problems.  Are you sure you're not putting
> buzzword-compliance ahead of real progress?

Adding support for channel binding is quite important and valuable, in
its own right.  I concur that we want to provide ways for the client to
require SCRAM and to require channel binding, but I don't see any
particular reason that adding support for channel binding has to be held
up behind that.

As for your question, I have to say that I find 

[HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-05-31 Thread Andres Freund
Hi,

At the moment $subject doesn't allow parallelism, because copy.c's
pg_plan_query() invocation doesn't set the CURSOR_OPT_PARALLEL_OK
flag.

To me that appears to be an oversight rather than intentional.  A
somewhat annoying one at that, because it's not uncommon to use COPY to
execute reports to a CSV file and such.
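
If it really is just an oversight, the fix would presumably be a
one-liner at that call site (a sketch; that the invocation lives in
BeginCopy() is my assumption):

    /* let the planner consider a parallel plan for the wrapped query */
    plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, NULL);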

Robert, am I missing a complication?

I personally think we should fix this in 9.6 and 10, but I have to admit
I'm not entirely impartial, because Citus hit this...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Mark Dilger

> On May 31, 2017, at 4:00 PM, Alvaro Herrera  wrote:
> 
> Mark Dilger wrote:
> 

>> >>  relpartbound | relpartbound | ?column? | ?column?
>> >> --------------+--------------+----------+----------
>> >>  {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 82} | {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 73} | f | f
>> >> (1 row)
> 
>> I concede that mitigates the problem somewhat, though I still think a user 
>> may look
>> at pg_class, see there is a column that appears to show the partition 
>> boundaries,
>> and then decide to check whether two tables have the same partition 
>> boundaries
>> by comparing those fields, without passing them first through pg_get_expr(), 
>> a
>> function they may never have heard of.
> 
> How do you expect that the user would actually interpret the above
> weird-looking value?  There's no way to figure out what operations are
> being done in that ugly blob of text.

I don't know how the average user would use it.  I can only tell you what
I am doing, which may be on the fringe of what users do typically.

I have modified the system to add a number of catalog tables not normally
present in postgres.  A few of those catalog tables are partitioned.  Since
they have to be set up at bootstrap time, I can't use the CREATE TABLE
syntax in some src/backend/catalog/*.sql file to create them, I have to
create them with header files, genbki.pl and friends.  There is no support
for this, so I created my own.  That puts me at risk of getting out of sync
with the public sources implementation of partitioning.  As such, I have
regression tests that create at run time partitioned tables that have all the
same columns and boundaries as my catalog tables, and I compare the
pg_class entries against each other to make sure there are no inconsistencies.
When you guys commit changes that impact partitioning, I notice, and change
my code to match.  But in this case, it seemed to me the change that got
committed was not thought through, and it might benefit the community for
me to point it out, rather than quietly make my code behave the same as
what got committed.

I doubt too many people will follow in my footsteps here, but other people
may hit this :location peculiarity for other reasons.

Once again, this is just to give you context about why I said anything in the
first place.

> I suspect it would be better to reduce the location field to -1 as
> proposed, though, since the location has no actual meaning once the node
> is stored standalone rather than as part of a larger command.  However
> it's clear that we're not consistent about doing this elsewhere.

I have no opinion about whether this should be done for 10.0, given the
release schedule.  Changing it for 10.0 or 11.0 seems reasonable to me.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Alvaro Herrera
Mark Dilger wrote:

> >>  relpartbound | relpartbound | ?column? | ?column?
> >> --------------+--------------+----------+----------
> >>  {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 82} | {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 73} | f | f
> >> (1 row)

> I concede that mitigates the problem somewhat, though I still think a user 
> may look
> at pg_class, see there is a column that appears to show the partition 
> boundaries,
> and then decide to check whether two tables have the same partition boundaries
> by comparing those fields, without passing them first through pg_get_expr(), a
> function they may never have heard of.

How do you expect that the user would actually interpret the above
weird-looking value?  There's no way to figure out what operations are
being done in that ugly blob of text.

I suspect it would be better to reduce the location field to -1 as
proposed, though, since the location has no actual meaning once the node
is stored standalone rather than as part of a larger command.  However
it's clear that we're not consistent about doing this elsewhere.

> To me, it seems odd to immortalize a SQL parsing field in the catalog 
> definition of
> the relation, but perhaps that's just my peculiar sensibilities.  If the 
> community is more
> on your side, I'm not going to argue it.


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Mark Dilger

> On May 31, 2017, at 3:42 PM, Andres Freund  wrote:
> 
> On 2017-05-31 15:38:58 -0700, Mark Dilger wrote:
>>> Normal users aren't going to make sense of node trees in the first
>>> place.  You should use pg_get_expr for it:
>>> postgres[3008][1]=# SELECT pg_get_expr(relpartbound, oid) FROM pg_class 
>>> WHERE relpartbound IS NOT NULL;
>>> ┌──┐
>>> │ pg_get_expr  │
>>> ├──┤
>>> │ FOR VALUES IN (1, 2) │
>>> └──┘
>>> (1 row)
>> 
>> I concede that mitigates the problem somewhat, though I still think a user 
>> may look
>> at pg_class, see there is a column that appears to show the partition 
>> boundaries,
>> and then decide to check whether two tables have the same partition 
>> boundaries
>> by comparing those fields, without passing them first through pg_get_expr(), 
>> a
>> function they may never have heard of.
>> 
>> To me, it seems odd to immortalize a SQL parsing field in the catalog 
>> definition of
>> the relation, but perhaps that's just my peculiar sensibilities.  If the 
>> community is more
>> on your side, I'm not going to argue it.
> 
> Given that various other node trees stored in the catalog also have
> location fields, about which I recall no complaints, I don't think this
> is a significant issue.  Nor something that I think makes sense to solve
> in isolation, without tackling all stored expressions.

Ok.  Just to clarify, I didn't come up with this problem as part of some general
code review.  I merged the recent changes into my tree, and my regression
tests broke.  That's fine; that's what they are there to do.  Break, and in so 
doing,
alert me to the fact that the project has changed things that will require me to
make modifications.  It just seemed strange to me that I should be changing 
stuff
to try to get the :location field to be equal in my code to the :location field 
in the
public sources, since it seems to serve no purpose.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Andres Freund
On 2017-05-31 15:38:58 -0700, Mark Dilger wrote:
> > Normal users aren't going to make sense of node trees in the first
> > place.  You should use pg_get_expr for it:
> > postgres[3008][1]=# SELECT pg_get_expr(relpartbound, oid) FROM pg_class 
> > WHERE relpartbound IS NOT NULL;
> > ┌──┐
> > │ pg_get_expr  │
> > ├──┤
> > │ FOR VALUES IN (1, 2) │
> > └──┘
> > (1 row)
> 
> I concede that mitigates the problem somewhat, though I still think a user 
> may look
> at pg_class, see there is a column that appears to show the partition 
> boundaries,
> and then decide to check whether two tables have the same partition boundaries
> by comparing those fields, without passing them first through pg_get_expr(), a
> function they may never have heard of.
> 
> To me, it seems odd to immortalize a SQL parsing field in the catalog 
> definition of
> the relation, but perhaps that's just my peculiar sensibilities.  If the 
> community is more
> on your side, I'm not going to argue it.

Given that various other node trees stored in the catalog also have
location fields, about which I recall no complaints, I don't think this
is a significant issue.  Nor something that I think makes sense to solve
in isolation, without tackling all stored expressions.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Mark Dilger

> On May 31, 2017, at 3:17 PM, Andres Freund  wrote:
> 
> On 2017-05-31 15:06:06 -0700, Mark Dilger wrote:
>> That's cold comfort, given that most users will be looking at the pg_class
>> table and not writing C code that compares Node objects.  I wrote a bit of
>> regression test logic that checks, and sure enough the relpartbound field
>> shows up as unequal:
>>
>> SELECT a.relpartbound, b.relpartbound, a.relpartbound = b.relpartbound,
>>        a.relpartbound::text = b.relpartbound::text
>>   FROM pg_class a, pg_class b
>>  WHERE a.relname = 'acct_partitioned_1'
>>    AND b.relname = 'acct_partitioned_2';
>>
>>  relpartbound | relpartbound | ?column? | ?column?
>> --------------+--------------+----------+----------
>>  {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 82} | {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 73} | f | f
>> (1 row)
> 
> Normal users aren't going to make sense of node trees in the first
> place.  You should use pg_get_expr for it:
> postgres[3008][1]=# SELECT pg_get_expr(relpartbound, oid) FROM pg_class WHERE 
> relpartbound IS NOT NULL;
> ┌──┐
> │ pg_get_expr  │
> ├──┤
> │ FOR VALUES IN (1, 2) │
> └──┘
> (1 row)

I concede that mitigates the problem somewhat, though I still think a user may 
look
at pg_class, see there is a column that appears to show the partition 
boundaries,
and then decide to check whether two tables have the same partition boundaries
by comparing those fields, without passing them first through pg_get_expr(), a
function they may never have heard of.

To me, it seems odd to immortalize a SQL parsing field in the catalog 
definition of
the relation, but perhaps that's just my peculiar sensibilities.  If the 
community is more
on your side, I'm not going to argue it.

Mark Dilger



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Andres Freund
On 2017-05-31 15:06:06 -0700, Mark Dilger wrote:
> That's cold comfort, given that most users will be looking at the pg_class
> table and not writing C code that compares Node objects.  I wrote a bit of
> regression test logic that checks, and sure enough the relpartbound field
> shows up as unequal:
>
> SELECT a.relpartbound, b.relpartbound, a.relpartbound = b.relpartbound,
>        a.relpartbound::text = b.relpartbound::text
>   FROM pg_class a, pg_class b
>  WHERE a.relname = 'acct_partitioned_1'
>    AND b.relname = 'acct_partitioned_2';
>
>  relpartbound | relpartbound | ?column? | ?column?
> --------------+--------------+----------+----------
>  {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 82} | {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 73} | f | f
> (1 row)

Normal users aren't going to make sense of node trees in the first
place.  You should use pg_get_expr for it:
postgres[3008][1]=# SELECT pg_get_expr(relpartbound, oid) FROM pg_class WHERE 
relpartbound IS NOT NULL;
┌──┐
│ pg_get_expr  │
├──┤
│ FOR VALUES IN (1, 2) │
└──┘
(1 row)

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Mark Dilger

> On May 31, 2017, at 2:49 PM, Alvaro Herrera  wrote:
> 
> Mark Dilger wrote:
>> Hackers,
>> 
>> recent changes have introduced the :location field to the partboundspec
>> in pg_catalog.pg_class.  This means that if two identical tables with
>> identical partitioning scheme are created, but one is done before a change
>> to gram.y, and the other after a change to gram.y, the relpartbound fields
>> for those two tables could show up as different.
> 
> Actually, if you look at equalfuncs.c, you'll note that locations are
> not really compared:
> 
> /* Compare a parse location field (this is a no-op, per note above) */
> #define COMPARE_LOCATION_FIELD(fldname) \
>   ((void) 0)
> 
> where the referenced note is:
> 
> * NOTE: it is intentional that parse location fields (in nodes that have
> * one) are not compared.  This is because we want, for example, a variable
> * "x" to be considered equal() to another reference to "x" in the query.

That's cold comfort, given that most users will be looking at the pg_class
table and not writing C code that compares Node objects.  I wrote a bit of
regression test logic that checks, and sure enough the relpartbound field
shows up as unequal:

SELECT a.relpartbound, b.relpartbound, a.relpartbound = b.relpartbound,
       a.relpartbound::text = b.relpartbound::text
  FROM pg_class a, pg_class b
 WHERE a.relname = 'acct_partitioned_1'
   AND b.relname = 'acct_partitioned_2';

 relpartbound | relpartbound | ?column? | ?column?
--------------+--------------+----------+----------
 {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 82} | {PARTITIONBOUNDSPEC :strategy l :listdatums ({CONST :consttype 23000 :consttypmod -1 :constcollid 0 :constlen 2 :constbyval true :constisnull false :location -1 :constvalue 2 [ 0 0 0 0 0 0 0 0 ]}) :lowerdatums <> :upperdatums <> :location 73} | f | f
(1 row)





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidae is *still* broken)

2017-05-31 Thread Robert Haas
On Fri, May 26, 2017 at 10:51 AM, Magnus Hagander  wrote:
> I would definitely suggest putting it in HEAD (and thus, v10) for a while to
> get some real world exposure before backpatching. But if it does work out
> well in the end, then we can certainly consider backpatching it. But given
> the difficulty in reliably reproducing the problem etc, I think it's a good
> idea to give it some proper real world experience in 10 first.

So, are you going to, perhaps, commit this?  Or who is picking this up?

/me knows precious little about Windows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-05-31 Thread Robert Haas
On Mon, May 29, 2017 at 5:26 AM, Amit Kapila  wrote:
>> But I think, we can also take step-by-step approach even for v11. If
>> we agree that it is ok to silently do the updates as long as we
>> document the behaviour, we can go ahead and do this, and then as a
>> second step, implement error handling as a separate patch. If that
>> patch does not materialize, we at least have the current behaviour
>> documented.
>
> I think that is sensible approach if we find the second step involves
> big or complicated changes.

I think it is definitely a good idea to separate the two patches.
UPDATE tuple routing without any special handling for the EPQ issue is
just a partitioning feature.  The proposed handling for the EPQ issue
is an *on-disk format change*.  That turns a patch which is subject
only to routine bugs into one which can eat your data permanently --
so having the "can eat your data permanently" separated out for both
review and commit seems only prudent.  For me, it's not a matter of
which patch is big or complicated, but rather a matter of one of them
being a whole lot riskier than the other.  Even UPDATE tuple routing
could mess things up pretty seriously if we end up with tuples in the
wrong partition, of course, but the other thing is still worse.

In terms of a development plan, I think we would need to have both
patches before either could be committed.  I believe that everyone
other than me who has expressed an opinion on this issue has said that
it's unacceptable to just ignore the issue, so it doesn't sound like
there will be much appetite for having #1 go into the tree without #2.
I'm still really concerned about that approach because we do not have
very much bit space left and WARM wants to use quite a bit of it.  I
think it's quite possible that we'll be sad in the future if we find
that we can't implement feature XYZ because of the bit-space consumed
by this feature.  However, I don't have the only vote here and I'm not
going to try to shove this into the tree over multiple objections
(unless there are a lot more votes the other way, but so far there's
no sign of that).

Greg/Amit's idea of using the CTID field rather than an infomask bit
seems like a possibly promising approach.  Not everything that needs
bit-space can use the CTID field, so using it is a little less likely
to conflict with something else we want to do in the future than using
a precious infomask bit.  However, I'm worried about this:

/* Make sure there is no forward chain link in t_ctid */
tp.t_data->t_ctid = tp.t_self;

The comment does not say *why* we need to make sure that there is no
forward chain link, but it implies that some code somewhere in the
system does or at one time did depend on no forward link existing.
Any such code that still exists will need to be updated.  Anybody know
what code that might be, exactly?

The other potential issue I see here is that I know the WARM code also
tries to use the bit-space in the CTID field; in particular, it uses
the CTID field of the last tuple in a HOT chain to point back to the
root of the chain.  That seems like it could conflict with the usage
proposed here, but I'm not totally sure.  Has anyone investigated this
issue?

Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this.  I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people.  However, there seems to be a
discrepancy between the approach that got the most votes and the one
that is implemented by the v8 patch, so that seems like something to
fix.

For what it's worth, in the future, I imagine that we might allow
adding a trigger to a partitioned table and having that cascade down
to all descendant tables.  In that world, firing the BR UPDATE trigger
for the old partition and the AR UPDATE trigger for the new partition
will look a lot like the behavior the user would expect on an
unpartitioned table, which could be viewed as a good thing.  On the
other hand, it's still going to be a DELETE+INSERT under the hood for
the foreseeable future, so firing the delete triggers and then the
insert triggers is also defensible.  Is there any big difference
between these approaches in terms of how much code is required to make
this work?

In terms of the approach taken by the patch itself, it seems
surprising to me that the patch only calls
ExecSetupPartitionTupleRouting when an update fails the partition
constraint.  Note that in the insert case, we call that function at
the start of execution; calling it in the middle seems to involve
additional hazards; for example, is it really safe to add additional
ResultRelInfos midway through the operation?  Is it safe to take more
locks midway through the operation? It seems like it might be a lot
safer to decide at the beginning of the operation whether this is
needed -- we can skip it if none of the columns involved in the

Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Alvaro Herrera
Mark Dilger wrote:
> Hackers,
> 
> recent changes have introduced the :location field to the partboundspec
> in pg_catalog.pg_class.  This means that if two identical tables with
> identical partitioning scheme are created, but one is done before a change
> to gram.y, and the other after a change to gram.y, the relpartbound fields
> for those two tables could show up as different.

Actually, if you look at equalfuncs.c, you'll note that locations are
not really compared:

/* Compare a parse location field (this is a no-op, per note above) */
#define COMPARE_LOCATION_FIELD(fldname) \
((void) 0)

where the referenced note is:

 * NOTE: it is intentional that parse location fields (in nodes that have
 * one) are not compared.  This is because we want, for example, a variable
 * "x" to be considered equal() to another reference to "x" in the query.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-31 Thread Andres Freund
On 2017-05-24 10:52:37 -0400, Robert Haas wrote:
> On Wed, May 24, 2017 at 10:32 AM, Andres Freund  wrote:
> > Well, but then we should just remove minval/maxval if we can't rely on
> > it.
>
> That seems like a drastic overreaction to me.

Well, either they work, or they don't.  But since it turns out to be
easy enough to fix anyway...


> > I wonder if that's not actually very little new code, and I think we
> > might end up regretting having yet another inconsistent set of semantics
> > in v10, which we'll then end up changing again in v11.
>
> I'm not exercised enough about it to spend time on it or to demand
> that Peter do so, but feel free to propose something.

This turns out to be a fairly simple patch.  We already do precisely what
I describe when resetting a sequence for TRUNCATE, so it's an already
tested codepath.  It also follows a lot more established practice around
transactional schema changes, so I think that's architecturally better
too.  Peter, to my understanding, agreed with that reasoning at pgcon.

I just have two questions:
1) We've introduced, in 3d092fe540, infrastructure to avoid unnecessary
   catalog updates, in an attempt to fix the "concurrently updated"
   error. But that turned out to not be sufficient anyway, and it bulks
   up the code.  I'd vote for just removing that piece of logic, it
   doesn't buy us anything meaningful, and it's bulky.

2) There's currently logic that takes a lower level lock for ALTER
   SEQUENCE RESET without other options.  I think that's a bit confusing
   with the proposed change, because then it's unclear when ALTER
   SEQUENCE is transactional and when not.  I think it's actually a lot
   easier to understand if we keep nextval()/setval() as
   non-transactional, and ALTER SEQUENCE as transactional.

Comments?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2017 : Proposal for predicate locking in gist index

2017-05-31 Thread Kevin Grittner
On Wed, May 31, 2017 at 3:02 PM, Shubham Barai  wrote:

> I have been accepted as GSoC student for the project "Explicitly support
> predicate locks in index access methods besides b-tree". I want to share my
> approach for implementation of page level predicate locking in gist index.

For the benefit of all following the discussion, implementing
support in an index AM conceptually consists of two things:
(1) Any scan with user-visible results must create SIRead predicate
locks on "gaps" scanned.  (For example, a scan just to find an
insertion spot for an index entry does not generally count, whereas
a scan to satisfy a user "EXISTS" test does.)
(2) Any insert into the index must CheckForSerializableConflictIn()
at enough points that at least one of them will detect an overlap
with a predicate lock from a preceding scan if the inserted index
entry might have changed the results of that preceding scan.

Detecting such a conflict does not automatically result in a
serialization failure, but is only part of tracking the information
necessary to make such a determination.  All that is handled by the
existing machinery if the AM does the above.

With a btree index, locking gaps at the leaf level is normally
sufficient, because both scan and insertion of an index entry must
descend that far.

> The main difference between b-tree and gist index while searching for a
> target tuple is that in gist index, we can determine if there is a match or
> not at any level of the index.

Agreed.  A gist scan can fail at any level, but for that scan to
have produced a different result due to insertion of an index entry,
*some* page that the scan looked at must be modified.

> The simplest way to do that will be by inserting a call to
> PredicateLockPage() in gistScanPage().

Yup.

> Insertion algorithm also needs to check for conflicting predicate locks at
> each level of the index.

Yup.

> We can insert a call for CheckForSerializableConflictIn() at two places in
> gistdoinsert().
>
> 1. after acquiring an exclusive lock on internal page (in case we are trying
> to replace an old search key)
>
> 2. after acquiring an exclusive lock on leaf page
>
> If there is not enough space for insertion, we have to copy predicate lock
> from an old page to all new pages generated after a successful split
> operation. For that, we can insert a call for PredicateLockPageSplit() in
> gistplacetopage().

That all sounds good.  When you code a patch, don't forget to add
tests to src/test/isolation.
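
For the insertion side, each of those two checks would look something
like this sketch (the exact spot inside gistdoinsert() is for the patch
to work out):

    /* page is exclusively locked; detect rw-conflicts before modifying it */
    CheckForSerializableConflictIn(r, NULL, stack->buffer);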

Do you need any help getting a development/build/test environment
set up?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP backpatching policy

2017-05-31 Thread Joshua D. Drake

On 05/31/2017 07:49 AM, Robert Haas wrote:

> On Tue, May 30, 2017 at 9:12 PM, Craig Ringer  wrote:
>> Thoughts? Backpatch new TAP methods, etc, into back branches routinely?


> [...]
>
> When customers start to doubt that, then they
> become reluctant to apply new minor versions in their entirety and ask
> for individual commits to be cherry-picked, or just don't upgrade at
> all.


This may be true; on the other hand, that isn't .Org's problem. Customers
are CMD's, EDB's, and 2Q's problem. .Org's problem is: How do we ensure
the best possible result for PostgreSQL?


I think comprehensive testing (which I am sure you agree) is a bullet
point on that list.



> One could argue that commits to the testing framework shouldn't
> make customers nervous, but what people should be worried about and
> what they are actually worried about do not always match.  It is
> useful to be able to show a customer a list of things that were done
> in a minor release and see nothing in there that can be described as
> optional tinkering.


This is about narrative. You don't say "optional tinkering". You say, 
"Initiate code modification for comprehensive TAP (testing) framework".


That makes customers' knees weak, and they swoon.


> back-patching (to avoid churning a supposedly-stable branch).  I'm not
> sure exactly what I think about this issue, but I'm very skeptical of
> the idea that back-patching less conservatively will have no
> downsides.


There is never not a downside. The question is, "Does the upside 
outweigh the downside?". In this case I would argue it is fairly obvious.


Thanks,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC 2017 : Proposal for predicate locking in gist index

2017-05-31 Thread Shubham Barai
Hello everyone,



I have been accepted as GSoC student for the project "Explicitly support
predicate locks in index access methods besides b-tree". I want to share my
approach for implementation of page level predicate locking in gist index.
Any suggestions will be appreciated.



Proposal



The main difference between a b-tree and a gist index while searching for
a target tuple is that in a gist index, we can determine whether there is
a match at any level of the index. In gist, the index entry of an internal
node contains a predicate which is used as a search key for all tuples
reachable from that node. To insert a tuple into the index, we first check
the key representing the target subtree. If it doesn't already cover the
key we are inserting, we have to replace it with the union of the old key
and the key we are inserting. After considering all these points, it seems
logical to acquire a predicate lock at each level of the index.

The simplest way to do that will be by inserting a call to
PredicateLockPage() in gistScanPage().
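
A minimal sketch of that scan-side call (that it belongs in
gistScanPage(), where each page is read, is an assumption):

    /* take an SIRead lock on every index page the scan examines */
    PredicateLockPage(scan->indexRelation,
                      BufferGetBlockNumber(buffer),
                      scan->xs_snapshot);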

The insertion algorithm also needs to check for conflicting predicate
locks at each level of the index.

We can insert a call to CheckForSerializableConflictIn() at two places in
gistdoinsert().

1. after acquiring an exclusive lock on an internal page (in case we are
trying to replace an old search key)

2. after acquiring an exclusive lock on a leaf page

If there is not enough space for insertion, we have to copy the predicate
locks from the old page to all new pages generated by a successful split
operation. For that, we can insert a call to PredicateLockPageSplit() in
gistplacetopage().
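
Again as a sketch, assuming gistplacetopage() can iterate over the pages
produced by the split (dist, ptr, and buffer stand in for its local
state):

    /* transfer the old page's SIRead locks to every page of the split */
    for (ptr = dist; ptr; ptr = ptr->next)
        PredicateLockPageSplit(rel,
                               BufferGetBlockNumber(buffer),
                               BufferGetBlockNumber(ptr->buffer));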



Regards,

Shubham


Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-05-31 Thread Michael Paquier
On Tue, May 30, 2017 at 6:50 PM, Álvaro Hernández Tortosa
 wrote:
> Actually this Python patch seems to ultimately rely on the same OpenSSL
> functions as your implementation.

Yup. My point is that even if those APIs are undocumented, I think
that they are not going to disappear either.

> I don't have anything against implementing tls-unique, specially as per
> the RFC it is mandatory (if channel binding support is provided). However,
> given the use of undocumented API, given that at least the Java driver would
> have a very difficult time implementing it (basically replacing Java's
> standard SSL stack with a completely external one, BouncyCastle, which adds
> load, size and complexity), I would rather focus the efforts on
> tls-server-end-point which only needs to access the certificate data,
> something that most APIs/frameworks/languages seem to cover much more
> naturally than tls-unique.

Point taken. Well, this brings us to the point where we should have
both tls-unique and end-point to allow JDBC to work with it. Now,
thinking about it, do we really need to advertise the available
channel binding types when the client chooses the -PLUS mechanism?
Wouldn't it make sense to let the client choose what it thinks is
better and just fail the exchange with the backend if the channel type
is not supported?

> Both are equally safe and effective and so having support for both seems
> sensible to me.

I have some automatic setups that would be really simplified by just
using libpq with SSL connections if there is channel binding. And that
involves certificate deployments; I think I am not the only one to see
that present even if JDBC just supports end-point.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Robert Haas
On Wed, May 31, 2017 at 3:40 PM, Mark Dilger  wrote:
> recent changes have introduced the :location field to the partboundspec
> in pg_catalog.pg_class.  This means that if two identical tables with
> identical partitioning scheme are created, but one is done before a change
> to gram.y, and the other after a change to gram.y, the relpartbound fields
> for those two tables could show up as different.
>
> Can we perhaps remove the :location field here?  If not, could somebody
> please defend why this belongs in the catalog entries for the table?  Sorry
> if I am missing something obvious...

Yeah, that probably wasn't a good decision, although calling it a
decision might be giving it too much credit.  I think the easiest
thing to do here would be to change transformPartitionBound() to set
result_spec->location to -1, although maybe a better idea would be to
have two different structures -- one that represents the partition
bound specification *before* parse analysis and another that
represents it *after* parse analysis, rather than reusing the same
structure for both.  Then again, maybe making two different node types
for this is overkill.  Not sure.  Opinions?
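
The first of those would be a near-trivial change; a sketch:

    /* in transformPartitionBound(), at the end of parse analysis */
    result_spec = copyObject(spec);
    result_spec->location = -1; /* parse position is meaningless once stored */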

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_class.relpartbound definition overly brittle

2017-05-31 Thread Mark Dilger
Hackers,

recent changes have introduced the :location field to the partboundspec
in pg_catalog.pg_class.  This means that if two identical tables with
identical partitioning scheme are created, but one is done before a change
to gram.y, and the other after a change to gram.y, the relpartbound fields
for those two tables could show up as different.

Can we perhaps remove the :location field here?  If not, could somebody
please defend why this belongs in the catalog entries for the table?  Sorry
if I am missing something obvious...

Thanks,

Mark Dilger


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

2017-05-31 Thread Magnus Hagander
On Wed, May 31, 2017 at 8:18 PM, Andres Freund  wrote:

> On 2017-05-31 18:22:18 +0200, Magnus Hagander wrote:
> > However, the client can't access the pid of the server as it is now,
> > and it's the client that has to create the name.
>
> I don't think that's actually true?  Doesn't the wire protocol always
> include the PID, which is then exposed with PQbackendPID()?
>

Oh, you're right. Well, that makes the fix a lot simpler. Will fix in a
minute.
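
Presumably something along these lines on the pg_basebackup side (a
sketch; conn stands in for the walsender connection):

    char        slotname[64];

    /* name the ephemeral slot after the server backend PID, not the client's */
    snprintf(slotname, sizeof(slotname), "pg_basebackup_%d",
             PQbackendPID(conn));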


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Re: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

2017-05-31 Thread Michael Paquier
On Wed, May 31, 2017 at 11:18 AM, Andres Freund  wrote:
> On 2017-05-31 18:22:18 +0200, Magnus Hagander wrote:
>> However, the client can't access the pid of the server as it is now,
>> and it's the client that has to create the name.
>
> I don't think that's actually true?  Doesn't the wire protocol always
> include the PID, which is then exposed with PQbackendPID()?

Ah, you are right here. I forgot that this is exposed to the client at
backend startup.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP backpatching policy

2017-05-31 Thread Stephen Frost
Tom, Alvaro,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Alvaro Herrera  writes:
> > My main concern is how widely is the buildfarm going to test the new
> > test frameworks.  If we backpatch PostgresNode-based tests to 9.2, are
> > buildfarm animals going to need to be reconfigured to use
> > --enable-tap-tests?
> 
> Yes.  The animals that are doing it at all are using code more or less
> like this:
> 
> if ($branch eq 'HEAD' or $branch ge 'REL9_4')
> {
> push(@{$conf{config_opts}},"--enable-tap-tests");
> }
> 
> (verbatim from longfin's config script)
> 
> So maybe that's an argument for not going back before 9.4.  OTOH,
> you may be giving the buildfarm owners too little credit for
> willingness to update their configurations.

I'm certainly on the optimistic side of the equation here when it comes
to buildfarm owners.  Generally speaking, I've seen them be pretty
reasonably responsive when asked to make a change or update something,
and a lot of them are also regular PG contributors, but even those who
aren't seem to take the buildfarm seriously and I expect an email going
out to them would certainly have a majority positive response.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] TAP backpatching policy

2017-05-31 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On the other hand, we tell our users that we only back-patch security
> and stability fixes.  

Perhaps I missed where this changed, but my recollection is that we also
back-patch bug-fixes, and I don't think we should change that.

> When customers start to doubt that, then they
> become reluctant to apply new minor versions in their entirety and ask
> for individual commits to be cherry-picked, or just don't upgrade at
> all.  One could argue that commits to the testing framework shouldn't
> make customers nervous, but what people should be worried about and
> what they are actually worried about do not always match.  It is
> useful to be able to show a customer a list of things that were done
> in a minor release and see nothing in there that can be described as
> optional tinkering.

If the perception issue is that customers aren't comfortable with
changes being made to the main repo purely to add testing, then we
should consider an independent test repo that can also be run through
the buildfarm.  Perhaps we could adopt a more relaxed policy around that
repository, along with a way to distinguish failures on the main repo
from failures in this other repo (where the problem might be the test
itself rather than an issue in the code, or an issue in the code that
the test exposed but which isn't yet fixed).  That would let people see
that their independent changes on the main repo aren't what caused the
buildfarm to turn red.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] POC: Sharing record typmods between backends

2017-05-31 Thread Andres Freund


On May 31, 2017 11:28:18 AM PDT, Robert Haas  wrote:
>On Wed, May 31, 2017 at 1:46 PM, Andres Freund 
>wrote:
>> On 2017-05-31 13:27:28 -0400, Dilip Kumar wrote:
>>> On Wed, May 31, 2017 at 12:53 PM, Robert Haas
> wrote:
>>> > Well, SH_TYPE's members SH_ELEMENT_TYPE *data and void
>*private_data
>>> > are not going to work in DSM, because they are pointers.  You can
>>> > doubtless come up with a way around that problem, but I guess the
>>> > question is whether that's actually any better than just using
>DHT.
>>>
>>> Probably I misunderstood the question. I assumed that we need to
>bring
>>> in DHT only for achieving this goal. But, if the question is simply
>>> the comparison of DHT vs simplehash for this particular case then I
>>> agree that DHT is a more appropriate choice.
>>
>> Yea, I don't think simplehash is the best choice here.  It's
>worthwhile
>> to use it for performance critical bits, but using it for everything
>> would just increase code size without much benefit.  I'd tentatively
>> assume that anonymous record type aren't going to be super common,
>and
>> that this is going to be the biggest bottleneck if you use them.
>
>Did you mean "not going to be"?

Err, yes.  Thanks
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-05-31 Thread Robert Haas
On Wed, May 31, 2017 at 1:46 PM, Andres Freund  wrote:
> On 2017-05-31 13:27:28 -0400, Dilip Kumar wrote:
>> On Wed, May 31, 2017 at 12:53 PM, Robert Haas  wrote:
>> > Well, SH_TYPE's members SH_ELEMENT_TYPE *data and void *private_data
>> > are not going to work in DSM, because they are pointers.  You can
>> > doubtless come up with a way around that problem, but I guess the
>> > question is whether that's actually any better than just using DHT.
>>
>> Probably I misunderstood the question. I assumed that we need to bring
>> in DHT only for achieving this goal. But, if the question is simply
>> the comparison of DHT vs simplehash for this particular case then I
>> agree that DHT is a more appropriate choice.
>
> Yea, I don't think simplehash is the best choice here.  It's worthwhile
> to use it for performance critical bits, but using it for everything
> would just increase code size without much benefit.  I'd tentatively
> assume that anonymous record type aren't going to be super common, and
> that this is going to be the biggest bottleneck if you use them.

Did you mean "not going to be"?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An incomplete comment sentence in subtrans.c

2017-05-31 Thread Robert Haas
On Tue, May 30, 2017 at 7:43 PM, Masahiko Sawada  wrote:
> There is an incomplete sentence at top of subtrans.c file. I think the
> commit 88e66d19 removed the whole line mistakenly.

Thanks for the patch.  Sorry for the mistake that made it necessary.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error while creating subscription when server is running in single user mode

2017-05-31 Thread Michael Paquier
On Wed, May 31, 2017 at 7:01 AM, Dilip Kumar  wrote:
> On Wed, May 31, 2017 at 7:54 AM, tushar  wrote:
>> [centos@centos-cpula bin]$ ./postgres --single postgres -D m1data
>> PostgreSQL stand-alone backend 10beta1
>> backend> create subscription sub connection 'dbname=postgres port=5433
>> user=centos' publication p with (create_slot=0,enabled=off);
>> 2017-05-31 12:53:09.318 BST [10469] LOG:  statement: create subscription sub
>> connection 'dbname=postgres port=5433 user=centos' publication p with
>> (create_slot=0,enabled=off);
>>
>> 2017-05-31 12:53:09.326 BST [10469] ERROR:  epoll_ctl() failed: Bad file
>> descriptor
>
> IMHO, single-user mode cannot support replication (it cannot have a
> background WALReceiver task). However, I believe the statement above
> should produce a proper error message.

Yeah, see 0e0f43d6 for example. A simple fix is to look at
IsUnderPostmaster when creating, altering or dropping a subscription
in subscriptioncmds.c.
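
Sketched -- the error wording and exact placement are assumptions -- the
check could look like this at the top of CreateSubscription(), and
similarly in AlterSubscription() and DropSubscription():

    if (!IsUnderPostmaster)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot create subscriptions in single-user mode")));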
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

2017-05-31 Thread Andres Freund
On 2017-05-31 18:22:18 +0200, Magnus Hagander wrote:
> However, the client can't access the pid of the server as it is now,
> and it's the client that has to create the name.

I don't think that's actually true?  Doesn't the wire protocol always
include the PID, which is then exposed with PQbackendPID()?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

2017-05-31 Thread Michael Paquier
On Wed, May 31, 2017 at 9:22 AM, Magnus Hagander  wrote:
> Moving this one over to -hackers to discuss the fix, as this is clearly an
> issue.
>
> Right now, pg_basebackup will use the pid of the *client* process to
> generate it's ephemeral slot name. Per this report that seems like it can
> definitely be a problem.
>
> One of my first thoughts would be to instead use the pid of the *server* to
> do that, as this will be guaranteed to be unique. However, the client can't
> access the pid of the server as it is now, and it's the client that has to
> create the name.

Yes, something like that sounds like a sensible idea. The system
identifier won't help either.

> One way to do that would be to include the pid of the walsender backend in
> the reply to IDENTIFY_SYSTEM, and then use that. What do people think of
> that idea?
>
> Other suggestions?

Here is a funky idea: add a read-only GUC parameter that reports the
PID of the process, and use the SHOW command with the replication
protocol to get the PID on backend-side.
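
(For illustration only -- wal_sender_pid is a hypothetical GUC name, not
something that exists today -- the client side could then do:)

    /* Uses the protocol-level SHOW command available in the v10 walsender. */
    int         pid = 0;
    PGresult   *res = PQexec(conn, "SHOW wal_sender_pid");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        pid = atoi(PQgetvalue(res, 0, 0));
    PQclear(res);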
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] quiet conversion warning in DatumGetFloat4

2017-05-31 Thread Chapman Flack
On 05/31/2017 11:36 AM, Tom Lane wrote:

> However, I grant your point that some extensions may have outside
> constraints that mandate using -Wconversion, so to the extent that
> we can keep key headers like postgres.h from triggering those warnings,
> it's probably worth doing.  I suspect you're still seeing a lot of them
> though --- experiments with some contrib modules suggest that a lot of
> our other headers also contain code that would trigger them.  I do not
> think I'd be on board with trying to silence them generally.

That was actually the only one PL/Java gets, outside of /sign/
conversions, a special subset of conversion warnings that can be
separately turned off with -Wno-sign-conversion. There are gobs of
those; I looked into them briefly last year and realized they'd only
be fixed by a bunch of cluttery, unenlightening casts in pg headers,
which I did not propose, having a suspicion that's how you would feel
about them. Instead I added a Maven build profile that adds
-Wno-sign-conversion (though getting Maven to select that profile
automatically when the compiler is gcc is one of those things you'd
expect to be easy and then be surprised by).

But since there used to be no extraneous noise outside of /sign/
conversions, I figured this one new warning outside that category was
worth silencing before the day comes when there are three and it seems
like an organization, or fifty and it seems like a movement.

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] relocation truncated to fit: citus build failure on s390x

2017-05-31 Thread Andres Freund
On 2017-05-31 11:57:16 -0400, Alvaro Herrera wrote:
> My vote would be to backpatch it all the way.

+1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-05-31 Thread Andres Freund
On 2017-05-31 13:27:28 -0400, Dilip Kumar wrote:
> On Wed, May 31, 2017 at 12:53 PM, Robert Haas  wrote:
> > Well, SH_TYPE's members SH_ELEMENT_TYPE *data and void *private_data
> > are not going to work in DSM, because they are pointers.  You can
> > doubtless come up with a way around that problem, but I guess the
> > question is whether that's actually any better than just using DHT.
> 
> Probably I misunderstood the question. I assumed that we need to bring
> in DHT only for achieving this goal. But, if the question is simply
> the comparison of DHT vs simplehash for this particular case then I
> agree that DHT is a more appropriate choice.

Yea, I don't think simplehash is the best choice here.  It's worthwhile
to use it for performance critical bits, but using it for everything
would just increase code size without much benefit.  I'd tentatively
assume that anonymous record type aren't going to be super common, and
that this is going to be the biggest bottleneck if you use them.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-05-31 Thread Dilip Kumar
On Wed, May 31, 2017 at 12:53 PM, Robert Haas  wrote:
> Well, SH_TYPE's members SH_ELEMENT_TYPE *data and void *private_data
> are not going to work in DSM, because they are pointers.  You can
> doubtless come up with a way around that problem, but I guess the
> question is whether that's actually any better than just using DHT.

Probably I misunderstood the question. I assumed that we need to bring
in DHT only for achieving this goal. But, if the question is simply
the comparison of DHT vs simplehash for this particular case then I
agree that DHT is a more appropriate choice.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] <> join selectivity estimate question

2017-05-31 Thread Dilip Kumar
On Fri, Mar 17, 2017 at 6:49 PM, Thomas Munro
 wrote:
> Right.  If I temporarily hack neqjoinsel() thus:
>
> result = 1.0 - result;
> +
> +   if (jointype == JOIN_SEMI)
> +   result = 1.0;
> +
> PG_RETURN_FLOAT8(result);
>  }

I was looking into this problem. IMHO, the correct solution is that
for JOIN_SEMI, neqjoinsel should not estimate the equijoin selectivity
using eqjoinsel_semi; instead, it should calculate the equijoin
selectivity as for an inner join and derive the selectivity of <> as
(1 - equijoin selectivity). For an inner join we can claim that
"selectivity of '=' + selectivity of '<>' = 1", but the same is not
true for semi-join selectivity: for a semi-join it is possible that
the selectivities of '=' and '<>' are both 1.

something like below


@@ -2659,7 +2659,13 @@ neqjoinsel(PG_FUNCTION_ARGS)
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
Oid eqop;
float8  result;

+   if (jointype == JOIN_SEMI)
+   {
+   sjinfo->jointype = JOIN_INNER;
+   }
/*
 * We want 1 - eqjoinsel() where the equality operator is the one
 * associated with this != operator, that is, its negator.

We may need something similar for anti-join as well.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP backpatching policy

2017-05-31 Thread Tom Lane
Alvaro Herrera  writes:
> My main concern is how widely is the buildfarm going to test the new
> test frameworks.  If we backpatch PostgresNode-based tests to 9.2, are
> buildfarm animals going to need to be reconfigured to use
> --enable-tap-tests?

Yes.  The animals that are doing it at all are using code more or less
like this:

if ($branch eq 'HEAD' or $branch ge 'REL9_4')
{
push(@{$conf{config_opts}},"--enable-tap-tests");
}

(verbatim from longfin's config script)

So maybe that's an argument for not going back before 9.4.  OTOH,
you may be giving the buildfarm owners too little credit for
willingness to update their configurations.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] relocation truncated to fit: citus build failure on s390x

2017-05-31 Thread Tom Lane
Alvaro Herrera  writes:
> My vote would be to backpatch it all the way.

That's my thought too.  Otherwise it'll be five years before extension
authors can stop worrying about this issue.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-05-31 Thread Robert Haas
On Wed, May 31, 2017 at 11:16 AM, Dilip Kumar  wrote:
> I agree with you. But, if I understand the use case correctly we need
> to store the TupleDesc for the RECORD in shared hash so that it can be
> shared across multiple processes.  I think this can be achieved with
> the simplehash as well.
>
> For getting this done, we need some fixed shared memory for holding
> static members of SH_TYPE and the process which creates the simplehash
> will be responsible for copying these static members to the shared
> location so that other processes can access the SH_TYPE.  And, the
> dynamic part (the actual hash entries) can be allocated using DSA by
> registering SH_ALLOCATE function.

Well, SH_TYPE's members SH_ELEMENT_TYPE *data and void *private_data
are not going to work in DSM, because they are pointers.  You can
doubtless come up with a way around that problem, but I guess the
question is whether that's actually any better than just using DHT.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-05-31 Thread Robert Haas
On Tue, May 30, 2017 at 8:15 AM, Mithun Cy  wrote:
> On Tue, May 30, 2017 at 10:16 AM, Mithun Cy  
> wrote:
>> Thanks Robert,
>
> Sorry, there was one more mistake (a typo) in dump_now(): instead of
> using pfree I used free. Corrected in the new patch v10.

+ *        contrib/autoprewarm.c

Wrong.

+    Oid         database;       /* database */
+    Oid         spcnode;        /* tablespace */
+    Oid         filenode;       /* relation's filenode. */
+    ForkNumber  forknum;        /* fork number */
+    BlockNumber blocknum;       /* block number */

Why spcnode rather than tablespace?  Also, the third line has a
period, but not the others.  I think you could just drop these
comments; they basically just duplicate the structure names, except
for spcnode which doesn't but probably should.

+    bool        can_do_prewarm; /* if set can't do prewarm task. */

The comment and the name of the variable imply opposite meanings.

+/*
+ * detach_blkinfos - on_shm_exit detach the dsm allocated for blockinfos.
+ */
+static void
+detach_blkinfos(int code, Datum arg)
+{
+    if (seg != NULL)
+        dsm_detach(seg);
+}

I assume you don't really need this.  Presumably process exit is going
to detach the DSM anyway.

+    if (seg == NULL)
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("unable to map dynamic shared memory segment")));

Let's use the wording from the message in parallel.c rather than this
wording.  Actually, we should probably (as a separate patch) change
test_shm_mq's worker.c to use the parallel.c wording also.

+    SetCurrentStatementStartTimestamp();

Why do we need this?

+    StartTransactionCommand();

Do we need a transaction?  If so, how about starting a new transaction
for each relation instead of using a single one for the entire
prewarm?

+    if (status == BGWH_STOPPED)
+        return;
+
+    if (status == BGWH_POSTMASTER_DIED)
+    {
+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+                 errmsg("cannot start bgworker autoprewarm without postmaster"),
+                 errhint("Kill all remaining database processes and restart"
+                         " the database.")));
+    }
+
+    Assert(0);

Instead of writing it this way, remove the first "if" block and change
the second one to Assert(status == BGWH_STOPPED).  Also, I'd ditch the
curly braces in this case.

+    file = fopen(AUTOPREWARM_FILE, PG_BINARY_R);

Use AllocateFile() instead.  See the comments for that function and at
the top of fd.c.  Then you don't need to worry about leaking the file
handle on an error, so you can remove the fclose() before ereport()
stuff -- which is incomplete anyway, because you could fail e.g.
inside dsm_create().
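
Sketched, with the error wording assumed, that pattern would look
something like:

    FILE   *file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);

    if (file == NULL)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m",
                        AUTOPREWARM_FILE)));
    /* ... read the dump; an ERROR in here no longer leaks the handle ... */
    FreeFile(file);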

+ errmsg("Error reading num of elements in \"%s\" for"
+" autoprewarm : %m", AUTOPREWARM_FILE)));

As I said in a previous review, please do NOT split error messages
across lines like this.  Also, this error message is nothing close to
PostgreSQL style. Please read
https://www.postgresql.org/docs/devel/static/error-style-guide.html
and learn to follow all those guidelines written therein.  I see at
least 3 separate problems with this message.

+    num_elements = i;

I'd do something like if (i != num_elements) elog(ERROR, "autoprewarm
block dump has %d entries but expected %d", i, num_elements);  It
seems OK for this to be elog() rather than ereport() because the file
should never be corrupt unless the user has cheated by hand-editing
it.

I think you can get rid of next_db_pos altogether, and this
prewarm_elem thing too.  Design sketch:

1. Move all the stuff that's in prewarm_elem into
AutoPrewarmSharedState.  Rename start_pos to prewarm_start_idx and
end_of_blockinfos to prewarm_stop_idx.

2. Instead of computing all of the database ranges first and then
launching workers, do it all in one loop.  So figure out where the
records for the current database end and set prewarm_start_idx and
prewarm_stop_idx appropriately.  Launch a worker.  When the worker
terminates, set prewarm_start_idx = prewarm_stop_idx and advance
prewarm_stop_idx to the end of the next database's records.
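
Roughly sketched (everything here other than prewarm_start_idx and
prewarm_stop_idx -- the array, state, and helper names -- is an
assumption):

    int     i = 0;

    while (i < num_elements)
    {
        Oid     curdb = block_info[i].database;

        /* Delimit the records belonging to this database. */
        state->prewarm_start_idx = i;
        while (i < num_elements && block_info[i].database == curdb)
            i++;
        state->prewarm_stop_idx = i;

        /* Launch one worker for this database and wait for it to exit. */
        launch_prewarm_worker_and_wait(curdb);
    }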

This saves having to allocate memory for the next_db_pos array, and it
also avoids this crock:

+    memcpy(, MyBgworkerEntry->bgw_extra, sizeof(prewarm_elem));

The reason that's bad is because it only works so long as bgw_extra is
large enough to hold prewarm_elem.  If prewarm_elem grows or bgw_extra
shrinks, this turns into a buffer overrun.

I would use AUTOPREWARM_FILE ".tmp" rather than a name incorporating
the PID for the temporary file.  Otherwise, you might leave many
temporary files behind under different names.  If you use the same
name every time, you'll never have more than one, and the next
successful dump will end up getting rid of it along the way.


Re: [HACKERS] TAP backpatching policy

2017-05-31 Thread Joshua D. Drake

On 05/30/2017 09:52 PM, Stephen Frost wrote:
> Tom,
>
>> Um ... but we still have 2 live pre-9.4 branches.  If your proposal
>> doesn't extend to back-porting all of this stuff as far as 9.2,
>> I don't see what this is really buying.  We'd still need version cutoff
>> checks in the tests.
>
> I don't believe the explicit goal of this is to remove the version
> checks but rather to provide improved testing coverage in our
> back-branches.  If we have to keep a version cutoff check for that, so
> be it.

+1

>> (If you *do* propose back-patching all this stuff as far as 9.2, I'm not
>> quite sure what I'd think about that.  But the proposal as stated seems
>> like questionable half measures.)
>
> I find that to be an extremely interesting idea, for my own 2c, but I'm
> not sure how practical it is.

It is perfectly reasonable to say, "We have added comprehensive testing
through the TAP project. Unfortunately, it is only reasonable (due to
code changes) to back port them from 9.4 and above."


*IF* we were to try to go back farther than 9.4, I would say that 9.3 is
the only one that would be worth it anyway. 9.2 was a great release, but
its day is over, or at least it is a spectacle of a setting sun:


https://www.postgresql.org/support/versioning/

Thanks,

JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Perfomance bug in v10

2017-05-31 Thread Teodor Sigaev

Hi!

I found an example where v10 chooses extremely non-optimal plan:
select
i::int as a,
i::int + 1 as b,
0 as c
into t
from
generate_series(1,32) as i;

create unique index i on t (c, a);

explain analyze
SELECT
t1.a, t1.b,
t2.a, t2.b,
t3.a, t3.b,
t4.a, t4.b,
t5.a, t5.b,
t6.a, t6.b
/*
,
t7.a, t7.b,
t8.a, t8.b,
t9.a, t9.b,
t10.a, t10.b
*/
FROM t T1
LEFT OUTER JOIN t T2
ON T1.b = T2.a AND T2.c = 0
LEFT OUTER JOIN t T3
ON T2.b = T3.a AND T3.c = 0
LEFT OUTER JOIN t T4
ON T3.b = T4.a AND T4.c = 0
LEFT OUTER JOIN t T5
ON T4.b = T5.a AND T5.c = 0
LEFT OUTER JOIN t T6
ON T5.b = T6.a AND T6.c = 0
LEFT OUTER JOIN t T7
ON T6.b = T7.a AND T7.c = 0
LEFT OUTER JOIN t T8
ON T7.b = T8.a AND T8.c = 0
LEFT OUTER JOIN t T9
ON T8.b = T9.a AND T9.c = 0
LEFT OUTER JOIN t T10
ON T9.b = T10.a AND T10.c = 0
WHERE T1.c = 0 AND T1.a = 5
;

It takes 4 seconds on my laptop; uncommenting the commented lines makes
it run forever. Analyzing the table or removing the index reduces
execution time to milliseconds, regardless of whether the lines are
commented or uncommented.


The commit
commit 9c7f5229ad68d7e0e4dd149e3f80257893e404d4
Author: Tom Lane 
Date:   Fri Apr 7 22:20:03 2017 -0400

Optimize joins when the inner relation can be proven unique.

seems to be the root of this problem - before it, the query takes
milliseconds. Attached is the output of EXPLAIN ANALYZE with the lines
commented out; my attention was drawn to a huge number of loops:


 ->  Materialize  (cost=0.00..1.40 rows=1 width=8) (actual time=0.000..0.001 rows=17 loops=1048576)




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
Timing is on.
DROP TABLE
Time: 5,268 ms
SELECT 32
Time: 5,515 ms
CREATE INDEX
Time: 14,971 ms
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..8.55 rows=1 width=48) (actual time=818.219..4159.317 rows=1 loops=1)
   Join Filter: (t1.b = t2.a)
   Rows Removed by Join Filter: 31
   ->  Seq Scan on t t1  (cost=0.00..1.48 rows=1 width=8) (actual time=0.026..0.032 rows=1 loops=1)
         Filter: ((c = 0) AND (a = 5))
         Rows Removed by Filter: 31
   ->  Nested Loop Left Join  (cost=0.00..7.06 rows=1 width=40) (actual time=10.588..4159.270 rows=32 loops=1)
         Join Filter: (t2.b = t3.a)
         Rows Removed by Join Filter: 993
         ->  Seq Scan on t t2  (cost=0.00..1.40 rows=1 width=8) (actual time=0.008..0.020 rows=32 loops=1)
               Filter: (c = 0)
         ->  Nested Loop Left Join  (cost=0.00..5.65 rows=1 width=32) (actual time=0.142..129.970 rows=32 loops=32)
               Join Filter: (t3.b = t4.a)
               Rows Removed by Join Filter: 993
               ->  Seq Scan on t t3  (cost=0.00..1.40 rows=1 width=8) (actual time=0.002..0.010 rows=32 loops=32)
                     Filter: (c = 0)
               ->  Nested Loop Left Join  (cost=0.00..4.23 rows=1 width=24) (actual time=0.007..4.055 rows=32 loops=1024)
                     Join Filter: (t4.b = t5.a)
                     Rows Removed by Join Filter: 993
                     ->  Seq Scan on t t4  (cost=0.00..1.40 rows=1 width=8) (actual time=0.002..0.010 rows=32 loops=1024)
                           Filter: (c = 0)
                     ->  Nested Loop Left Join  (cost=0.00..2.82 rows=1 width=16) (actual time=0.003..0.121 rows=32 loops=32768)
                           Join Filter: (t5.b = t6.a)
                           Rows Removed by Join Filter: 528
                           ->  Seq Scan on t t5  (cost=0.00..1.40 rows=1 width=8) (actual time=0.002..0.009 rows=32 loops=32768)
                                 Filter: (c = 0)
                           ->  Materialize  (cost=0.00..1.40 rows=1 width=8) (actual time=0.000..0.001 rows=17 loops=1048576)
                                 ->  Seq Scan on t t6  (cost=0.00..1.40 rows=1 width=8) (actual time=0.008..0.031 rows=32 loops=1)
                                       Filter: (c = 0)
 Planning time: 3.316 ms
 Execution time: 4159.596 ms
(31 rows)

Time: 4165,372 ms (00:04,165)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists

2017-05-31 Thread Magnus Hagander
On Wed, May 31, 2017 at 12:20 AM, Ludovic Vaugeois-Pepin <
ludovi...@gmail.com> wrote:

> On Tue, May 30, 2017 at 9:32 PM, Magnus Hagander 
> wrote:
> > On Tue, May 30, 2017 at 9:14 PM, Ludovic Vaugeois-Pepin
> >  wrote:
> >>
> >> I ran into the issue described below with 10.0 beta. The error I got is:
> >>
> >> pg_basebackup: could not create temporary replication slot
> >> "pg_basebackup_2194": ERROR:  replication slot "pg_basebackup_2194"
> >> already exists
> >>
> >> A race condition? Or maybe I am doing something wrong.
> >>
> >>
> >>
> >>
> >>
> >> Release:
> >> Name: postgresql10-server
> >> Version : 10.0
> >> Release : beta1PGDG.rhel7
> >>
> >>
> >> Test Type:
> >> Functional testing of a pacemaker resource agent
> >> (https://github.com/ulodciv/pgha)
> >>
> >>
> >> Test Detail:
> >> During context/environement setup, pg_basebackup is invoked (in
> >> parallel) from multiple virtual machines. The backups are then started
> >> as asynchronously replicated hot standbies.
> >>
> >>
> >> Platform:
> >> Centos 7.3
> >>
> >>
> >> Installation Method:
> >> yum -y install
> >>
> >> https://download.postgresql.org/pub/repos/yum/testing/10/
> redhat/rhel-7-x86_64/pgdg-redhat10-10-1.noarch.rpm
> >> yum -y install postgresql10-server postgresql10-contrib
> >>
> >>
> >> Platform Detail:
> >>
> >>
> >> Test Procedure:
> >>
> >> Have pg_basebackup run simultaneously on multiple hosts against
> >> the same instance eg:
> >>
> >> pg_basebackup -h test4 -p 5432 -D /var/lib/pgsql/10/data -U
> repl1
> >> -Xs
> >>
> >>
> >> Failure?
> >>
> >> E   deploylib.deployer_error.DeployerError:
> >> postgres@test5: got exit status 1 for:
> >> E   pg_basebackup -h test4 -p 5432 -D
> >> /var/lib/pgsql/10/data -U repl1 -Xs
> >> E   stderr: pg_basebackup: could not create temporary
> >> replication slot "pg_basebackup_2194": ERROR:  replication slot
> >> "pg_basebackup_2194" already exists
> >> E   pg_basebackup: child process exited with error 1
> >> E   pg_basebackup: removing data directory
> >> "/var/lib/pgsql/10/data"
> >>
> >>
> >> Test Results:
> >>
> >>
> >> Comments:
> >> This seems to be new with 10. I recently began testing the
> >> pacemaker resource agent against PG 10. I never had (or noticed) this
> >> failure with 9.6.1 and 9.6.2.
> >
> >
> > Hah, that's an interesting failure. In the name of the slot, the 2194
> comes
> > from the pid -- but it's the pid of pg_basebackup.
> >
> > I assume you're not running the two pg_basebackup processes on the same
> > machine? Is it predictable when this happens (meaning that the pid value
> is
> > actually predictable), or do you have to run it a large numbe rof times
> > before it happens?
>
>
> Indeed, I run it from two VMs that were created from the same .ova
> (packaged VM).
> I ran into this once, however I have been running tests on 10.0 for a
> couple of days or so.
>
> My guess is that the two hosts ended up using the same pid when
> running the backup.
>


Moving this one over to -hackers to discuss the fix, as this is clearly an
issue.

Right now, pg_basebackup will use the pid of the *client* process to
generate its ephemeral slot name. Per this report, that can
definitely be a problem.

One of my first thoughts would be to instead use the pid of the *server* to
do that, as this will be guaranteed to be unique. However, the client can't
access the pid of the server as it is now, and it's the client that has to
create the name.

One way to do that would be to include the pid of the walsender backend in
the reply to IDENTIFY_SYSTEM, and then use that. What do people think of
that idea?

Other suggestions?

I will add this to the 10.0 open item lists.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-05-31 Thread Alexander Korotkov
On Wed, May 31, 2017 at 6:53 PM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > I've discovered that PostgreSQL is able to run following kind of queries
> in
> > order to change statistic-gathering target for an indexed expression.
>
> > ALTER INDEX index_name ALTER COLUMN expr SET STATISTICS stat_target;
>
> > It's been previously discussed in [1].
>
> > I think this should be fixed not just in docs.  This is why I've started
> > thread in pgsql-hackers.  For me usage of internal column names "expr",
> > "expr1", "expr2" etc. looks weird.  And I think we should replace it
> with a
> > better syntax.  What do you think about these options?
>
> > ALTER INDEX index_name ALTER EXPRESSION 0 SET STATISTICS stat_target; --
> > Refer expression by its number
> > ALTER INDEX index_name ALTER EXPRESSION (x + y) SET STATISTICS
> stat_target;
> > -- Refer expression by its definition
>
> Don't like either of those particularly, but what about just targeting
> a column by column number, independently of whether it's an expression?
>
> ALTER INDEX myindex ALTER COLUMN 4 SET STATISTICS 1000;
>

I completely agree with that.
If no objections, I will write a patch for that.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] TAP backpatching policy

2017-05-31 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, May 30, 2017 at 9:12 PM, Craig Ringer  wrote:
> > Thoughts? Backpatch new TAP methods, etc, into back branches routinely?
> 
> So, on the one hand, it is certainly useful to be able to commit tests
> to back-branches as well as to master, and it's hard to do that if the
> infrastructure isn't there.

Agreed.

> On the other hand, we tell our users that we only back-patch security
> and stability fixes.

I think being strict about what we backpatch for the production code is
a valued principle.  I am not sure that not backpatching new
test-harness features is valued in the same way.  One possible problem
is that the new tests cause test failures, and thus failure to create
packages for some platforms.  That would be a net loss.  But it won't
corrupt your data and it won't make your applications incompatible.

> It is useful to be able to show a customer a list of things that were
> done in a minor release and see nothing in there that can be described
> as optional tinkering.

In my experience, some customers are going to take our word that we've
done a good job keeping the branch clean of unnecessary changes, and
others are going to cherry-pick individual fixes regardless of what's in
there.  The percentages in each group might or might not change slightly
if they see some new Perl test code, but I doubt it'd change
dramatically.

My main concern is how widely is the buildfarm going to test the new
test frameworks.  If we backpatch PostgresNode-based tests to 9.2, are
buildfarm animals going to need to be reconfigured to use
--enable-tap-tests?  (9.2 and 9.3 do not have --enable-tap-tests.)  If the
old branches need to be reconfigured, then the tests are not going to
run on a majority of animals, and that is going to cause pain on obscure
platforms that, three months from now, we discover didn't get any
buildfarm coverage.  But if they start picking up the new tests without need for
human intervention, then the coverage should be decent enough that it
shouldn't be a problem.  So my vote is that if a majority of buildfarm
animals pick up PostgresNode tests without reconfiguration, +1 for
backpatching; otherwise -1.

Being unable to backpatch tests for bugfixes is a net loss, IMO.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] relocation truncated to fit: citus build failure on s390x

2017-05-31 Thread Alvaro Herrera
Christoph Berg wrote:
> Re: Tom Lane 2017-05-31 <28752.1496238...@sss.pgh.pa.us>

> > Next question: should we back-patch this change, or just do it in HEAD?
> 
> Debian "needs" it for 9.6, but I've already pushed the s390x patch in
> the original posting, so I could just live with it being just in head.
> But of course it would be nice to have everything in sync.

I think it's only a problem for you in 9.6 because you've not tried
pglogical or some other large-shlib extension with earlier branches; in
other words, eventually this is going to bite somebody using the old
branches as well, unless we believe that the platforms are dead enough
that nobody really cares other than for academic purposes.

My vote would be to backpatch it all the way.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-05-31 Thread Tom Lane
Alexander Korotkov  writes:
> I've discovered that PostgreSQL is able to run following kind of queries in
> order to change statistic-gathering target for an indexed expression.

> ALTER INDEX index_name ALTER COLUMN expr SET STATISTICS stat_target;

> It's been previously discussed in [1].

> I think this should be fixed not just in docs.  This is why I've started
> thread in pgsql-hackers.  For me usage of internal column names "expr",
> "expr1", "expr2" etc. looks weird.  And I think we should replace it with a
> better syntax.  What do you think about these options?

> ALTER INDEX index_name ALTER EXPRESSION 0 SET STATISTICS stat_target; --
> Refer expression by its number
> ALTER INDEX index_name ALTER EXPRESSION (x + y) SET STATISTICS stat_target;
> -- Refer expression by its definition

Don't like either of those particularly, but what about just targeting
a column by column number, independently of whether it's an expression?

ALTER INDEX myindex ALTER COLUMN 4 SET STATISTICS 1000;

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-05-31 Thread Alexander Korotkov
Hackers,

I've discovered that PostgreSQL is able to run following kind of queries in
order to change statistic-gathering target for an indexed expression.

ALTER INDEX index_name ALTER COLUMN expr SET STATISTICS stat_target;

It's been previously discussed in [1].

I think this should be fixed not just in docs.  This is why I've started
thread in pgsql-hackers.  For me usage of internal column names "expr",
"expr1", "expr2" etc. looks weird.  And I think we should replace it with a
better syntax.  What do you think about these options?

ALTER INDEX index_name ALTER EXPRESSION 0 SET STATISTICS stat_target; --
Refer expression by its number
ALTER INDEX index_name ALTER EXPRESSION (x + y) SET STATISTICS stat_target;
-- Refer expression by its definition

1.
https://www.postgresql.org/message-id/flat/20150716143149.GO2301%40postgresql.org

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] quiet conversion warning in DatumGetFloat4

2017-05-31 Thread Tom Lane
Chapman Flack  writes:
> On 05/31/17 01:26, Tom Lane wrote:
>> Hm.  I think it would be better to use DatumGetInt32 here.  Arguably,
>> direct use of GET_4_BYTES and its siblings should only appear in
>> DatumGetFoo macros.

> Like so? These are the 4 sites where {GET,SET}_n_BYTES got introduced
> in 14cca1b (for consistency, though only the GET_4 case produces warnings).

After experimenting with -Wconversion, I see why we don't use it in
server builds --- it produces an astonishing number of mostly-useless
warnings, which apparently can only be silenced by introducing explicit
casts.  I do not think that cluttering our code with lots more explicit
casts would be a win for either readability or safety.

However, I grant your point that some extensions may have outside
constraints that mandate using -Wconversion, so to the extent that
we can keep key headers like postgres.h from triggering those warnings,
it's probably worth doing.  I suspect you're still seeing a lot of them
though --- experiments with some contrib modules suggest that a lot of
our other headers also contain code that would trigger them.  I do not
think I'd be on board with trying to silence them generally.

However, the present patch seems harmless enough, and arguably a tad
cleaner than what we had, so pushed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of non-restart-safe storage by temp_tablespaces

2017-05-31 Thread Mark Dilger

> On May 31, 2017, at 7:58 AM, Bruce Momjian  wrote:
> 
> On Wed, May 31, 2017 at 07:53:22AM -0700, Mark Dilger wrote:
>>> Just to clarify, the TEMPORARY clause would allow the tablespace to
>>> start up empty, while normal tablespaces can't do that, right?  One big
>>> problem is that we don't have any way to prevent non-temporary
>>> tablespaces from being created on transient storage.  I wonder if we
>>> should document this restriction, but it seems awkward to do.
>> 
>> It depends what you mean by allowing the tablespace to start up empty.
>> It must not be empty once users can run queries, since the catalogs will 
>> still
>> have entries for the tablespace and its dependent objects.  So, what must
>> happen is that during startup the tablespace and its temporary tables and
>> indexes get recreated if they are missing.
> 
> Uh, I thought only the sessions that created the temporary objects could
> see them, and since they are not in WAL and autovacuum can't see them,
> their non-existence in a temporary tablespace would not be a problem.

You are correct.  I was thinking about an extension to allow unlogged
tablespaces on temporary filesystems, but got the words "unlogged" and
"temporary" mixed up in my thinking and in what I wrote.  I should have
written that unlogged tablespaces would only host unlogged tables and
unlogged indexes, such that users are not surprised to find their data
missing.

On reflection, I think both features are worthwhile, and not at all exclusive
of each other, though unlogged tablespaces is probably considerably more
work to implement.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Segmentation fault when creating a BRIN, 10beta1

2017-05-31 Thread Alvaro Herrera
Andres Freund wrote:
> On 2017-05-30 18:21:10 -0400, Alvaro Herrera wrote:
> > Alexander Sosna wrote:
> > > Hi,
> > > 
> > > I can reproduce a segmentation fault when creating a BRIN concurrently
> > > with set pages_per_range and autosummarize.
> > 
> > Pushed fix just now.  Please give it a try.  Thanks for testing and
> > reporting,
> 
> Shouldn't this have been uncovered by a regression test? In other words,
> do I understand correctly that the new summarization stuff is largely
> not tested by regression tests?

You understand correctly and I had the same concern myself upon finding
about the bug.  I cannot promise to create one right now but I can do so
after June 12th.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-05-31 Thread Dilip Kumar
On Wed, May 31, 2017 at 10:57 AM, Robert Haas  wrote:
>> Simplehash provides an option to provide your own allocator function
>> to it. So in the allocator function, you can allocate memory from DSA.
>> After it reaches some threshold it expands the size (double) and it
>> will again call the allocator function to allocate the bigger memory.
>> You can refer pagetable_allocate in tidbitmap.c.
>
> That only allows the pagetable to be shared, not the hash table itself.

I agree with you. But, if I understand the use case correctly we need
to store the TupleDesc for the RECORD in shared hash so that it can be
shared across multiple processes.  I think this can be achieved with
the simplehash as well.

For getting this done, we need some fixed shared memory for holding
static members of SH_TYPE and the process which creates the simplehash
will be responsible for copying these static members to the shared
location so that other processes can access the SH_TYPE.  And, the
dynamic part (the actual hash entries) can be allocated using DSA by
registering SH_ALLOCATE function.
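
For illustration, a DSA-backed SH_ALLOCATE callback could look roughly
like pagetable_allocate() in tidbitmap.c -- the type and field names
below (SharedRecordTableData, area, entries_dp) are assumptions, not
actual code:

    static void *
    shared_record_table_allocate(shared_record_table_hash *hashtable,
                                 Size size)
    {
        SharedRecordTableData *t =
            (SharedRecordTableData *) hashtable->private_data;

        /* Keep the dsa_pointer around so that other backends can re-map
         * the entry array with dsa_get_address(). */
        t->entries_dp = dsa_allocate_extended(t->area, size, DSA_ALLOC_ZERO);
        return dsa_get_address(t->area, t->entries_dp);
    }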

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] relocation truncated to fit: citus build failure on s390x

2017-05-31 Thread Christoph Berg
Re: Tom Lane 2017-05-31 <28752.1496238...@sss.pgh.pa.us>
> OK, this looks good to me.  Just to make sure everyone's on the
> same page, what I propose to do is simplify all our platform-specific
> Makefiles that use either -fpic or -fPIC to use -fPIC unconditionally.
> This affects the netbsd, linux, and openbsd ports.  Looks like we should
> also change the example link commands in dfunc.sgml.

Ack.

> Next question: should we back-patch this change, or just do it in HEAD?

Debian "needs" it for 9.6, but I've already pushed the s390x patch in
the original posting, so I could just live with it being just in head.
But of course it would be nice to have everything in sync.

Christoph


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of non-restart-safe storage by temp_tablespaces

2017-05-31 Thread Bruce Momjian
On Wed, May 31, 2017 at 07:53:22AM -0700, Mark Dilger wrote:
> > Just to clarify, the TEMPORARY clause would allow the tablespace to
> > start up empty, while normal tablespaces can't do that, right?  One big
> > problem is that we don't have any way to prevent non-temporary
> > tablespaces from being created on transient storage.  I wonder if we
> > should document this restriction, but it seems awkward to do.
> 
> It depends what you mean by allowing the tablespace to start up empty.
> It must not be empty once users can run queries, since the catalogs will still
> have entries for the tablespace and its dependent objects.  So, what must
> happen is that during startup the tablespace and its temporary tables and
> indexes get recreated if they are missing.

Uh, I thought only the sessions that created the temporary objects could
see them, and since they are not in WAL and autovacuum can't see them,
their non-existence in a temporary tablespace would not be a problem.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-05-31 Thread Robert Haas
On Tue, May 30, 2017 at 2:45 AM, Dilip Kumar  wrote:
> On Tue, May 30, 2017 at 1:09 AM, Thomas Munro
>  wrote:
>>> * Perhaps simplehash + an LWLock would be better than dht, but I
>>> haven't looked into that.  Can it be convinced to work in DSA memory
>>> and to grow on demand?
>
> Simplehash provides an option to provide your own allocator function
> to it. So in the allocator function, you can allocate memory from DSA.
> After it reaches some threshold it expands the size (double) and it
> will again call the allocator function to allocate the bigger memory.
> You can refer pagetable_allocate in tidbitmap.c.

That only allows the pagetable to be shared, not the hash table itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of non-restart-safe storage by temp_tablespaces

2017-05-31 Thread Mark Dilger

> On May 31, 2017, at 6:04 AM, Bruce Momjian  wrote:
> 
> On Wed, May 31, 2017 at 07:53:53AM -0400, Robert Haas wrote:
>> On Tue, May 30, 2017 at 6:50 PM, Mark Dilger  wrote:
 On May 29, 2017, at 11:53 AM, Bruce Momjian  wrote:
 Right now we don't document that temp_tablespaces can use
 non-restart-safe storage, e.g. /tmp, ramdisks.  Would this be safe?
 Should we document this?
>>> 
>>> The only safe way to do temporary tablespaces that I have found is to extend
>>> the grammar to allow CREATE TEMPORARY TABLESPACE, and then refuse
>>> to allow the creation of any non-temporary table (or index on same) in that
>>> tablespace.  Otherwise, it is too easy to be surprised to discover that your
>>> table contents have gone missing.
>> 
>> I think this would be a sensible approach.
> 
> Just to clarify, the TEMPORARY clause would allow the tablespace to
> start up empty, while normal tablespaces can't do that, right?  One big
> problem is that we don't have any way to prevent non-temporary
> tablespaces from being created on transient storage.  I wonder if we
> should document this restriction, but it seems awkward to do.

It depends what you mean by allowing the tablespace to start up empty.
It must not be empty once users can run queries, since the catalogs will still
have entries for the tablespace and its dependent objects.  So, what must
happen is that during startup the tablespace and its temporary tables and
indexes get recreated if they are missing.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-05-31 Thread Robert Haas
On Wed, May 31, 2017 at 2:35 AM, Ashutosh Bapat
 wrote:
> AFAIK, work_mem comes from memory private to the process whereas this
> memory will come from the shared memory pool.

I don't think that really matters.  The point of limits like work_mem
is to avoid driving the machine into swap.  Allocating shared memory
might error out rather than causing swapping in some cases on some
systems, but any difference between private and shared memory is not
the real issue here.  The issue is overall memory consumption.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP backpatching policy

2017-05-31 Thread Robert Haas
On Tue, May 30, 2017 at 9:12 PM, Craig Ringer  wrote:
> Thoughts? Backpatch new TAP methods, etc, into back branches routinely?

So, on the one hand, it is certainly useful to be able to commit tests
to back-branches as well as to master, and it's hard to do that if the
infrastructure isn't there.

On the other hand, we tell our users that we only back-patch security
and stability fixes.  When customers start to doubt that, then they
become reluctant to apply new minor versions in their entirety and ask
for individual commits to be cherry-picked, or just don't upgrade at
all.  One could argue that commits to the testing framework shouldn't
make customers nervous, but what people should be worried about and
what they are actually worried about do not always match.  It is
useful to be able to show a customer a list of things that were done
in a minor release and see nothing in there that can be described as
optional tinkering.

The TAP tests seem to be evolving incredibly quickly, with new
infrastructure being built out all the time.  That strengths both the
argument for back-patching (to keep things in sync) and for not
back-patching (to avoid churning a supposedly-stable branch).  I'm not
sure exactly what I think about this issue, but I'm very skeptical of
the idea that back-patching less conservatively will have no
downsides.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-31 Thread Erik Rijkers

On 2017-05-31 11:16, Petr Jelinek wrote:
> [...]
>
> Thanks to Mark's offer I was able to study the issue as it happened and
> found the cause of this.
>
> [0001-Improve-handover-logic-between-sync-and-apply-worker.patch]


This looks good:

-- out_20170531_1141.txt
100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n   --  scale 25
100 -- All is well.

So this is a 1-minute test run 100 times, with 100 successes. (This is
on the most fastidious machine (slow disks, meagre specs), the one that
used to give 15% failures.)


I'll let it run for a couple of days with varying params (and on varying 
hardware) but it definitely does look as if you fixed it.


Thanks!

Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-31 Thread Kevin Grittner
On Wed, May 31, 2017 at 1:44 AM, Noah Misch  wrote:

> IMMEDIATE ATTENTION REQUIRED.

I should be able to complete review and testing by Friday.  If there
are problems I might not take action until Monday; otherwise I
should be able to do so on Friday.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error while creating subscription when server is running in single user mode

2017-05-31 Thread Dilip Kumar
On Wed, May 31, 2017 at 7:54 AM, tushar  wrote:
> [centos@centos-cpula bin]$ ./postgres --single postgres -D m1data
> PostgreSQL stand-alone backend 10beta1
> backend> create subscription sub connection 'dbname=postgres port=5433
> user=centos' publication p with (create_slot=0,enabled=off);
> 2017-05-31 12:53:09.318 BST [10469] LOG:  statement: create subscription sub
> connection 'dbname=postgres port=5433 user=centos' publication p with
> (create_slot=0,enabled=off);
>
> 2017-05-31 12:53:09.326 BST [10469] ERROR:  epoll_ctl() failed: Bad file
> descriptor

IMHO, single-user mode cannot support replication (it cannot have a
background WALReceiver task). However, I believe the statement above
should produce a proper error message.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] relocation truncated to fit: citus build failure on s390x

2017-05-31 Thread Tom Lane
Christoph Berg  writes:
> Re: Tom Lane 2017-05-30 <1564.1496176...@sss.pgh.pa.us>
>> It'd be interesting if people could gather similar numbers on other
>> platforms of more real-world relevance, such as ppc64.  But based on
>> this small sample, I wouldn't object to just going to -fPIC across
>> the board.

> [ more numbers ]

OK, this looks good to me.  Just to make sure everyone's on the
same page, what I propose to do is simplify all our platform-specific
Makefiles that use either -fpic or -fPIC to use -fPIC unconditionally.
This affects the netbsd, linux, and openbsd ports.  Looks like we should
also change the example link commands in dfunc.sgml.

Next question: should we back-patch this change, or just do it in HEAD?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create subscription with `create_slot=false` and incorrect slot name

2017-05-31 Thread Robert Haas
On Tue, May 30, 2017 at 3:01 PM, Peter Eisentraut
 wrote:
> On 5/25/17 17:26, Peter Eisentraut wrote:
>> Another way to fix this particular issue is to not verify the
>> replication slot name before doing the drop.  After all, if the name is
>> not valid, then you can also just report that it doesn't exist.
>
> Here is a possible patch along these lines.

I don't see how this solves the problem.  Don't you still end up with
an error message telling you that you can't drop the subscription, and
no guidance as to how to fix it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >