Re: [GENERAL] Logical decoding client has the power to crash the server

2017-09-20 Thread Michael Paquier
On Thu, Sep 21, 2017 at 1:09 PM, Meel Velliste  wrote:
> In this situation, neither us, nor our customer has the power to install the
> required monitoring of pg_xlog. The database hosting provider would have to
> do it. In most cases (e.g. Amazon RDS) the hosting provider does provide a
> way of monitoring overall disk usage, which may be good enough. But I am
> thinking it would make sense for postgres to have default, built-in
> monitoring that drops all the slots when pg_xlog gets too full (based on
> some configurable limit). Otherwise everybody has to build their own
> monitoring and I imagine 99% of them would want the same behavior. Nobody
> wants their database to fail just because some client was not reading the
> slot.

(Please avoid top-posting; it breaks the logic of the thread and
is contrary to the practices of the Postgres mailing lists.)

Note that an on-disk lookup is not strictly necessary. If you know
max_wal_size, pg_current_wal_lsn (or pg_last_wal_receive_lsn if
working on a standby) and the restart_lsn of the slots, that's enough.
If you don't have sufficient privileges to see that, well, I guess
you will need to review the access permissions to your instance.
Postgres 9.6 offers better access control to system functions, so you
could be granted access to just those resources and do the monitoring
from a plain SQL session.
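
For illustration, a rough sketch of the kind of check being described
(standard view and functions; these are the PostgreSQL 10 names, on 9.6
use pg_current_xlog_location() and pg_xlog_location_diff() instead):

-- How much WAL is each slot holding back? Compare this against
-- max_wal_size; a stuck slot can then be dropped with
-- pg_drop_replication_slot(slot_name).
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
ORDER BY retained_bytes DESC;
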
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Logical decoding client has the power to crash the server

2017-09-20 Thread Meel Velliste
Hi Michael,

Thank you, I appreciate your response. Now that you mention it, I realize
that I don't really care about dropping the oldest log entries. Mandatory
monitoring makes a lot of sense, and dropping the entire slot when it
consumes too much space would be perfect.

The only problem with monitoring is that I may have no control over it. My
use case is complicated by the fact that there are three parties:
1) Our customer who has admin privileges on the database
2) Us with limited privileges
3) The database hosting provider who restricts access to the underlying OS
and file system

In this situation, neither us, nor our customer has the power to install
the required monitoring of pg_xlog. The database hosting provider would
have to do it. In most cases (e.g. Amazon RDS) the hosting provider does
provide a way of monitoring overall disk usage, which may be good enough.
But I am thinking it would make sense for postgres to have default,
built-in monitoring that drops all the slots when pg_xlog gets too full
(based on some configurable limit). Otherwise everybody has to build their
own monitoring and I imagine 99% of them would want the same behavior.
Nobody wants their database to fail just because some client was not
reading the slot.

In our case, if we lose access to the customer's database and they did not
install monitoring (even though we told them to), their disk will fill up
and they will blame us for crashing their database. It ends up being a
classic case of finger-pointing between multiple parties. This has not
happened yet, but I am sure it is just a matter of time. I would really like
to see a default, built-in circuit breaker in postgres to prevent this.

Another bit of context here is that the logical decoding is of secondary
importance to our customers, but their postgres database itself is
absolutely mission critical.

Thanks,

Meel

On Wed, Sep 20, 2017 at 12:43 AM Michael Paquier 
wrote:

> On Wed, Sep 20, 2017 at 3:14 PM, Meel Velliste  wrote:
> > From what I understand about logical decoding, there is no limit to how
> many
> > log entries will be retained by the server if nobody reads them from the
> > logical slot. This means that a client that fails to read from the slot
> has
> > the power to bring down the master database because the server's disk
> will
> > get full at which point all subsequent write operations will fail and
> even
> > read operations will fail because they too need temporary space. Even the
> > underlying operating system may be affected as it too may need temporary
> > disk space to carry out its basic functions.
>
> Monitoring is a mandatory part of the handling of replication slots.
> One possible solution is to use a background worker that scans slots
> causing bloat in pg_xlog and to automatically get rid of them so as
> the primary is preserved from any crash. Note that advancing a slot is
> doable for a physical slot, but advancing a logical slot is trickier
> (not sure if that's doable actually but Andres can comment on that)
> because it involves being sure that the catalog_xmin is still
> preserved so as past logical changes can be looked at consistently.
> --
> Michael
>


Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread Michael Paquier
On Wed, Sep 20, 2017 at 5:45 PM, Albe Laurenz  wrote:
> Thomas Güttler wrote:
>> We run a PostgreSQL 9.6 server in a virtual machine.
>>
>> The virtual machine is managed by the customer.
>>
>> He does backup the VM.
>>
>> Is this enough, is this safe?
>
> I don't know about VMware,

I heard about one or two things :)

> but the general rule is that
> if the backup is truly atomic (it is guaranteed to capture
> a consistent state of the file system), you can use it
> to backup the database.

There are two types of snapshots: quiesced and non-quiesced. You
definitely want a quiesced snapshot when taking a backup, so that the
system is in a consistent state when you work with it. There should
be an option related to that in the vSphere client managing the VM, so
make sure that quiescing is enabled.
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Puzzled by UNION with unknown types

2017-09-20 Thread Ken Tanzer
Thanks Tom!

-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to learn more about AGENCY or follow the discussion.


Re: [GENERAL] Up to date conventional wisdom re max shared_buffer size?

2017-09-20 Thread Jerry Sievers
Andres Freund  writes:

> On 2017-09-20 13:00:34 -0500, Jerry Sievers wrote:
>
>> >> Pg 9.3 on monster 2T/192 CPU Xenial thrashing
>> >
>> > Not sure what the word "thrashing" in that sentence means.
>> 
>> Cases of dozens or hundreds of sessions running typical statements for
>> this system but running 100% on their CPUs.  Seems to be triggered by
>> certain heavy weight batch jobs kicking off on this generally OLTP
>> system.
>> 
>> ISTM there might be LW lock contention happening around some sort of
>> shared resource where the lock wait implementation is a CPU spinner.
>
> Yes, we improved that a lot in 9.5, 9.6 and 10.  The really bad
> scenarios - I've seen 95% cpu time spent in locking - should all be
> fixed.

Yup, as I suspected from studying the release notes, but some shops, including
ours, are slow to get enough app dev and QA resources aligned for major DB
system upgrades; thus we have ended up doing them 2 versions at a
time before, and this time 3 versions up :-(

>
> I'd try to make sure that both transparent hugepages and zone reclaim
> mode are disabled - the latter probably is already, but the former might
> still cause some problems.

Ack.  We'll research this.

>
>
>> > Things have improved a lot since 9.3 WRT to scalability, so I'd not
>> > infer too much from 9.3 performance on a larger box.
>> 
>> Understood.  The situation got worse when we moved to the even bigger
>> box also running a 4.x kernel which I presume was no where near existent
>> when 9.3 was our current Pg version.
>
> I suspect it's more the bigger box than the newer kernel. The more
> sockets and cores you have, the more lock contention bites you. That's
> because inter-socket / cpu transfers get more expensive with more cores.
>

Ack

>
>> >> Upgrade pending but we recently started having $interesting performance
>> >> issues at times looking like I/O slowness and other times apparently
>> >> causing CPU spins.
>> >
>> > That's not something we can really usefully comment on given the amount
>> > of information.
>> 
>> Ack'd.
>> 
>> I'd like to strace some of the spinning backends when/if we get another
>> opportunity to observe the problem to see if by syscall or libfunc name
>> we can learn more about what's the cause.
>
> I think the causes are known, and fixed - don't think there's much you
> can do besides upgrading, unless you want to backport a number of
> complex patches yourself.
>
> FWIW, usually perf gives better answers than strace in this type of
> scenario.

>
>
>> >> Anyway, shared_buffer coherency generally high but does take big dips
>> >> that are sometimes sustained for seconds or even minutes.
>> >
>> > "shared_buffer coherency"?
>> 
>> As measured querying pg_stat_databases and comparing total reads to read
>> hits.  Run frequently such as once /5-seconds and factored into a hit
>> percentage.  May stay up around 100% for several ticks but then go way
>> down which may or not sustain.
>> 
>> This is an OLTP app using Rails with hundreds of tables both trivial
>> n structure as well as having partitions, large payloads... TOAST and
>> the like.
>> 
>> TPS can measure in the ~5-10k range.
>
> That's cache hit rate, not coherency ;)

My bad

Thanks again.

>
> - Andres

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Up to date conventional wisdom re max shared_buffer size?

2017-09-20 Thread Jerry Sievers
Scott Marlowe  writes:

> On Tue, Sep 19, 2017 at 4:00 PM, Jerry Sievers  wrote:
>
>> Briefly, just curious if legacy max values for shared_buffers have
>> scaled up since 8G was like 25% of RAM?
>>
>> Pg 9.3 on monster 2T/192 CPU Xenial thrashing
>>
>> Upgrade pending but we recently started having $interesting performance
>> issues at times looking like I/O slowness and other times apparently
>> causing CPU spins.
>
> Have you looked at things like zone reclaim mode and transparent huge
> pages? Both of those can cause odd problems. Also it's usually a good
> idea to turn off swap as the linux kernel, presented with lots of ram
> and a small (by comparison) swap file sometimes makes bad life choices
> and starts using swap for things like storing currently unused shared
> buffers or something.

Not sure but we're checking into these items.  Thanks


-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Any known issues Pg 9.3 on Ubuntu Xenial kernel 4.4.0?

2017-09-20 Thread Jerry Sievers
Scott Marlowe  writes:

> On Wed, Sep 20, 2017 at 12:14 PM, Jerry Sievers  
> wrote:
>
>> Basically as per $subject.
>>
>> We took a perf hit moving up to newer hardware and OS version which
>> might in some cases be OK but admittedly there is some risk running a
>> much older app (Pg 9.3) on a kernel/OS version that nowhere near existed
>> when 9.3 was current.
>
> Are you sure you're using the same locale etc as you were on the old
> db? The most common cause of performance loss when migrating is that
> the new db uses a locale like en_US while the old one might have been
> in locale=C

No change there.   enUS-utf8 in both cases.

Thanks


>
>>
>> Be curious to hear of issues encountered and particular to eager to know
>> if disabling any kernel 4.x features helped.
>>
>> Thanks
>>
>>  PostgreSQL 9.3.19 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 
>> 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit
>>
>>
>> $ uname -a
>> Linux foo.somehost.com 4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:33 
>> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>
>> --
>> Jerry Sievers
>> Postgres DBA/Development Consulting
>> e: postgres.consult...@comcast.net
>> p: 312.241.7800
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
> -- 
> To understand recursion, one must first understand recursion.

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


R: [GENERAL] Insert large number of records

2017-09-20 Thread Job

> Even better would be if your bulkload could already be organised such
> that all the data in the "temporary" table can indiscriminately be
> inserted into the same target partition. That though depends a bit on
> your setup - at some point the time saved at one end gets consumed on
> the other or it takes even longer there.

Thank you for the answers and the ideas, really!

We wrote a simple script that splits the data directly into the right partition,
avoiding any trigger.
We also split it into 100k-record batches.

Now performance has really improved, thanks to everybody!

One further question: within a query launched on the MASTER table, where I need
to scan every partition (for example to search for rows located in more than one
partition), is there a way to improve things with "parallel scans" across several
tables at the same time, or not?
I noticed, with EXPLAIN ANALYZE, that the scan on the master table is always
sequential, descending into the partitions.

Thank you again,
F



From: Alban Hertroys [haram...@gmail.com]
Sent: Wednesday, 20 September 2017 17:50
To: Job
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Insert large number of records

On 20 September 2017 at 07:42, Job  wrote:
> We use a "temporary" table, populated by pg_bulkload - it takes few minutes 
> in this first step.
> Then, from the temporary table, datas are transferred by a trigger that copy 
> the record into the production table.
> But *this step* takes really lots of time (sometimes also few hours).
> There are about 10 millions of record.

Perhaps the problem isn't entirely on the writing end of the process.

How often does this trigger fire? Once per row inserted into the
"temporary" table, once per statement or only after the bulkload has
finished?

Do you have appropriate indices on the temporary table to guarantee
quick lookup of the records that need to be copied to the target
table(s)?

> We cannot use pg_bulkload to load directly data into production table since 
> pg_bulkload would lock the Whole table, and "COPY" command is slow and would 
> not care about table partitioning (COPY command fire partitioned-table 
> triggers).

As David already said, inserting directly into the appropriate
partition is certainly going to be faster. It removes a check on your
partitioning conditions from the query execution plan; if you have
many partitions, that adds up, because the database needs to check
that condition among all your partitions for every row.

Come to think of it, I was assuming that the DB would stop checking
other partitions once it found a suitable candidate, but now I'm not
so sure it would. There may be good reasons not to stop, for example
if we can partition further into sub-partitions. Anybody?


Since you're already using a trigger, it would probably be more
efficient to query your "temporary" table for batches belonging to the
same partition and insert those into the partition directly, one
partition at a time.

Even better would be if your bulkload could already be organised such
that all the data in the "temporary" table can indiscriminately be
inserted into the same target partition. That though depends a bit on
your setup - at some point the time saved at one end gets consumed on
the other or it takes even longer there.

Well, I think I've thrown enough ideas around for now ;)

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Inserting millions of record in a partitioned Table

2017-09-20 Thread Rob Sargent


On 09/20/2017 02:46 PM, Vick Khera wrote:
On Wed, Sep 20, 2017 at 10:10 AM, Job > wrote:


We noticed that if we import directly into the global table it is
really, really slow.
Importing directly in the single partition is faster.


Do you have a rule or trigger on the main table to redirect to the 
partitions? You should expect that to take some extra time *per row*. 
Your best bet is to just import into the proper partition and make 
sure your application produces batch files that align with your 
partitions.


Either that or write a program that reads the data, determines the 
partition, and then inserts directly to it. It might be faster.


I wonder if this is a case of hurry up and wait.  A script which could
load, say, 10 records and, assuming that takes much less than one second,
run once per second (waiting 1000 - runtime ms), would by now have done
about a million records since the question was asked.


Re: [GENERAL] Inserting millions of record in a partitioned Table

2017-09-20 Thread Vick Khera
On Wed, Sep 20, 2017 at 10:10 AM, Job  wrote:

> We noticed that if we import directly into the global table it is really,
> really slow.
> Importing directly in the single partition is faster.
>
>
Do you have a rule or trigger on the main table to redirect to the
partitions? You should expect that to take some extra time *per row*. Your
best bet is to just import into the proper partition and make sure your
application produces batch files that align with your partitions.

Either that or write a program that reads the data, determines the
partition, and then inserts directly to it. It might be faster.


Re: [GENERAL] Puzzled by UNION with unknown types

2017-09-20 Thread Tom Lane
Ken Tanzer  writes:
> Hi.  I've solved my practical problems, and I know unknown types are
> just bad, but am still curious about why some of these cause errors,
> and others don't.

> ag_consulting=> SELECT * FROM (SELECT '1'::unknown) foo UNION ALL
> SELECT * FROM (SELECT '1'::unknown) bar;
> ERROR:  failed to find conversion function from unknown to text

FWIW, this succeeds in 9.5 and up, as a result of more aggressive
application of the rule "coerce an unknown value to text if we cannot
find any other interpretation".

> ag_consulting=> CREATE TEMP TABLE t1 AS SELECT '1'::unknown; \d t1
> WARNING:  column "unknown" has type "unknown"
> DETAIL:  Proceeding with relation creation anyway.
> SELECT 1
>  Table "pg_temp_2.t1"
>  Column  |  Type   | Modifiers
> -+-+---
>  unknown | unknown |

As of v10, this will produce a table with a column of type text,
not type unknown, again as a result of more aggressively forcing
unknown to be something else.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Up to date conventional wisdom re max shared_buffer size?

2017-09-20 Thread Ron Johnson

On 09/20/2017 01:05 PM, Jerry Sievers wrote:

Ron Johnson  writes:


On 09/19/2017 05:00 PM, Jerry Sievers wrote:
[snip]


The DB is 10TB total size with OLTP plus some occasional heavy batching
which frequently correlates with degradation that requires intervention.

Unrelated server problem forced us to relocate from a Debian/Wheezy 3.x
kernel 1T 144 CPU to the even bigger box mentioned earlier.  And we went
up a major kernel version also in the process.

How did you backup/restore a 10TB db?

We just relocated the SAN volume.  Takes about 1 minute :-)


Ah, yes.  Major *kernel* version.  Never mind...  :)

--
World Peace Through Nuclear Pacification



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Up to date conventional wisdom re max shared_buffer size?

2017-09-20 Thread Scott Marlowe
On Tue, Sep 19, 2017 at 4:00 PM, Jerry Sievers  wrote:
> Briefly, just curious if legacy max values for shared_buffers have
> scaled up since 8G was like 25% of RAM?
>
> Pg 9.3 on monster 2T/192 CPU Xenial thrashing
>
> Upgrade pending but we recently started having $interesting performance
> issues at times looking like I/O slowness and other times apparently
> causing CPU spins.

Have you looked at things like zone reclaim mode and transparent huge
pages? Both of those can cause odd problems. Also it's usually a good
idea to turn off swap as the linux kernel, presented with lots of ram
and a small (by comparison) swap file sometimes makes bad life choices
and starts using swap for things like storing currently unused shared
buffers or something.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Any known issues Pg 9.3 on Ubuntu Xenial kernel 4.4.0?

2017-09-20 Thread Scott Marlowe
On Wed, Sep 20, 2017 at 12:14 PM, Jerry Sievers  wrote:
> Basically as per $subject.
>
> We took a perf hit moving up to newer hardware and OS version which
> might in some cases be OK but admittedly there is some risk running a
> much older app (Pg 9.3) on a kernel/OS version that nowhere near existed
> when 9.3 was current.

Are you sure you're using the same locale etc as you were on the old
db? The most common cause of performance loss when migrating is that
the new db uses a locale like en_US while the old one might have been
in locale=C

>
> Be curious to hear of issues encountered and particular to eager to know
> if disabling any kernel 4.x features helped.
>
> Thanks
>
>  PostgreSQL 9.3.19 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 
> 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit
>
>
> $ uname -a
> Linux foo.somehost.com 4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:33 
> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consult...@comcast.net
> p: 312.241.7800
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
To understand recursion, one must first understand recursion.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Up to date conventional wisdom re max shared_buffer size?

2017-09-20 Thread Andres Freund
On 2017-09-20 13:00:34 -0500, Jerry Sievers wrote:
> >> Pg 9.3 on monster 2T/192 CPU Xenial thrashing
> >
> > Not sure what the word "thrashing" in that sentence means.
> 
> Cases of dozens or hundreds of sessions running typical statements for
> this system but running 100% on their CPUs.  Seems to be triggered by
> certain heavy weight batch jobs kicking off on this generally OLTP
> system.
> 
> ISTM there might be LW lock contention happening around some sort of
> shared resource where the lock wait implementation is a CPU spinner.

Yes, we improved that a lot in 9.5, 9.6 and 10.  The really bad
scenarios - I've seen 95% cpu time spent in locking - should all be
fixed.

I'd try to make sure that both transparent hugepages and zone reclaim
mode are disabled - the latter probably is already, but the former might
still cause some problems.


> > Things have improved a lot since 9.3 WRT to scalability, so I'd not
> > infer too much from 9.3 performance on a larger box.
> 
> Understood.  The situation got worse when we moved to the even bigger
> box also running a 4.x kernel which I presume was no where near existent
> when 9.3 was our current Pg version.

I suspect it's more the bigger box than the newer kernel. The more
sockets and cores you have, the more lock contention bites you. That's
because inter-socket / cpu transfers get more expensive with more cores.


> >> Upgrade pending but we recently started having $interesting performance
> >> issues at times looking like I/O slowness and other times apparently
> >> causing CPU spins.
> >
> > That's not something we can really usefully comment on given the amount
> > of information.
> 
> Ack'd.
> 
> I'd like to strace some of the spinning backends when/if we get another
> opportunity to observe the problem to see if by syscall or libfunc name
> we can learn more about what's the cause.

I think the causes are known, and fixed - don't think there's much you
can do besides upgrading, unless you want to backport a number of
complex patches yourself.

FWIW, usually perf gives better answers than strace in this type of
scenario.


> >> Anyway, shared_buffer coherency generally high but does take big dips
> >> that are sometimes sustained for seconds or even minutes.
> >
> > "shared_buffer coherency"?
> 
> As measured querying pg_stat_databases and comparing total reads to read
> hits.  Run frequently such as once /5-seconds and factored into a hit
> percentage.  May stay up around 100% for several ticks but then go way
> down which may or not sustain.
> 
> This is an OLTP app using Rails with hundreds of tables both trivial
> n structure as well as having partitions, large payloads... TOAST and
> the like.
> 
> TPS can measure in the ~5-10k range.

That's cache hit rate, not coherency ;)

- Andres


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Puzzled by UNION with unknown types

2017-09-20 Thread Ken Tanzer
Hi.  I've solved my practical problems, and I know unknown types are
just bad, but am still curious about why some of these cause errors,
and others don't.

It seems contingent on whether the value is generated on the fly, but
I don't understand the mechanism or distinction.  Any help
appreciated!

-- These work:

ag_consulting=> SELECT '1'::unknown  UNION ALL SELECT '2'::unknown;

 unknown
-
 1
 2
(2 rows)


-- db_list is a table with one row:

ag_consulting=> SELECT '1'::unknown FROM db_list UNION ALL SELECT
'2'::unknown FROM db_list;
 unknown
-
 1
 2
(2 rows)


-- These don't work:


ag_consulting=> SELECT * FROM (SELECT '1'::unknown) foo UNION ALL
SELECT * FROM (SELECT '1'::unknown) bar;
ERROR:  failed to find conversion function from unknown to text


ag_consulting=> CREATE TEMP TABLE t1 AS SELECT '1'::unknown; \d t1
WARNING:  column "unknown" has type "unknown"
DETAIL:  Proceeding with relation creation anyway.
SELECT 1
 Table "pg_temp_2.t1"
 Column  |  Type   | Modifiers
-+-+---
 unknown | unknown |

ag_consulting=> SELECT * FROM t1 UNION ALL SELECT * FROM t1;
ERROR:  failed to find conversion function from unknown to text



-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to learn more about AGENCY or follow the discussion.


Re: [SPAM] Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread Klaus P. Pieper
> -Original Message-
> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> ow...@postgresql.org] On Behalf Of Moreno Andreo
> Sent: Wednesday, 20 September 2017 17:42
> I may be wrong, as I don't know MS SQL Server, but in the way you describe it,
> you make me come in mind pg_basebackup().

A VSS writer just prepares a snapshot of the (virtual) machine, while
pg_basebackup actually makes a (safe) copy of the cluster - important
differences are that the VSS writer 1) is called by the operating system, 2) is
much faster and 3) does not need space for a complete backup.

But you are right: if you can ensure that pg_basebackup has finished prior to 
making a backup of the virtual machine, this is certainly safe.


Klaus





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread Klaus P. Pieper
> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> ow...@postgresql.org] On Behalf Of George Neuner
>
> But VSS is needed only to copy VM files *while* they are in use.  If you 
> snapshot
> the VM, the snapshot files then are read-only and can be freely copied.  As 
> long
> as the backup avoids the currently active files, there is no danger.

But if you take a snapshot without VSS writer functions, I tend to believe that 
there is a serious potential that relevant information is still only in RAM and 
not written to disk. This would get lost in the snapshot.
I may be wrong, but my understanding of a VSS writer is that all transaction 
and log files are flushed to disk prior to the snapshot.

Klaus





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Any known issues Pg 9.3 on Ubuntu Xenial kernel 4.4.0?

2017-09-20 Thread Justin Pryzby
On Wed, Sep 20, 2017 at 01:14:14PM -0500, Jerry Sievers wrote:
> Be curious to hear of issues encountered and particular to eager to know
> if disabling any kernel 4.x features helped.

What was the old kernel/OS ?  wheezy / kernel 3.x ?

Perhaps try these ideas ?
https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

Justin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Any known issues Pg 9.3 on Ubuntu Xenial kernel 4.4.0?

2017-09-20 Thread Jerry Sievers
Basically as per $subject.

We took a perf hit moving up to newer hardware and OS version, which
might in some cases be OK, but admittedly there is some risk running a
much older app (Pg 9.3) on a kernel/OS version that was nowhere near in
existence when 9.3 was current.

I'd be curious to hear of issues encountered, and am particularly eager to know
if disabling any kernel 4.x features helped.

Thanks

 PostgreSQL 9.3.19 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 
5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit


$ uname -a
Linux foo.somehost.com 4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:33 UTC 
2017 x86_64 x86_64 x86_64 GNU/Linux

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Up to date conventional wisdom re max shared_buffer size?

2017-09-20 Thread Jerry Sievers
Ron Johnson  writes:

> On 09/19/2017 05:00 PM, Jerry Sievers wrote:
> [snip]
>
>> The DB is 10TB total size with OLTP plus some occasional heavy batching
>> which frequently correlates with degradation that requires intervention.
>>
>> Unrelated server problem forced us to relocate from a Debian/Wheezy 3.x
>> kernel 1T 144 CPU to the even bigger box mentioned earlier.  And we went
>> up a major kernel version also in the process.
>
> How did you backup/restore a 10TB db?

We just relocated the SAN volume.  Takes about 1 minute :-)

>
> -- 
> World Peace Through Nuclear Pacification

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Up to date conventional wisdom re max shared_buffer size?

2017-09-20 Thread Jerry Sievers
Thanks Andres!  See inline...

Andres Freund  writes:

> Hi,
>
> On 2017-09-19 17:00:05 -0500, Jerry Sievers wrote:
>> Briefly, just curious if legacy max values for shared_buffers have
>> scaled up since 8G was like 25% of RAM?
>
> It's very workload dependent. I've successfully used PG with roughly 1TB
> of shared buffers, where that performed better than lower
> settings.

Wow!  Ok

>
>
>> Pg 9.3 on monster 2T/192 CPU Xenial thrashing
>
> Not sure what the word "thrashing" in that sentence means.

Cases of dozens or hundreds of sessions running typical statements for
this system but running 100% on their CPUs.  Seems to be triggered by
certain heavy weight batch jobs kicking off on this generally OLTP
system.

ISTM there might be LW lock contention happening around some sort of
shared resource where the lock wait implementation is a CPU spinner.


>
> Things have improved a lot since 9.3 WRT to scalability, so I'd not
> infer too much from 9.3 performance on a larger box.

Understood.  The situation got worse when we moved to the even bigger
box also running a 4.x kernel, which I presume was nowhere near in existence
when 9.3 was our current Pg version.

>
>
>> Upgrade pending but we recently started having $interesting performance
>> issues at times looking like I/O slowness and other times apparently
>> causing CPU spins.
>
> That's not something we can really usefully comment on given the amount
> of information.

Ack'd.

I'd like to strace some of the spinning backends when/if we get another
opportunity to observe the problem to see if by syscall or libfunc name
we can learn more about what's the cause.

>
>> Anyway, shared_buffer coherency generally high but does take big dips
>> that are sometimes sustained for seconds or even minutes.
>
> "shared_buffer coherency"?

As measured by querying pg_stat_database and comparing total reads to read
hits.  Run frequently, such as once per 5 seconds, and factored into a hit
percentage.  May stay up around 100% for several ticks but then go way
down, which may or may not be sustained.
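
A sketch of that sort of hit-percentage query against the standard
pg_stat_database view (the 5-second sampling itself is assumed to live in
an external monitoring script):

-- Buffer cache hit percentage per database, sampled periodically.
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database;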

This is an OLTP app using Rails with hundreds of tables, both trivial
in structure as well as having partitions, large payloads... TOAST and
the like.

TPS can measure in the ~5-10k range.

Thx again

>
>
> Greetings,
>
> Andres Freund

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq confusion

2017-09-20 Thread Igor Korot
Thx.
So it is referring to the type of command, not to "a command returning no data". ;-)

On Wed, Sep 20, 2017 at 1:42 PM, John R Pierce  wrote:
> On 9/20/2017 10:34 AM, Igor Korot wrote:
>
> From the documentation:
> https://www.postgresql.org/docs/9.1/static/libpq-exec.html
>
> [quote]
> PGRES_COMMAND_OK
>
> Successful completion of a command returning no data.
> [/quote]
>
> No data = no rows, right?
>
> from that same page, a bit farther down, clarifying the potentially
> confusing wording.
>
> If the result status is PGRES_TUPLES_OK, then the functions described below
> can be used to retrieve the rows returned by the query. Note that a SELECT
> command that happens to retrieve zero rows still shows PGRES_TUPLES_OK.
> PGRES_COMMAND_OK is for commands that can never return rows (INSERT, UPDATE,
> etc.). A response of PGRES_EMPTY_QUERY might indicate a bug in the client
> software.
>
>
> --
> john r pierce, recycling bits in santa cruz


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq confusion

2017-09-20 Thread John R Pierce

On 9/20/2017 10:34 AM, Igor Korot wrote:

From the documentation:
https://www.postgresql.org/docs/9.1/static/libpq-exec.html

[quote]
PGRES_COMMAND_OK

Successful completion of a command returning no data.
[/quote]

No data = no rows, right?


from that same page, a bit farther down, clarifying the potentially 
confusing wording.


   If the result status is PGRES_TUPLES_OK, then the functions described
   below can be used to retrieve the rows returned by the query. Note
   that a SELECT command that happens to retrieve zero rows still
   shows PGRES_TUPLES_OK. PGRES_COMMAND_OK is for commands that can never
   return rows (INSERT, UPDATE, etc.). A response
   of PGRES_EMPTY_QUERY might indicate a bug in the client software.


--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] libpq confusion

2017-09-20 Thread Igor Korot
Hi, John,

On Wed, Sep 20, 2017 at 12:02 PM, John R Pierce  wrote:
> On 9/20/2017 6:30 AM, Igor Korot wrote:
>
> Hi, guys,
>
> On Wed, Sep 20, 2017 at 12:20 AM, Allan Harvey
>  wrote:
>
> How do I properly check if the record exists from libpq?
>
> Igor,
> I use PQntuples() to check the number of ... tuples, for > 0
>
> I was actually curious - isn't it what "PGRES_COMMAND_OK" for?
> IIUC, this constant indicates successful query run, but no records was
> generated.
>
> Or am I missing something and I will have to check PQntuples()?
>
>
> a query that returns zero rows is still successful.

From the documentation:
https://www.postgresql.org/docs/9.1/static/libpq-exec.html

[quote]
PGRES_COMMAND_OK

Successful completion of a command returning no data.
[/quote]

No data = no rows, right?

Thank you.


>
> --
> john r pierce, recycling bits in santa cruz


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread George Neuner
On Wed, 20 Sep 2017 17:15:36 +0200, "Klaus P. Pieper"
 wrote:

>> -Original Message-
>> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
>> ow...@postgresql.org] On Behalf Of Thomas Güttler
>> Sent: Wednesday, 20 September 2017 10:03
>> To: pgsql-general@postgresql.org
>> Subject: [GENERAL] VM-Ware Backup of VM safe?
>>
>> We run a PostgreSQL 9.6 server in a virtual machine.
>>
>> The virtual machine is managed by the customer.
>>
>> He does backup the VM.
>>
>> Is this enough, is this safe?
>
>When you run MS SQL Server on Hyper-V / Windows, the SQL Server
>provides ist own VSS writer responding to a taking a snapshot.
>This ensures that the image oft he SQL database is in a safe 
>stake when the backup is written.
>
>I am not sure about VM-Ware, but in any case I don't think that
>PostgreSQL provides anything similar to a VSS writer
>(neither on Windows nor on Linux), so the answer is most likely
>"no, it is not safe".

Vmware does not provide VSS support in the base product - it is
available if you install the Data Recovery extensions.

FWIW, Virtualbox doesn't provide VSS support either.


But VSS is needed only to copy VM files *while* they are in use.  If
you snapshot the VM, the snapshot files then are read-only and can be
freely copied.  As long as the backup avoids the currently active
files, there is no danger.

George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL

2017-09-20 Thread Stephen Frost
Greetings John,

* John R Pierce (pie...@hogranch.com) wrote:
> On 9/20/2017 6:55 AM, Stephen Frost wrote:
> >If AD is in the mix here, then there's no need to have things happening
> >at the database level when it comes to passwords- configure PG to use
> >Kerberos and create a princ in AD and put that on the database server
> >and then users can authenticate that way.
> 
> for web apps?   how does a web browser do kerberos over http ?

Yes, and using SPNEGO, which the major browsers have all supported for a
very long time.

Thanks!

Stephen




Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL

2017-09-20 Thread John R Pierce

On 9/20/2017 6:55 AM, Stephen Frost wrote:

If AD is in the mix here, then there's no need to have things happening
at the database level when it comes to passwords- configure PG to use
Kerberos and create a princ in AD and put that on the database server
and then users can authenticate that way.



for web apps?   how does a web browser do kerberos over http ?


--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq confusion

2017-09-20 Thread John R Pierce

On 9/20/2017 6:30 AM, Igor Korot wrote:

Hi, guys,

On Wed, Sep 20, 2017 at 12:20 AM, Allan Harvey
  wrote:

How do I properly check if the record exists from libpq?

Igor,
I use PQntuples() to check the number of ... tuples, for > 0

I was actually curious - isn't it what "PGRES_COMMAND_OK" for?
IIUC, this constant indicates successful query run, but no records was
generated.

Or am I missing something and I will have to check PQntuples()?



a query that returns zero rows is still successful.

--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Insert large number of records

2017-09-20 Thread Alban Hertroys
On 20 September 2017 at 07:42, Job  wrote:
> We use a "temporary" table, populated by pg_bulkload - it takes few minutes 
> in this first step.
> Then, from the temporary table, datas are transferred by a trigger that copy 
> the record into the production table.
> But *this step* takes really lots of time (sometimes also few hours).
> There are about 10 millions of record.

Perhaps the problem isn't entirely on the writing end of the process.

How often does this trigger fire? Once per row inserted into the
"temporary" table, once per statement or only after the bulkload has
finished?

Do you have appropriate indices on the temporary table to guarantee
quick lookup of the records that need to be copied to the target
table(s)?

> We cannot use pg_bulkload to load directly data into production table since 
> pg_bulkload would lock the Whole table, and "COPY" command is slow and would 
> not care about table partitioning (COPY command fire partitioned-table 
> triggers).

As David already said, inserting directly into the appropriate
partition is certainly going to be faster. It removes a check on your
partitioning conditions from the query execution plan; if you have
many partitions, that adds up, because the database needs to check
that condition among all your partitions for every row.

Come to think of it, I was assuming that the DB would stop checking
other partitions once it found a suitable candidate, but now I'm not
so sure it would. There may be good reasons not to stop, for example
if we can partition further into sub-partitions. Anybody?


Since you're already using a trigger, it would probably be more
efficient to query your "temporary" table for batches belonging to the
same partition and insert those into the partition directly, one
partition at a time.
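
As a rough sketch of that idea (the staging table, partition and column
names below are made up for illustration; the real names depend on your
schema):

-- One INSERT ... SELECT per target partition, bypassing the parent
-- table's routing trigger entirely.
INSERT INTO production_2017_09_20
SELECT * FROM staging_bulkload
WHERE created_at >= DATE '2017-09-20'
  AND created_at <  DATE '2017-09-21';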

Even better would be if your bulkload could already be organised such
that all the data in the "temporary" table can indiscriminately be
inserted into the same target partition. That though depends a bit on
your setup - at some point the time saved at one end gets consumed on
the other or it takes even longer there.

Well, I think I've thrown enough ideas around for now ;)

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [SPAM] Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread Moreno Andreo

On 20/09/2017 17:15, Klaus P. Pieper wrote:

-Original Message-
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
ow...@postgresql.org] On Behalf Of Thomas Güttler
Sent: Wednesday, 20 September 2017 10:03
To: pgsql-general@postgresql.org
Subject: [GENERAL] VM-Ware Backup of VM safe?

We run a PostgreSQL 9.6 server in a virtual machine.

The virtual machine is managed by the customer.

He does backup the VM.

Is this enough, is this safe?

When you run MS SQL Server on Hyper-V / Windows, the SQL Server provides its
own VSS writer responding to the taking of a snapshot. This ensures that the image
of the SQL database is in a safe state when the backup is written.
I may be wrong, as I don't know MS SQL Server, but the way you
describe it brings pg_basebackup() to mind.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread Klaus P. Pieper
> -Original Message-
> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> ow...@postgresql.org] On Behalf Of Thomas Güttler
> Sent: Wednesday, 20 September 2017 10:03
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] VM-Ware Backup of VM safe?
>
> We run a PostgreSQL 9.6 server in a virtual machine.
>
> The virtual machine is managed by the customer.
>
> He does backup the VM.
>
> Is this enough, is this safe?

When you run MS SQL Server on Hyper-V / Windows, the SQL Server provides its
own VSS writer responding to the taking of a snapshot. This ensures that the image
of the SQL database is in a safe state when the backup is written.

I am not sure about VM-Ware, but in any case I don't think that PostgreSQL 
provides anything similar to a VSS writer (neither on Windows nor on Linux), so 
the answer is most likely "no, it is not safe".

Regards, Klaus





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Insert large number of records

2017-09-20 Thread David G. Johnston
On Tuesday, September 19, 2017, Job  wrote:

> and would not care about table partitioning (COPY command fire
> partitioned-table triggers).


You might want to write a script that inserts directly into the partitions
and bypass routing altogether.

Insert into ... select from ... is your only option for table-to-table and
you are stuck with whatever locks the execution of the command needs to
take.

David J.


Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread George Neuner
On Wed, 20 Sep 2017 10:03:15 +0200, Thomas Güttler
 wrote:

>We run a PostgreSQL 9.6 server in a virtual machine.
>The virtual machine is managed by the customer.
>He does backup the VM.
>
>Is this enough, is this safe?

It is "safe" if the VM is shut down first or if the backup is by
copying a point-in-time snapshot of the VM.

Whether it is "enough" is a matter of opinion.  Backing up a VM
basically is the equivalent of imaging a hard drive: it's great if the
filesystem is intact and the image is clean ... otherwise it's not so
good.

If the database files on the VM's "drive" develop errors, those errors
will be preserved in the VM backup. 


George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libpq confusion

2017-09-20 Thread Allan Harvey

>How do I properly check if the record exists from libpq?

Igor,
I use PQntuples() to check the number of ... tuples, for > 0

Allan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Inserting millions of record in a partitioned Table

2017-09-20 Thread Job
Hi guys,

with PostgreSQL 9.6.1 we need to insert, from a certain select query, some
millions of records into a partitioned table.
The table is partitioned by day.
The data we will import can often be mixed between two different days.

We noticed that if we import directly into the global table it is really,
really slow.
Importing directly into the single partition is faster.

Any suggestions here please?

Thank you, best!
F

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL

2017-09-20 Thread Stephen Frost
John,

* John R Pierce (pie...@hogranch.com) wrote:
> On 9/19/2017 3:32 PM, chiru r wrote:
> >How those application accounts get recognized in database?
> >
> >Let say  App_user1 authenticated through application ,after that
> >how the App_user1 get access to DB?
> >
> >can you please provide more information ,how the app users are
> >accessing database ?
> 
> the user isn't accessing the database, the application itself is
> accessing the database.   a web app might have 10s of 1000s of
> unique users, but a web app typically only uses a single application
> account to access the database.

While that's one approach, it's certainly not the only one (and not
necessarily considered a good approach either..).

If AD is in the mix here, then there's no need to have things happening
at the database level when it comes to passwords- configure PG to use
Kerberos and create a princ in AD and put that on the database server
and then users can authenticate that way.

Further, applications can be configured to accept and proxy Kerberos
credentials from the user to authenticate with to the database, allowing
the application to actually be logged in as the user for that session,
with only those rights the user has.

Group/Role membership does still have to be sync'd between the AD/LDAP
directory and the database for those permissions to be set up, but that
isn't all that hard to do using a cronjob and ldap_fdw, or similar,
people just have to realize that there's a bit of lag.  The same goes
for creating accounts in the first place in the database, of course.

Thanks!

Stephen




Re: [GENERAL] libpq confusion

2017-09-20 Thread Igor Korot
Hi, guys,

On Wed, Sep 20, 2017 at 12:20 AM, Allan Harvey
 wrote:
>
>>How do I properly check if the record exists from libpq?
>
> Igor,
> I use PQntuples() to check the number of ... tuples, for > 0

I was actually curious - isn't it what "PGRES_COMMAND_OK" for?
IIUC, this constant indicates successful query run, but no records was
generated.

Or am I missing something and I will have to check PQntuples()?

Thank you.

>
> Allan
>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JDBC: logical replication and LSN feedback

2017-09-20 Thread Yason TR
Thanks, I did not realize that acknowledgements are delayed until the next forceUpdateStatus() call. From the javadocs, I understood that a read() call also does a forceUpdateStatus() internally, so a manual forceUpdateStatus() is not required. Maybe a forceUpdateStatus() only happens during a read() when the read() effectively received some messages?

I will start my experiments again and let you know.

Kind regards,

Yason TR

Sent: Wednesday, September 20, 2017 at 2:22 PM
From: "Dave Cramer" 
To: "Yason TR" 
Cc: "Achilleas Mantzios" , "pgsql-general@postgresql.org" , List 
Subject: Re: [GENERAL] JDBC: logical replication and LSN feedback



+list

First off you are going to get considerably better response from the JDBC list or our github project.

Looking at the code; in order to ensure the backend has received the acknowledgement you need to call forceUpdateStatus

Otherwise it may not receive the ack


Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 19 September 2017 at 07:53, Yason TR  wrote:




Should we read "In the event that replication has been restarted, it's will start from last successfully processed LSN that was sent via feedback to database." as meaning that this last successfully processed event will be included (again) after a restart of the replication, or that the next event after this last successfully processed event will be sent?

I would expect the second, as this makes the most sense (because the consumers only want each event once), but I am not sure.

Thanks a lot and kind regards,

Yason TR

Sent: Tuesday, September 19, 2017 at 4:14 PM
From: "Achilleas Mantzios" 
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] JDBC: logical replication and LSN feedback



On 19/09/2017 16:37, Yason TR wrote:
> Hi all,
>
> I am developing an application which connects to a logical replication slot, to consume the WAL events. These WAL events are then forwarded to a MQ broker.
>
> The heart of the code can be seen as:
>
> while (true) {
> Connection connection = null;
> PGReplicationStream stream = null;
>
> try {
> connection = DriverManager.getConnection("jdbc:postgresql://localhost:5432/db", properties);
> stream = connection.unwrap(PGConnection.class).getReplicationAPI().replicationStream().logical().withSlotName("slot").start();
>
> while (true) {
> final ByteBuffer buffer = stream.read();
>
> // ... MQ logic here ... omitted ...
>
> stream.setAppliedLSN(stream.getLastReceiveLSN());
> stream.setFlushedLSN(stream.getLastReceiveLSN());
> }
> } catch (final SQLException e) {
> // ... log exception ... omitted ...
> } finally {
> // ... close stream and connection ... omitted ...
> }
> }
>
> I notice some behavior which I cannot explain and would like to understand so I can alter my code:
>
> - When I restart the application, I notice that the application is retrieving the last event from the previous run again. The result is that this event is sent twice to the MQ broker after a restart of the application. Why is that? Isn't calling `setAppliedLSN(stream.getLastReceiveLSN())` and/or `setFlushedLSN(stream.getLastReceiveLSN())` enough to acknowledge an event, so it will removed from the WAL log and it will not be resent?
>
> - When receiving an event, the corresponding LSN from that event (which is sent in the payload) is not the same as the result of `stream.getLastReceivedLSN()`. Why is that? Which one should I use? Maybe this is correlated to my first question.
>
> - What is the difference between `setAppliedLSN(LSN)` and `setFlushedLSN(LSN)`? The Javadocs are not really helpful here.

The stages of a wal location generally go like : sent -> write -> flush -> replay , at least in terms of physical replication.
I guess applied=replayed ?

Note that from the docs : https://jdbc.postgresql.org/documentation/head/replication.html#logical-replication
it says :
"
In the event that replication has been restarted, it's will start from last successfully processed LSN that was sent via feedback to database.
"

>
> FYI, I also asked this question on https://stackoverflow.com/questions/46301578/postgres-jdbc-logical-replication-lsn-feedback.
>
> Thanks a lot and kind regards,
>
> Yason TR
>
>

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread Moreno Andreo

On 20/09/2017 13:36, PT wrote:

On Wed, 20 Sep 2017 10:03:15 +0200
Thomas Güttler  wrote:


We run a PostgreSQL 9.6 server in a virtual machine.

The virtual machine is managed by the customer.

He does backup the VM.

Is this enough, is this safe?

There are so many variables involved with doing that ... I don't think
anyone can reliably answer that question.

I recommend you put together a periodic test schedule where you restore
a machine from the backup and ensure everything works. To be honest, you
should be doing that anyway.

In addition, can you afford to lose data? Say you take a snapshot nightly 
at 3am; if you have an incident at 6pm and nothing is recoverable, you 
need to restore from backup, and in doing so you lose 15 hours of data 
updates. If you can't afford that, you should implement something like 
WAL shipping (with tools like barman) which, given a base backup made via 
pg_basebackup, can replay the WAL files and bring you back to the latest 
state (that's the Point-In-Time Recovery feature).





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] random row from a subset

2017-09-20 Thread Tomas Vondra


On 09/20/2017 01:28 PM, bluefrog wrote:
> 
> thanks, interestingly your method works in both Oracle and PostgreSQL,
> albeit with a different random function call.
> It does not work in SQL Anywhere though.
> 

You will have to ask SQL Anywhere people, I guess.

cheers

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JDBC: logical replication and LSN feedback

2017-09-20 Thread Dave Cramer
+list

First off you are going to get a considerably better response from the JDBC
list or our github project.

Looking at the code: in order to ensure the backend has received the
acknowledgement you need to call forceUpdateStatus().

Otherwise it may not receive the ack.

Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 19 September 2017 at 07:53, Yason TR  wrote:

> Should we read "In the event that replication has been restarted, it's
> will start from last successfully processed LSN that was sent via feedback
> to database." as meaning that this last successfully processed event will
> be included (again) after a restart of the replication, or that only events
> after this last successfully processed event will be sent?
>
> I would expect the second, as this makes the most sense (because the
> consumers only want each event once), but I am not sure.
>
> Thanks a lot and kind regards,
>
> Yason TR
>
> *Sent:* Tuesday, September 19, 2017 at 4:14 PM
> *From:* "Achilleas Mantzios" 
> *To:* pgsql-general@postgresql.org
> *Subject:* Re: [GENERAL] JDBC: logical replication and LSN feedback
> On 19/09/2017 16:37, Yason TR wrote:
> > Hi all,
> >
> > I am developing an application which connects to a logical replication
> slot, to consume the WAL events. These WAL events are then forwarded to a
> MQ broker.
> >
> > The heart of the code can be seen as:
> >
> > while (true) {
> > Connection connection = null;
> > PGReplicationStream stream = null;
> >
> > try {
> > connection = 
> > DriverManager.getConnection("jdbc:postgresql://localhost:5432/db",
> properties);
> > stream = connection.unwrap(PGConnection.class).getReplicationAPI().
> replicationStream().logical().withSlotName("slot").start();
> >
> > while (true) {
> > final ByteBuffer buffer = stream.read();
> >
> > // ... MQ logic here ... omitted ...
> >
> > stream.setAppliedLSN(stream.getLastReceiveLSN());
> > stream.setFlushedLSN(stream.getLastReceiveLSN());
> > }
> > } catch (final SQLException e) {
> > // ... log exception ... omitted ...
> > } finally {
> > // ... close stream and connection ... omitted ...
> > }
> > }
> >
> > I notice some behavior which I cannot explain and would like to
> understand so I can alter my code:
> >
> > - When I restart the application, I notice that the application is
> retrieving the last event from the previous run again. The result is that
> this event is sent twice to the MQ broker after a restart of the
> application. Why is that? Isn't calling 
> `setAppliedLSN(stream.getLastReceiveLSN())`
> and/or `setFlushedLSN(stream.getLastReceiveLSN())` enough to acknowledge
> an event, so it will removed from the WAL log and it will not be resent?
> >
> > - When receiving an event, the corresponding LSN from that event (which
> is sent in the payload) is not the same as the result of
> `stream.getLastReceivedLSN()`. Why is that? Which one should I use? Maybe
> this is correlated to my first question.
> >
> > - What is the difference between `setAppliedLSN(LSN)` and
> `setFlushedLSN(LSN)`? The Javadocs are not really helpful here.
>
> The stages of a wal location generally go like : sent -> write -> flush ->
> replay , at least in terms of physical replication.
> I guess applied=replayed ?
>
> Note that from the docs : https://jdbc.postgresql.org/documentation/head/
> replication.html#logical-replication
> it says :
> "
> In the event that replication has been restarted, it's will start from
> last successfully processed LSN that was sent via feedback to database.
> "
>
> >
> > FYI, I also asked this question on https://stackoverflow.com/
> questions/46301578/postgres-jdbc-logical-replication-lsn-feedback.
> >
> > Thanks a lot and kind regards,
> >
> > Yason TR
> >
> >
>
> --
> Achilleas Mantzios
> IT DEV Lead
> IT DEPT
> Dynacom Tankers Mgmt
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL

2017-09-20 Thread Stephen Frost
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Sep 19, 2017 at 01:28:11PM -0400, Stephen Frost wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > chiru r  writes:
> > > > We are looking for user profiles in open source PostgreSQL.
> > > > For example, if a user password failed n+ times while logging in, the user
> > > > access has to be blocked for a few seconds.
> > > > Please let us know, is there any plan to implement user profiles in
> > > > feature releases?
> > > 
> > > Not particularly.  You can do that sort of thing already via PAM,
> > > for example.
> > 
> > Ugh, hardly and it's hokey and a huge pain to do, and only works on
> > platforms that have PAM.
> > 
> > Better is to use an external authentication system (Kerberos, for
> > example) which can deal with this, but I do think this is also something
> > we should be considering for core, especially now that we've got a
> > reasonable password-based authentication method with SCRAM.
> 
> Does LDAP do this too?

Active Directory does this, with Kerberos as the authentication
mechanism.  Straight LDAP might also support it, but I wouldn't
recommend it because it's really insecure as the PG server will see the
user's password in the cleartext (and it may be sent in cleartext across
the network too unless careful steps are taken to make sure that the
client only ever connects over SSL to a known trusted and verified
server).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread David Steele
On 9/20/17 7:36 AM, PT wrote:
> On Wed, 20 Sep 2017 10:03:15 +0200
> Thomas Güttler  wrote:
> 
>> We run a PostgreSQL 9.6 server in a virtual machine.
>>
>> The virtual machine is managed by the customer.
>>
>> He does backup the VM.
>>
>> Is this enough, is this safe?
> 
> There are so many variables involved with doing that ... I don't think
> anyone can reliably answer that question.
> 
> I recommend you put together a periodic test schedule where you restore
> a machine from the backup and ensure everything works. To be honest, you
> should be doing that anyway.

Restore testing is a must, but a bad backup scheme can result in subtle
errors that are very hard to detect.

If you can't find specific documentation that your VM backup solution is
safe to use with a DBMS then it is almost certainly not safe.  Even if
it says it is there are potential gotchas.  For example, the backup may
not be consistent if you are using multiple volumes.

In addition, data loss on restore will be greater if there is no WAL
archive to play forward from.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Selecting a daily puzzle record - which type of column to add?

2017-09-20 Thread Alexander Farber
Hello, I appreciate your comments, thank you


Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread PT
On Wed, 20 Sep 2017 10:03:15 +0200
Thomas Güttler  wrote:

> We run a PostgreSQL 9.6 server in a virtual machine.
> 
> The virtual machine is managed by the customer.
> 
> He does backup the VM.
> 
> Is this enough, is this safe?

There are so many variables involved with doing that ... I don't think
anyone can reliably answer that question.

I recommend you put together a periodic test schedule where you restore
a machine from the backup and ensure everything works. To be honest, you
should be doing that anyway.

-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] random row from a subset

2017-09-20 Thread bluefrog

thanks, interestingly your method works in both Oracle and PostgreSQL,
albeit with a different random function call.
It does not work in SQL Anywhere though.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Logcal replication in large scale

2017-09-20 Thread Magnus Hagander
On Wed, Sep 20, 2017 at 10:02 AM, Nicklas Avén 
wrote:

> On Wed, 2017-09-20 at 09:44 +0200, Magnus Hagander wrote:
>
> On Wed, Sep 20, 2017 at 8:53 AM, Nicklas Avén 
> wrote:
>
> Hallo all
>
> I am thrilled about logical replication in PostgreSQL 10. My head have
> started spinning about use cases.
>
> Would it be possible to use logical replication as a distribution
> method of data?
>
>
> As an answer to the generic  question: yes :)
>
>
>
> I think about map data from national mapping authorities. The problem
> is to get the updates of their data sets. Especially the open data sets
> are now distributed as files (shape files) in Sweden and as pg_dump in
> Norway.
>
> I guess it is beyond what logical replication is designed for, so I ask
> what problems that might arise for a scenario like:
>
> The distributor has a publication database with logical replication
> publications of the tables. All users, probably thousands or more,
> would subscribe to that publication to get an updated copy of the data
> set.
>
> How would the publication server react? I guess the WAL-files will only
> be written once anyway?
>
>
> Yes. But they will  need to be kept around until *all* subscribers have
> pulled down their changes. So even one subscriber falling behind will mean
> your WAL will never get cleaned up.
>
> Of course, you can keep some sort of watcher process that kills old
> replication slots.
>
> I am also not sure how well PostgreSQL will react to having thousands of
> replication slots. It's not what the system was designed for I believe :)
>
>
> Ok, I have to read up on how this works. I thought about it as a bucket
> of WAL-files that the subscribers just "grab".
> But of course there is some bookkeeping to make things work.
>

No, it's streaming replication.

And in the end, that bucket becomes infinitely large.


>
> I guess there are also mechanisms so a new subscriber can get a complete
> table after the publication has been active for a long time? I mean the
> "cleaning up" of WAL-files will not leave new subscribers missing what is
> written to the table long ago?
>

Yes, new subscriptions will get the current version of the data and only
then start buffering changes.
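
For illustration, a bare-bones sketch of what that setup looks like, driven from JDBC here and assuming PostgreSQL 10 on both the distributor and the consumers (all table, publication, subscription and connection details below are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MapDistributionSetup {
    public static void main(String[] args) throws Exception {
        // On the distributor's database: publish the map tables once.
        try (Connection pub = DriverManager.getConnection(
                 "jdbc:postgresql://maps.example.org:5432/maps", "map_admin", "secret");
             Statement st = pub.createStatement()) {
            st.execute("CREATE PUBLICATION map_pub FOR TABLE parcels, roads, buildings");
        }

        // On each consumer's database: the tables must already exist with matching
        // definitions; the subscription first copies their current contents, then
        // streams changes. Each subscription keeps its own replication slot on the
        // distributor, which is exactly the WAL-retention concern discussed above.
        // CREATE SUBSCRIPTION must not run inside a transaction block, so leave
        // autocommit on (the JDBC default).
        try (Connection sub = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/localmaps", "postgres", "secret");
             Statement st = sub.createStatement()) {
            st.execute("CREATE SUBSCRIPTION map_sub " +
                       "CONNECTION 'host=maps.example.org dbname=maps user=map_sub password=secret' " +
                       "PUBLICATION map_pub");
        }
    }
}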

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread Albe Laurenz
Thomas Güttler wrote:
> We run a PostgreSQL 9.6 server in a virtual machine.
> 
> The virtual machine is managed by the customer.
> 
> He does backup the VM.
> 
> Is this enough, is this safe?

I don't know about VMware, but the general rule is that
if the backup is truly atomic (it is guaranteed to capture
a consistent state of the file system), you can use it
to backup the database.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] random row from a subset

2017-09-20 Thread Tomas Vondra


On 09/20/2017 02:33 AM, Peter Koukoulis wrote:
> 
> I'm attempting to get a random row, based on a range that spans 1 to the
> maximum number of rows of a subset.
> I run the query in Oracle successfully and get a different number each
> time and only a single number, which is what I am expecting,
> 
> but when I run the same query, albeit with a different random function,
> I either observe no result, a single row or two rows, 
> for example:
> 
> ft_node=# select c_id
> from    (
>          select c_id, row_number() over (order by c_d_id) as rn
>               ,  count(*) over() max_rn
>          from customer where c_d_id=5
>         ) t
> where rn = (select floor(random()*(max_rn))+1);

The problem here is that random() is volatile, so it's executed for each
row. So essentially if the subselect has 100 rows, you'll get 100 random
values. So you're "rolling the dice" for every row independently.
Sometimes one row matches, sometime none, sometime more than one.

You need to do either this:

with rand as (select random() as r)
select c_id
from (
       select c_id, row_number() over (order by c_d_id) as rn
            , count(*) over () as max_rn
         from customer where c_d_id = 5
     ) t
where rn = (select floor(r * max_rn) + 1 from rand);

or define an immutable wrapper for random():

CREATE FUNCTION random_stable() RETURNS DOUBLE PRECISION
AS 'SELECT random()'
LANGUAGE SQL
IMMUTABLE;

and use that instead.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Logcal replication in large scale

2017-09-20 Thread Nicklas Avén
On Wed, 2017-09-20 at 09:44 +0200, Magnus Hagander wrote:
> On Wed, Sep 20, 2017 at 8:53 AM, Nicklas Avén wrote:
> > Hallo all
> > 
> > 
> > 
> > I am thrilled about logical replication in PostgreSQL 10. My head
> > have
> > 
> > started spinning about use cases.
> > 
> > 
> > 
> > Would it be possible to use logical replication as a distribution
> > 
> > method of data?
> 
> As an answer to the generic  question: yes :)
> 
>  
> > I think about map data from national mapping authorities. The
> > problem
> > 
> > is to get the updates of their data sets. Especially the open data
> > sets
> > 
> > are now distributed as files (shape files) in Sweden and as pg_dump
> > in
> > 
> > Norway.
> > 
> > 
> > 
> > I guess it is beyond what logical replication is designed for, so I
> > ask
> > 
> > what problems that might arise for a scenario like:
> > 
> > 
> > 
> > The distributor has a publication database with logical replication
> > 
> > publications of the tables. All users, probably thousands or more,
> > 
> > would subscribe to that publication to get an updated copy of the
> > data
> > 
> > set.
> > 
> > 
> > 
> > How would the publication server react? I guess the WAL-files will
> > only
> > 
> > be written once anyway?
> 
> Yes. But they will  need to be kept around until *all* subscribers
> have pulled down their changes. So even one subscriber falling behind
> will mean your WAL will never get cleaned up.
> 
> Of course, you can keep some sort of watcher process that kills old
> replication slots.
> 
> I am also not sure how well PostgreSQL will react to having thousands
> of replication slots. It's not what the system was designed for I
> believe :)
> 
Ok, I have to read up on how this works. I thought about it as a
bucket of WAL-files that the subscribers just "grab". But of course
there is some bookkeeping to make things work.
I guess there are also mechanisms so a new subscriber can get a complete
table after the publication has been active for a long time? I mean the
"cleaning up" of WAL-files will not leave new subscribers missing what
is written to the table long ago?

> You might be better off using logical decoding (which goes back to
> 9.4) to stream the data out, but not having each subscriber be a
> postgresql subscriber. Either using it to generate some sort of "diff
> files" that can then be consumed downstream, or by distributing it
> via some kind of dedicated queuing system designed to handle that
> many downstreams.

The thing is that I am not involved in any of those organisations, just
a user who wants to give them some good reasons to consider
distributing through PostgreSQL :-)
>  
> > My guess is that it will be a lower payload than today anyway when
> > the
> > 
> > whole data set have to be fetched to get updates.
> 
> As always that's going to depend on the number of updates. If the
> same thing is updated 1000 times, then if you use logical replication
> it gets transferred 1000 times. So there are definitely cases when
> there will be *more* traffic with logical replication, but in cases
> like yours I would guess it will be less.
> 
> //Magnus
> 
> 
> 

[GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread Thomas Güttler

We run a PostgreSQL 9.6 server in a virtual machine.

The virtual machine is managed by the customer.

He does backup the VM.

Is this enough, is this safe?

Regards,
  Thomas Güttler


--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Final pg_dumpall should happen in Single-User-Mode

2017-09-20 Thread Thomas Güttler

Just for the records, I asked here the same question and got some replies:

https://dba.stackexchange.com/questions/186045/how-do-i-prevent-changes-to-my-database

On 11.09.2017 at 15:25, Thomas Güttler wrote:

I made a stupid mistake: I ran the final pg_dumpall without switching to 
single-user-mode first.

With "final" I mean that the dump should be transferred to a different machine. 
After the dump
the old machine was shut down forever.

My mistake was that I stopped the services, but one process was still alive and 
this process inserted into the database after pg_dumpall had started.

I guess I am not the first one who made this stupid mistake.

What do you think? Wouldn't it be polite to add a note about this topic to the 
docs?

Here are the docs for pg_dumpall: 
https://www.postgresql.org/docs/current/static/app-pg-dumpall.html

What could the wording look like? I am not a native speaker.

Or do you have a better/different idea?

Regards,
   Thomas Güttler




--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Logcal replication in large scale

2017-09-20 Thread Magnus Hagander
On Wed, Sep 20, 2017 at 8:53 AM, Nicklas Avén 
wrote:

> Hallo all
>
> I am thrilled about logical replication in PostgreSQL 10. My head have
> started spinning about use cases.
>
> Would it be possible to use logical replication as a distribution
> method of data?
>

As an answer to the generic  question: yes :)



> I think about map data from national mapping authorities. The problem
> is to get the updates of their data sets. Especially the open data sets
> are now distributed as files (shape files) in Sweden and as pg_dump in
> Norway.
>
> I guess it is beyond what logical replication is designed for, so I ask
> what problems that might arise for a scenario like:
>
> The distributor has a publication database with logical replication
> publications of the tables. All users, probably thousands or more,
> would subscribe to that publication to get an updated copy of the data
> set.
>
> How would the publication server react? I guess the WAL-files will only
> be written once anyway?
>

Yes. But they will  need to be kept around until *all* subscribers have
pulled down their changes. So even one subscriber falling behind will mean
your WAL will never get cleaned up.

Of course, you can keep some sort of watcher process that kills old
replication slots.

I am also not sure how well PostgreSQL will react to having thousands of
replication slots. It's not what the system was designed for I believe :)

You might be better off using logical decoding (which goes back to 9.4) to
stream the data out, but not having each subscriber be a postgresql
subscriber. Either using it to generate some sort of "diff files" that can
then be consumed downstream, or by distributing it via some kind of
dedicated queuing system designed to handle that many downstreams.



> My guess is that it will be a lower payload than today anyway when the
> whole data set have to be fetched to get updates.
>

As always that's going to depend on the number of updates. If the same
thing is updated 1000 times, then if you use logical replication it gets
transferred 1000 times. So there are definitely cases when there will be
*more* traffic with logical replication, but in cases like yours I would
guess it will be less.

//Magnus


Re: [GENERAL] Logical decoding client has the power to crash the server

2017-09-20 Thread Michael Paquier
On Wed, Sep 20, 2017 at 3:14 PM, Meel Velliste  wrote:
> From what I understand about logical decoding, there is no limit to how many
> log entries will be retained by the server if nobody reads them from the
> logical slot. This means that a client that fails to read from the slot has
> the power to bring down the master database because the server's disk will
> get full at which point all subsequent write operations will fail and even
> read operations will fail because they too need temporary space. Even the
> underlying operating system may be affected as it too may need temporary
> disk space to carry out its basic functions.

Monitoring is a mandatory part of the handling of replication slots.
One possible solution is to use a background worker that scans for slots
causing bloat in pg_xlog and automatically gets rid of them, so that the
primary is protected from a crash. Note that advancing a slot is
doable for a physical slot, but advancing a logical slot is trickier
(not sure if that's doable actually, but Andres can comment on that)
because it involves being sure that the catalog_xmin is still
preserved, so that past logical changes can be looked at consistently.
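
As a rough sketch only, such a watcher could even be a periodic client session rather than a background worker (PostgreSQL 10 function names are assumed below; on 9.4~9.6 substitute pg_current_xlog_location() and pg_xlog_location_diff(); the 1 GB threshold and connection details are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SlotWatcher {
    // Made-up threshold: drop any inactive slot holding back more than 1 GB of WAL.
    private static final long MAX_RETAINED_BYTES = 1024L * 1024 * 1024;

    public static void main(String[] args) throws Exception {
        // Connect as a role allowed to manage slots on the primary; details are placeholders.
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/db", "postgres", "secret")) {

            // Drop every slot that is not in use and retains too much WAL.
            String dropSql =
                "SELECT slot_name, pg_drop_replication_slot(slot_name) " +
                "  FROM pg_replication_slots " +
                " WHERE NOT active " +   // an in-use slot cannot be dropped anyway
                "   AND pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > ?";

            try (PreparedStatement ps = con.prepareStatement(dropSql)) {
                ps.setLong(1, MAX_RETAINED_BYTES);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // The disconnected consumer will have to re-create its slot and
                        // resynchronize; that is the price of protecting the primary's disk.
                        System.out.println("dropped slot " + rs.getString("slot_name"));
                    }
                }
            }
        }
    }
}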
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Logcal replication in large scale

2017-09-20 Thread Nicklas Avén
Hallo all

I am thrilled about logical replication in PostgreSQL 10. My head has
started spinning with use cases.

Would it be possible to use logical replication as a distribution
method of data?

I think about map data from national mapping authorities. The problem
is to get the updates of their data sets. Especially the open data sets
are now distributed as files (shape files) in Sweden and as pg_dump in
Norway.

I guess it is beyond what logical replication is designed for, so I ask
what problems that might arise for a scenario like:

The distributor has a publication database with logical replication
publications of the tables. All users, probably thousands or more,
would subscribe to that publication to get an updated copy of the data
set.

How would the publication server react? I guess the WAL-files will only
be written once anyway?

My guess is that it will be a lower payload than today anyway, when the
whole data set has to be fetched to get updates.

In the future maybe it would be possible to make some sort of
distributed arrangement so the replication can be against any of the
client servers instead, to reduce the load on the original server. But
that would of course require some way to check data integrity against
the original server, with an md5 hash or something, so the result is
identical to the original.

ATB

Nicklas Avén


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Logical decoding client has the power to crash the server

2017-09-20 Thread Meel Velliste
From what I understand about logical decoding, there is no limit to how
many log entries will be retained by the server if nobody reads them from
the logical slot. This means that a client that fails to read from the slot
has the power to bring down the master database because the server's disk
will get full at which point all subsequent write operations will fail and
even read operations will fail because they too need temporary space. Even
the underlying operating system may be affected as it too may need
temporary disk space to carry out its basic functions.

This essentially means that the client has the power to bring down the
server without even doing anything malicious, merely by being passive.
Seems like it should not be this way.

Is there really no way to limit the retained log entries, for example by
specifying the amount of time (e.g. 24 hours or 7 days), so that any
entries older than that would be automatically dropped?