Re: [GENERAL] XID wraparound with huge pg_largeobject
On 11/30/2015 9:58 AM, David Kensiski wrote:
> I am working with a client who has a 9.1 database rapidly approaching XID
> wraparound. They also have an exceedingly large pg_largeobject table (4217
> GB) that has never been vacuumed. An attempt to vacuum this on a replica
> has run for days and never succeeded. (Or more accurately, never been
> allowed to succeed because we needed to get the replica back on-line.)
> ... Any other ideas about how we can do this?

David,

My gut reaction was maybe dump/restore... but it's pg_largeobject. I have read the list for years, and my memory tells me that it is a problem child in that arena (e.g. as you found out with Slony...), and at 4000 GB, not something that can be sandboxed very well. Because it's v9.1, and you hadn't gotten any responses (until Jeff), and I had the time, I did some digging in the archives.

The most promising alternate idea: last February, when Adam Hooper was migrating to SSD, Bill Moran suggested trying to CLUSTER pg_largeobject instead of VACUUM FULL. (Full topic: 2/3/2015, entitled "VACUUM FULL pg_largeobject without (much) downtime?") CLUSTER has been referenced on the list at other times to collapse unused space (sometime in 2010-2011):

> As a last resort this week, I'm going to get 500+GB of extra file store
> added, add a tablespace and move pg_largeobjects to this area. Then use
> CLUSTER to rebuild pg_largeobjects back in the default tablespace. This
> should fix things I hope, and if needed I'll use Cluster regularly.

It's "an" other idea... I've no idea whether it will work any better than biting the bullet and just running VACUUM FULL.

Other bits and pieces: in 2010, Tom suggested REINDEX then VACUUM on pg_largeobject for an 8.? system. That piqued my interest, because we found with 9.1 that weekly reindexing helped with performance. However, the person who used it didn't find any performance improvement with his VACUUM.
I think reindexing was added to VACUUM FULL in the 9.0 release (but I would have to search the release notes to find it).

I remember reading somewhere during this (but can't find the reference) that an interrupted VACUUM FREEZE does capture "some" data, so multiple such runs actually incrementally improve the speed of the next one. But again, I can't find the reference, so I've no idea who, when, which version, or whether my memory is faulty or misapplied.

There are miscellaneous improvements to the actual running of VACUUM FULL (and more often autovacuum) suggested through tweaking the vacuum parameters, "vacuum_cost_delay" being a high-priority target. Jeff's questions all point at identifying any limitations that are costing you time due to configuration.

Totally not part of this specific problem: have you run, or do you know of, vacuumlo for deleting orphaned LOs? It might be worth running it before you collect your free space. [Just in case you didn't or hadn't... twice the bang, half the pain, but only if you do it before collecting your free space.]

Roxanne
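The sequence Roxanne describes (orphan cleanup first, then a table-rewriting operation) might be sketched as below. This is an illustration, not a tested recipe: the database name is a placeholder, vacuumlo is a contrib program run from the shell, and the index name assumed here is the standard one on pg_largeobject's (loid, pageno) columns.

```sql
-- First, from the shell, delete orphaned large objects so the rewrite
-- below has less live data to copy:
--     vacuumlo -v mydb

-- Disable cost-based throttling for this session so only hardware
-- limits the run.
SET vacuum_cost_delay = 0;

-- Option 1: VACUUM FULL rewrites the table (and, on 9.0+, rebuilds
-- its indexes as part of the rewrite).
VACUUM FULL VERBOSE pg_largeobject;

-- Option 2 (the Bill Moran suggestion): CLUSTER rewrites the table in
-- index order. It also takes an exclusive lock and needs enough free
-- disk space for a complete copy of the table.
CLUSTER pg_largeobject USING pg_largeobject_loid_pn_index;
```

Either way the operation locks pg_largeobject exclusively, so large-object access is blocked for the duration.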
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 13:13:27 -0500, Tom Lane wrote:
> "Peter J. Holzer" writes:
> > Postgres worker processes are single-threaded, are they? Is there
> > something else which could interact badly with a moderately complex
> > multithreaded I/O library used from a stored procedure?
>
> Yes, lots. If you cause additional threads to appear inside a backend
> process, things could break arbitrarily badly. It's up to you to ensure
> that none of those extra threads ever escape to execute any non-Perl
> code.

Actually, non-ØMQ code. Perl doesn't like to be unexpectedly multithreaded either. Yes, those threads should only ever execute code from the ØMQ library. In fact they are automatically created and destroyed by the library, and there is no way to control them from Perl code (there may be a way to do that from the C API, but I don't remember seeing that in the manual).

> I suspect this could easily explain the problems you're seeing.

Quite.

hp

--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | h...@hjp.at        | that actually fit.
__/   | http://www.hjp.at/ |    -- http://noncombatant.org/
Re: [GENERAL] Replication with 9.4
On Tue, Oct 6, 2015 at 12:27 PM, Thomas Munro wrote:
> On Sun, Oct 4, 2015 at 11:47 PM, Michael Paquier wrote:
> > (Seems like you forgot to push the Reply-all button)
> >
> > On Sun, Oct 4, 2015 at 7:01 PM, Madovsky wrote:
> >> On 10/3/2015 3:30 PM, Michael Paquier wrote:
> >>> and no reason is given to justify *why* this would be needed in your case
> >> reason for a choice can be often an issue for other :D
> >>
> >> I thought that postgresql 9.4 user could change on the fly with
> >> synchronous_commit from local to on for ex
> >> which hotstandby would become in sync and which in async to avoid a big
> >> latency in case of let's say 100 hot standby.
> >> it was an idea, a concept to let the master write and update the nodes, like
> >> a queen bee ;)
> >> but I'm afraid it's not possible, so maybe future version of pg will do it,
> >> for now read from the master is my only solution.
> >
> > Well, Thomas Munro (adding him in CC) has sent for integration with
> > 9.6 a patch that would cover your need, by adding to
> > synchronous_commit a mode called 'apply', in which case a master would
> > wait for the transaction to be applied on standby before committing
> > locally:
> > http://www.postgresql.org/message-id/CAEepm=1fqkivl4v-otphwsgw4af9hcogimrcw-ybtjipx9g...@mail.gmail.com
> > Perhaps you could help with the review of the patch, this has stalled
> > a bit lately.
>
> That patch (or something more sophisticated along those lines) is a
> small piece of a bigger puzzle, though it might be enough if you only
> have one standby, are prepared to block until manual intervention if
> that standby fails, and don't mind potentially lumpy apply
> performance. See also the work being done to separate wal writing
> from wal applying for smoother performance[1], and handle multiple
> synchronous standbys[2].
> But there is another piece of the puzzle IMHO: how to know reliably
> that the standby that you are talking to guarantees causal
> consistency, while also allowing standbys to fail/drop out gracefully,
> and I'm currently working on an idea for that.

FYI I posted the resulting proposal and patch over on the -hackers list. Feedback, ideas, flames welcome as always.

http://www.postgresql.org/message-id/flat/CAEepm=0n_OxB2_pNntXND6aD85v5PvADeUY8eZjv9CBLk=z...@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
Re: [GENERAL] XID wraparound with huge pg_largeobject
On Mon, Nov 30, 2015 at 9:58 AM, David Kensiski wrote:
> I am working with a client who has a 9.1 database rapidly approaching XID
> wraparound.

The hard limit at 2 billion, or the soft limit at autovacuum_freeze_max_age?

> They also have an exceedingly large pg_largeobject table (4217
> GB) that has never been vacuumed. An attempt to vacuum this on a replica
> has run for days and never succeeded.

What was slowing it down? Reading? Writing? CPU? fdatasync? Locks? Was it run with throttling (e.g. nonzero vacuum_cost_delay) or without? What is the throughput available on your RAID?

> Are there creative ways to do such a vacuum with minimal impact on
> production? Even if I let the vacuum complete on the replica, I don't think
> I can play accrued logs from the master, can I?

No. And if you could replay the logs, I doubt it would have much of a different impact than just running the vacuum freeze on the master directly would.

You just need to bite the bullet. At some point you need to read the entire table in one session, even if that means scheduling some downtime (or degraded-performance time) in order to do it. It will also need to rewrite the entire table, but if there are "vacuum freeze" attempts which don't run to completion, their partial work will lessen the amount of writing (but not reading) the ultimately successful vacuum will need to do. So start vacuum freeze now; if you end up needing to cancel it, at least part of its work will not go to waste.

Cheers, Jeff

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
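Jeff's "start now, partial work is kept" advice can be acted on with standard commands; the threshold query below uses stock catalog columns, and the session-level setting is just one way to run unthrottled.

```sql
-- How close is each database to the 2-billion-XID hard limit?
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Run the freeze unthrottled in this session, so disk throughput
-- rather than vacuum_cost_delay sets the pace.
SET vacuum_cost_delay = 0;
VACUUM FREEZE VERBOSE pg_largeobject;
```

If the VACUUM FREEZE is cancelled, pages it already froze stay frozen, so a later attempt has less writing to do (though it must still read the whole table).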
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
"Peter J. Holzer"writes: > But there is something else which may be relevant: ØMQ uses threads > internally, and I don't actually know whether zmq_msg_recv returning > means that the read(2) call (or whatever) on the socket terminates. ... ugh ... > Postgres worker processes are single-threaded, are they? Is there > something else which could interact badly with a moderately complex > multithreaded I/O library used from a stored procedure? Yes, lots. If you cause additional threads to appear inside a backend process, things could break arbitrarily badly. It's up to you to ensure that none of those extra threads ever escape to execute any non-Perl code. I suspect this could easily explain the problems you're seeing. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 12/01/2015 09:58 AM, Peter J. Holzer wrote:
> On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote:
>> On 12/01/2015 06:51 AM, Peter J. Holzer wrote:
>>> A rather weird observation from the log files of our server (9.5 beta1):
>>>
>>> 2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 WARNING: Use of uninitialized value $success in concatenation (.) or string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36.
>>> 2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 CONTEXT: PL/Perl function "mb_timeseriesdata_zmq"
>>> [lots of other stuff from different connections]
>>> 2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: impossible result '' (payload=) at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36.
>>>     WDS::Macrobond::Utils::decode_result("") called at line 30
>>>     main::__ANON__("gen_wqehur") called at -e line 0
>>>     eval {...} called at -e line 0
>>>
>>> Two messages from the same line of the same plperlu stored procedure, 68 seconds apart. So what is this line 36?
>>>
>>>     confess "impossible result '$success' (payload=$payload)";
>>>
>>> What? The first message clearly comes from interpolating $success (which is undef at that point) into the argument. The second from confess itself. What could cause a plperlu procedure to freeze for 68 seconds between the call to confess and its output?
>>>
>>> Is it possible that only the writing of the log entry is delayed?
>>>
>>> Another weird thing: $success is undef because a ØMQ rpc call[1] timed
>>
>> And the call is?
>
> The sequence is:
>
> my $req_sck = zmq_socket($context, ZMQ_REQ);
> zmq_connect($req_sck, $url);
> my $qry_msg = join(...);
> zmq_send($req_sck, $qry_msg);
> my $res_msg = zmq_msg_init();
> my $rv = zmq_msg_recv($res_msg, $req_sck);
> # check rv here.
> my $data = zmq_msg_data($res_msg); # $data is "" here
> my $result = WDS::Macrobond::Utils::decode_result($data); # the error messages are from this function
>
> (Yeah, ØMQ is quite low-level. There is a higher level Perl Module, but I'm not using it.)
>
> I omitted that because I don't think it's terribly relevant here. Details of the usage of ØMQ are better discussed on the ØMQ mailing list. But there is something else which may be relevant: ØMQ uses threads internally, and I don't actually know whether zmq_msg_recv returning

Except I see this here:

http://api.zeromq.org/4-0:zmq-socket

Thread safety

ØMQ sockets are not thread safe. Applications MUST NOT use a socket from multiple threads except after migrating a socket from one thread to another with a "full fence" memory barrier.

> means that the read(2) call (or whatever) on the socket terminates. It may actually continue in another thread. But I still don't see how that could block the main thread (or wake it up again in a place which has nothing to do with ØMQ (confess is a standard Perl function to print a stack trace and die)). Or - just thinking aloud here - I fear I'm abusing you guys as support teddy bears[1] - maybe it's the other way round: confess dies, so maybe it frees some lock during cleanup which allows the message which should have been sent by zmq_send to finally go out on the wire. But that still doesn't explain the 68 seconds spent in confess ...
>
> Postgres worker processes are single-threaded, are they? Is there something else which could interact badly with a moderately complex multithreaded I/O library used from a stored procedure? I suspect such an interaction because I cannot reproduce the problem outside of a stored procedure. A standalone Perl script doing the same requests doesn't get a timeout. I guess Alvaro is right: I should strace the postgres worker process while it executes the stored procedure. The problem of course is that it happens often enough to be annoying, but rarely enough that it's not easily reproducible.

From here:

http://api.zeromq.org/4-0:zmq-connect

It seems something like (I am not a Perl programmer, so approach carefully):

my $rc = zmq_connect($req_sck, $url);

Then you will have an error code to examine.

Have you looked at the Notes at the bottom of this page:

http://www.postgresql.org/docs/9.4/interactive/plperl-trusted.html

>>> out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem to have a default timeout of 60 seconds, and I don't set one). But at
>>
>> Network timeout?
>
> That was my first guess, but I don't see where it would come from. Or why it only is there if I call the code from a stored procedure, not from a standalone script.
>
>>> 09:24:45 (i.e. the time of the error message) the answer for that RPC call arrived. So it kind of looks like confess waited for the message to arrive (which makes no sense at all) or maybe that confess waited for something which also blocked the sending of the request (because according to the server logs, the RPC request only arrived there at 09:24:45 and was answered within 1 second), but that doesn't make any

So if the request timed out how did you get a reply, a second request?
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote:
> On 12/01/2015 06:51 AM, Peter J. Holzer wrote:
> >A rather weird observation from the log files of our server (9.5 beta1):
> >
> >2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265
> >WARNING: Use of uninitialized value $success in concatenation (.) or string
> >at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36.
> >2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265
> >CONTEXT: PL/Perl function "mb_timeseriesdata_zmq"
> >[lots of other stuff from different connections]
> >2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR:
> > impossible result '' (payload=) at
> >/usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36.
> > WDS::Macrobond::Utils::decode_result("") called at line 30
> > main::__ANON__("gen_wqehur") called at -e line 0
> > eval {...} called at -e line 0
> >
> >Two messages from the same line of the same plperlu stored procedure, 68
> >seconds apart. So what is this line 36?
> >
> > confess "impossible result '$success' (payload=$payload)";
> >
> >What? The first message clearly comes from interpolating $success
> >(which is undef at that point) into the argument. The second from
> >confess itself. What could cause a plperlu procedure to freeze for 68
> >seconds between the call to confess and its output?
> >
> >Is it possible that only the writing of the log entry is delayed?
> >
> >Another weird thing: $success is undef because a ØMQ rpc call[1] timed
>
> And the call is?

The sequence is:

my $req_sck = zmq_socket($context, ZMQ_REQ);
zmq_connect($req_sck, $url);
my $qry_msg = join(...);
zmq_send($req_sck, $qry_msg);
my $res_msg = zmq_msg_init();
my $rv = zmq_msg_recv($res_msg, $req_sck);
# check rv here.
my $data = zmq_msg_data($res_msg); # $data is "" here
my $result = WDS::Macrobond::Utils::decode_result($data); # the error messages are from this function

(Yeah, ØMQ is quite low-level.
There is a higher level Perl Module, but I'm not using it.)

I omitted that because I don't think it's terribly relevant here. Details of the usage of ØMQ are better discussed on the ØMQ mailing list.

But there is something else which may be relevant: ØMQ uses threads internally, and I don't actually know whether zmq_msg_recv returning means that the read(2) call (or whatever) on the socket terminates. It may actually continue in another thread. But I still don't see how that could block the main thread (or wake it up again in a place which has nothing to do with ØMQ (confess is a standard Perl function to print a stack trace and die)). Or - just thinking aloud here - I fear I'm abusing you guys as support teddy bears[1] - maybe it's the other way round: confess dies, so maybe it frees some lock during cleanup which allows the message which should have been sent by zmq_send to finally go out on the wire. But that still doesn't explain the 68 seconds spent in confess ...

Postgres worker processes are single-threaded, are they? Is there something else which could interact badly with a moderately complex multithreaded I/O library used from a stored procedure? I suspect such an interaction because I cannot reproduce the problem outside of a stored procedure. A standalone Perl script doing the same requests doesn't get a timeout.

I guess Alvaro is right: I should strace the postgres worker process while it executes the stored procedure. The problem of course is that it happens often enough to be annoying, but rarely enough that it's not easily reproducible.

> >out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem
> >to have a default timeout of 60 seconds, and I don't set one). But at
>
> Network timeout?

That was my first guess, but I don't see where it would come from. Or why it only is there if I call the code from a stored procedure, not from a standalone script.

> >09:24:45 (i.e. the time of the error message) the answer for that RPC
> >call arrived.
> >So it kind of looks like confess waited for the message to
> >arrive (which makes no sense at all) or maybe that confess waited for
> >something which also blocked the sending of the request (because
> >according to the server logs, the RPC request only arrived there at
> >09:24:45 and was answered within 1 second), but that doesn't make any
>
> So if the request timed out how did you get a reply, a second request?

Nope. I don't really "get" the reply. I just see in the logs of the other server that it sent a reply at that time. The time line is like this:

  time    postgres process                          mb_dal process
  T       zmq_send(); zmq_msg_recv()
  T+60    zmq_msg_recv returns an empty message;
          decode() is called, which notices that
          the message is empty and calls confess()
  T+128   confess() prints a stacktrace             receives the message sent at T;
                                                    sends a reply

hp

[1] For
Re: [GENERAL] 2 questions
On 12/01/2015 09:16 AM, anj patnaik wrote:
> Thanks for the info Scott. Can I setup a primary server called A that uses
> the Postgres installation on a nfs mounted filesystem and then A does a
> daily backup of database A and restores to database B on same filesystem.
> Then I have server B acting as cold standby and if server A goes down, then
> B would make the backup DB active? In this case, both server A and B have
> the same NFS mount, but B is not always writing to the data.
>
> The target environment I am moving to is a RAID5 system with mirrored
> discs. I will be getting a VM on a physical server and the mirrored disks
> will store the database. I would like to setup a primary and secondary as
> cold standby. I am fairly new to PG so asking these questions. Please
> advise. Thank you.

Advice: read these pages:

http://www.postgresql.org/docs/9.4/interactive/tutorial-arch.html
http://www.postgresql.org/docs/9.4/interactive/creating-cluster.html
http://www.postgresql.org/docs/9.4/interactive/warm-standby.html

--
Adrian Klaver
adrian.kla...@aklaver.com
[GENERAL] json indexing and data types
Hi

As json essentially only has three basic data types (string, number, and boolean), I wonder how much of this - to index, search, and sort on unstructured data - is possible. I guess part of the answer would be 'jsquery and vodka', but let me describe the problem first.

The basic situation is that I have a column with what is essentially json data; a number of data structures of different depths. Perhaps 10 - 30 top levels, and probably no more than 3, max 4, levels deep. In total there are some hundreds of thousands of rows in each table. It would probably be best stored as jsonb. Right now it's text, because it's only used by the application itself.

It would be incredibly useful to add an index to this column, and to be able to search, using the index, on arbitrary elements. This part seems already there, with jsquery.

The hard part is that some of the data items really have another type. There are dates and floating points, as the most important ones. And the really hard part is that sorting and range searches are important, especially for these two types. Keeping dates in ISO format, and left-padding floats with zeros, is a low-tech solution, and especially the latter is not very efficient.

The solution might be to add functional indexes for these data items, but that's cumbersome and not easily maintainable. If a one-stop solution is in the works, or already there, it could save a lot of time.

/kaare
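The "functional indexes" (expression indexes) the poster calls cumbersome would look roughly like this; the table and key names are invented for illustration. ISO-8601 dates sort correctly as plain text, while numbers need a cast (the text-to-numeric cast is immutable, so it is allowed in an index expression):

```sql
-- Hypothetical table: the json documents live in a jsonb column.
CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);

-- ISO dates sort lexicographically, so the raw text extraction suffices.
CREATE INDEX docs_created_idx ON docs ((body ->> 'created'));

-- Floats must be cast so they sort numerically, not as text.
CREATE INDEX docs_price_idx ON docs (((body ->> 'price')::numeric));

-- Queries must repeat the indexed expression verbatim to use the index.
SELECT id
FROM docs
WHERE (body ->> 'price')::numeric BETWEEN 10.0 AND 20.0
ORDER BY body ->> 'created';
```

This is exactly the per-element maintenance burden the post complains about: one hand-written index per typed element.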
Re: [GENERAL] Exclusively locking parent tables while disinheriting children.
On Wed, Sep 16, 2015 at 12:10 AM, Thom Brown wrote:
> On 7 August 2015 at 12:34, Thom Brown wrote:
>> On 30 July 2015 at 13:35, Rowan Collins wrote:
>>> So what I want to understand is what the risks of adding this lock are -
>>> under what circumstances would I expect to see dead locks if I manually
>>> added this lock to my partition maintenance functions?
>> I'm not clear on the problems such a change would present either, but I'm
>> probably overlooking the relevant scenario.
> Has anyone got insight as to what's wrong with exclusively locking a parent
> table to disinherit a child table?

I don't see why that error would occur when concurrently selecting directly from a "child". I can imagine it occurring if one selected from the parent. For example, one session performs ALTER TABLE somechild NO INHERIT parent; then another session trying to select from parent would block waiting for the lock on somechild. Once it gets the lock, it checks whether somechild exists at all, for safety (because the other session may have dropped it). What it doesn't check, though, is whether somechild is still in the list of children of parent. Having failed to do that, it encounters the error in question when, further along, it tries to find parent's attributes in somechild, which are no longer there (remember, the other session just disinherited it). That perhaps needs some fixing. I may be missing something, though.

Thanks, Amit
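For concreteness, the manual lock being discussed would be taken like this; the parent and child names are placeholders:

```sql
-- Take the parent's exclusive lock explicitly before disinheriting, so a
-- concurrent SELECT on the parent cannot start planning against a child
-- that is about to leave the inheritance hierarchy.
BEGIN;
LOCK TABLE parent IN ACCESS EXCLUSIVE MODE;
ALTER TABLE somechild NO INHERIT parent;
COMMIT;
```

The deadlock worry raised in the thread is the usual lock-ordering one: this transaction holds parent and then needs somechild, so a concurrent session that already holds a lock on somechild and then waits on parent could, in principle, deadlock against it.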
[GENERAL] how to improve "where exists(subquery)" EXISTS CONDITION performance?
I saw this sentence:

> SQL statements that use the EXISTS condition in PostgreSQL are very
> inefficient since the sub-query is RE-RUN for EVERY row in the outer
> query's table. There are more efficient ways to write most queries,
> that do not use the EXISTS condition.

So, I want to know how PostgreSQL implements the EXISTS condition. Is that sentence true? And, if it is true, are there any methods to improve the performance of the EXISTS condition?

--
View this message in context: http://postgresql.nabble.com/how-to-import-where-exists-subquery-EXISTS-CONDITION-performance-tp5875801.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
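Whether the quoted claim holds can be checked directly with EXPLAIN: for a correlated EXISTS like the one below, the PostgreSQL planner normally produces a semi-join rather than re-running the sub-query once per outer row. The table and column names here are invented:

```sql
-- Hypothetical schema: customers(id), orders(id, customer_id).
EXPLAIN
SELECT c.id
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.id
);
-- A typical plan shows something like a Hash Semi Join, i.e. one pass
-- over each table, not one sub-query execution per row of customers.
```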
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 10:20:09 -0800, Adrian Klaver wrote: > On 12/01/2015 09:58 AM, Peter J. Holzer wrote: > >On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote: > >>On 12/01/2015 06:51 AM, Peter J. Holzer wrote: > >>>A rather weird observation from the log files of our server (9.5 beta1): > >>> > >>>2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 > >>>WARNING: Use of uninitialized value $success in concatenation (.) or > >>>string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > >>>2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 > >>>CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" > >>>[lots of other stuff from different connections] > >>>2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 > >>>ERROR: impossible result '' (payload=) at > >>>/usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > >>> WDS::Macrobond::Utils::decode_result("") called at line 30 > >>> main::__ANON__("gen_wqehur") called at -e line 0 > >>> eval {...} called at -e line 0 > >>> > >>>Two messages from the same line of the same plperlu stored procedure, 68 > >>>seconds apart. So what is this line 36? > >>> > >>> confess "impossible result '$success' (payload=$payload)"; > >>> > >>>What? The first message clearly comes from interpolating $success > >>>(which is undef at that point) into the argument. The second from > >>>confess itself. What could cause a plperlu procedure to freeze for 68 > >>>seconds between the call to confess and its output? > >>> > >>>Is it possible that only the writing of the log entry is delayed? > >>> > >>>Another weird thing: $success is undef because a ØMQ rpc call[1] timed > >> > >>And the call is? > > > >The sequence is: > > > >my $req_sck = zmq_socket($context, ZMQ_REQ); > >zmq_connect($req_sck, $url); > >my $qry_msg = join(...); > >zmq_send($req_sck, $qry_msg); > >my $res_msg = zmq_msg_init(); > >my $rv = zmq_msg_recv($res_msg, $req_sck); > ># check rv here. 
> >my $data = zmq_msg_data($res_msg); # $data is "" here > >my $result = WDS::Macrobond::Utils::decode_result($data); # the error > >messages are from this function > > > >(Yeah, ØMQ is quite low-level. There is a higher level Perl Module, but > >I'm not using it). > > > >I omitted that because I don't think it's terribly relevant here. > >Details of the usage of ØMQ are better discussed on the ØMQ mailing > >list. > > > >But there is something else which may be relevant: ØMQ uses threads > >internally, and I don't actually know whether zmq_msg_recv returning > > Except I see this here: > > http://api.zeromq.org/4-0:zmq-socket > > Thread safety > > ØMQ sockets are not thread safe. Applications MUST NOT use a socket from > multiple threads except after migrating a socket from one thread to another > with a "full fence" memory barrier. Well yes, but I don't use a ØMQ socket in multiple threads, It is created in the stored procedure and destroyed at the end (just checked the strace output: Yes it is. For a moment I wasn't sure whether lexical variables in plperlu procedures go out of scope.). It's the ØMQ library itself which creates extra threads (And it should terminate them properly and afaics from strace it does). [...] > From here: > > http://api.zeromq.org/4-0:zmq-connect > > It seems something like(I am not a Perl programmer, so approach carefully): > > my $rc = zmq_connect($req_sck, $url); > > Then you will have an error code to examine. Yes. I have been a bit sloppy with error checking. I check only the return value of the zmq_msg_recv() call which returns the empty message. It is possible that the problem actually occurs earlier and I should check those calls as well. Mea culpa. However, in at least one case the failed call was indeed zmq_msg_recv() not one of the earlier ones (see my next mail). > Have you looked at the Notes at the bottom of this page: > > http://www.postgresql.org/docs/9.4/interactive/plperl-trusted.html I have. 
I don't think that's a problem here: Debian perl is built with both multiplicity and ithreads, and I would assume that the .deb packages from postgresql.org use the shared library provided by the system. But even if that wasn't the case, it should not be a problem, as only plperlu stored procedures are called.

hp
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 18:58:31 +0100, Peter J. Holzer wrote:
> I suspect such an interaction because I cannot reproduce the problem
> outside of a stored procedure. A standalone Perl script doing the same
> requests doesn't get a timeout.
>
> I guess Alvaro is right: I should strace the postgres worker process
> while it executes the stored procedure. The problem of course is that
> it happens often enough to be annoying, but rarely enough that it's not
> easily reproducible.

I did manage to catch a timeout once with strace in the meantime, although that one was much more straightforward and less mysterious than the original case: the postgres process sends a message; about 10 seconds later it receives a SIGALRM which interrupts an epoll; the reply hasn't yet arrived; error message to client and log file. No waits in functions which shouldn't wait, or messages which arrive much later than they were (presumably) sent.

The strace doesn't show a reason for the SIGALRM, though. No alarm(2) or setitimer(2) system call. (I connected strace to a running postgres process just after I got the prompt from "psql" and before I typed "select * from mb_search('export');". I used a different (but very similar) stored procedure for those tests, because it is much easier to find a search which is slow enough to trigger a timeout at least sometimes than a data request (which normally finishes in milliseconds).)

So I guess my next task will be to find out where that SIGALRM comes from and/or whether I can just restart the zmq_msg_recv if it happens.

hp
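Attaching strace to the backend as described might look like the following sketch; the PID is whatever pg_backend_pid() reports for the session under test (26265 is just the value from the earlier log excerpt), and the filter set is one reasonable choice, not the only one:

```shell
# In psql, in the session that will run the stored procedure:
#     SELECT pg_backend_pid();
# Then, with sufficient privileges, attach with timestamps, following
# any threads the ØMQ library spawns, watching network and signal calls:
strace -f -tt -e trace=network,signal -p 26265 -o /tmp/backend.trace
```

A delivered SIGALRM shows up as a `--- SIGALRM ... ---` line in the output, and a preceding alarm(2)/setitimer(2) call, if strace caught one, identifies who armed the timer.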
Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue
> Before I start in to implement a DELETE / AUTOVACUUM / VACUUM approach, to
> recycling disk space used for a session management table, I would like to
> propose, for consideration by this forum, an idea for a different approach.
>
> A row in a session management table, represents a significant "sunk cost" in
> both computing time used to create the row, and in disk space allocated.
> Postgres has to use a lot of resources to create that row in the first
> place.
>
> When the session that originally caused that row to be allocated, eventually
> expires -- why delete the associated session managent row ??
>
> Instead of using a DELETE command to destroy the row (and a
> resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage
> space), why not instead, simply mark that session management row as "free"
> (with an UPDATE command) ??

An UPDATE is a combination of an INSERT and a DELETE command. However, rows marked as deleted will be reused at some point after autovacuum (or manual VACUUM) has made sure they are no longer in use by any DB session. So your approach can still work, as long as you vacuum that table frequently enough. The actual solution isn't the UPDATE instead of the DELETE though, but rather the partial index and an increased autovacuum frequency.

Alban.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
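Alban's suggestion (a partial index plus a much higher autovacuum frequency on the hot table) could be sketched as follows; the table layout and threshold values are invented for illustration:

```sql
-- Hypothetical session table where expired rows are flagged free for
-- reuse by an UPDATE instead of being DELETEd.
CREATE TABLE sessions (
    token      text PRIMARY KEY,
    user_id    integer,
    expires_at timestamptz NOT NULL,
    free       boolean NOT NULL DEFAULT false
);

-- Partial index: quickly find a reusable row without touching live ones.
CREATE INDEX sessions_free_idx ON sessions (expires_at) WHERE free;

-- Vacuum this table far more aggressively than the global defaults, so
-- the dead row versions created by all those UPDATEs are reclaimed.
ALTER TABLE sessions SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 0
);
```

As Alban notes, the UPDATE itself still creates a dead row version; the win comes from vacuuming often enough that the space is recycled promptly.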
[GENERAL] Pgbasebackup help
Hi All, I need some help with a PostgreSQL base backup. We are currently using multiple DBMSs in our project, and PostgreSQL is one of them. Our private DBMS keeps the online data, and PostgreSQL keeps online as well as historical data. At present we are working on the backup/restore process for our project, so we planned to use pg_basebackup instead of pg_dump. Below are the backup steps:

Time T1 = start the backup of the private DBMS.
Time T2 = finish the backup of the private DBMS.
Time T3 = start pg_basebackup.
Time T4 = end pg_basebackup.

The requirement is that we don't want to restore any data from after time T3. But when I followed the approach at https://opensourcedbms.com/dbms/point-in-time-recovery-pitr-using-pg_basebackup-with-postgresql-9-2/, I was still getting the information archived between T3 and T4. It seems the WAL archives hold all the transactions which happened between T3 and T4. Also, I don't want to enable archive_mode = on, as it requires maintaining archive files. So I decided to enable only these parameters in postgresql.conf:

wal_level = hot_standby
max_wal_senders = 1

And I added replication permissions for the current user in pg_hba.conf. It does what I need: the backup did not have the data between T3 and T4. Is this correct, or is there anything I am missing? Please let me know.

With best regards,
Ramkumar Yelai
Siemens Technology and Services Private Limited
Bengaluru, India
mailto:ramkumar.ye...@siemens.com http://www.siemens.co.in/STS
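If the goal is to restore to a state no later than T3, the standard mechanism is point-in-time recovery with an explicit recovery target, rather than relying on which WAL happens to be absent. A sketch of a recovery.conf for that (the timestamp and archive path are placeholders; note this does require keeping WAL archives, which you said you would rather avoid):

```
# recovery.conf -- stop replaying WAL at a chosen point (time T3)
restore_command = 'cp /path/to/wal_archive/%f "%p"'
recovery_target_time = '2015-12-01 12:00:00'
recovery_target_inclusive = false
```

Without any WAL beyond the backup, recovery can only stop at whatever minimal consistent point the backup reaches, so the cutoff is implicit rather than guaranteed to be exactly T3.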
Re: [GENERAL] 2 questions
1) Directory listing:

/opt/nfsDir/postgres/9.4/
    /bin
    /data
    /etc
    /pgAdmin3

2) The way I am using PG now is that I have specified the directory above as the location to install it (from the graphical installer). Now, it appears that postgres places files in other directories besides the one specified in the installer. For instance, there are scripts in /etc/init.d to start the service.

So in answering my own question: it appears PG places files in other dirs, so when I am given a new VM/different physical server with the same NFS mount I would either need to copy these files over or, better yet, uninstall the current PG and re-install from scratch.

Thanks, ap

On Fri, Nov 27, 2015 at 8:30 PM, Adrian Klaver wrote:
> On 11/27/2015 01:17 PM, anj patnaik wrote:
>
>> Hello,
>> Yes, postgres is currently installed on an NFS-mounted file system. So
>> when the graphical installer runs, there is a form which asks the location
>> for installation. I specified the path /opt/nfsDir/Postgres, where nfsDir is
>> an NFS mount. So currently this is where PG 9.4 lives.
>
> What is under /opt/nfsDir/Postgres?
>
>> My question is: when I am given a brand new VM on a different physical
>> server, can I mount that same NFS FS and use the Postgres, or do I need
>> to re-install PG on the new VM?
>
> How are you using Postgres now? Please be specific; more detail is better
> than less at this point.
>
>> I am not sure if PG writes to any other directories besides the one
>> where it is installed.
>
>> On the issue of logging, I see a lot of log statements because client
>> apps do upserts, and since I use Tcl I don't have a SQL proc, but rather
>> I let it raise an exception and then do an update on the row.
>
> You can use Tcl in the database:
>
> http://www.postgresql.org/docs/9.4/interactive/pltcl.html
>
> That will not change things if you let the database throw an exception
> there also.
>> So, you can limit the size of an individual log, but there is no way to
>> tell PG to keep the log file short?
>>
>> If I choose FATAL, I'd lose some log, right?
>>
>> Thank you!
>> ap
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 12/01/2015 06:51 AM, Peter J. Holzer wrote:
> A rather weird observation from the log files of our server (9.5 beta1):
>
> 2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 WARNING: Use of uninitialized value $success in concatenation (.) or string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36.
> 2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 CONTEXT: PL/Perl function "mb_timeseriesdata_zmq"
> [lots of other stuff from different connections]
> 2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: impossible result '' (payload=) at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36.
> WDS::Macrobond::Utils::decode_result("") called at line 30
> main::__ANON__("gen_wqehur") called at -e line 0
> eval {...} called at -e line 0
>
> Two messages from the same line of the same plperlu stored procedure, 68
> seconds apart. So what is this line 36?
>
> confess "impossible result '$success' (payload=$payload)";
>
> What? The first message clearly comes from interpolating $success
> (which is undef at that point) into the argument. The second from
> confess itself. What could cause a plperlu procedure to freeze for 68
> seconds between the call to confess and its output? Is it possible that
> only the writing of the log entry is delayed?
>
> Another weird thing: $success is undef because a ØMQ rpc call[1] timed

And the call is?

> out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem
> to have a default timeout of 60 seconds, and I don't set one). But at

Network timeout?

> 09:24:45 (i.e. the time of the error message) the answer for that RPC
> call arrived.
> So it kind of looks like confess waited for the message to arrive (which
> makes no sense at all) or maybe that confess waited for something which
> also blocked the sending of the request (because according to the server
> logs, the RPC request only arrived there at 09:24:45 and was answered
> within 1 second), but that doesn't make any

So if the request timed out, how did you get a reply? A second request?

> sense either. (Just noticed that 60 + 68 == 128, which is also a round
> number.)

This looks like a question for a ØMQ list.

Another thought: have you tried the code outside Postgres to see if it works?

> hp
>
> [1] ØMQ is an IPC framework: see http://zeromq.org/ We use it to make
> RPC calls from stored procedures to a server process.

--
Adrian Klaver
adrian.kla...@aklaver.com
[GENERAL] FATAL: unable to read data from DB node 0
Hi everybody,

I have the following architecture with pgpool (streaming replication mode), 4 nodes in total:

1 master node
2 standby nodes
1 pgpool node

I've disabled load balancing because some clients reported problems with the load balancer: it sometimes sends queries to standby nodes which have not yet replayed the data, and the queries fail. But that topic is for another thread.

When I run a stress test with HammerDB I see the following errors on the pgpool node:

2015-11-27 16:48:21: pid 20190: FATAL: unable to read data from DB node 0
2015-11-27 16:48:21: pid 20190: DETAIL: EOF encountered with backend
2015-11-27 16:48:21: pid 19182: LOG: child process with pid: 20190 exits with status 256
2015-11-27 16:48:21: pid 19182: LOG: fork a new child process with pid: 20298
2015-11-27 16:48:21: pid 20163: FATAL: unable to read data from DB node 0
2015-11-27 16:48:21: pid 20163: DETAIL: EOF encountered with backend
2015-11-27 16:48:21: pid 19182: LOG: child process with pid: 20163 exits with status 256
2015-11-27 16:48:21: pid 19182: LOG: fork a new child process with pid: 20299

This happens in all child processes, so HammerDB stops the test because all of its connections are lost. Any suggestions?

Best regards!
DrakoRod
--
Give me a little faith; that will be enough for me. Rozvo Ware Solutions
Re: [GENERAL] 2 questions
Hello,
Yes, postgres is currently installed on an NFS-mounted file system. So when the graphical installer runs, there is a form which asks the location for installation. I specified the path /opt/nfsDir/Postgres, where nfsDir is an NFS mount. So currently this is where PG 9.4 lives.

My question is: when I am given a brand new VM on a different physical server, can I mount that same NFS FS and use the Postgres, or do I need to re-install PG on the new VM? I am not sure if PG writes to any other directories besides the one where it is installed.

On the issue of logging, I see a lot of log statements because client apps do upserts, and since I use Tcl I don't have a SQL proc, but rather I let it raise an exception and then do an update on the row.

So, you can limit the size of an individual log, but there is no way to tell PG to keep the log file short? If I choose FATAL, I'd lose some log, right?

Thank you!
ap

On Fri, Nov 27, 2015 at 10:19 AM, Adrian Klaver wrote:
> On 11/25/2015 10:28 AM, anj patnaik wrote:
>
>> Hello all,
>> I've got 2 more questions. The cron job is now set up and the email gets
>> generated with the proper body.
>>
>> I've one setup with NFS which appears to work smoothly. Now soon, I will
>> be given a Linux VM on a different physical server, but will retain my
>> NFS mount. I've installed Postgres 9.4 using the graphical installer and
>> specified the directory for the NFS mount.
>
> Define 'setup with NFS'.
>
> So are you saying this is how you set up the old instance or how you are
> setting up the new VM?
>
>> 1) When I move to the new VM, can I keep using that NFS mount without
>> having to do a re-install of PG? This would be a different physical
>> machine.
>
> Where is Postgres installed now?
>
> Not entirely following, but I hope you are not asking if two Postgres
> installs can share the same NFS mount? That will end badly.
>
> It might help if you give a schematic description of what you are trying
> to achieve.
>> 2) I have a cron job that deletes log files older than 10 days, but I am
>> noticing rather large log files. Is there a way to limit the size of log
>> files?
>
> What I do is keep the previous 2 days of files as written and then keep
> compressed files older than that for some period of time. The files compress
> a good bit, so it works well for me. This is done via a cron script that
> runs each night.
>
>> Users do upserts and they are valid, but those are getting dumped as
>> error statements. I set the verbosity to "terse", but am still seeing lots
>> of log output.
>
> So what are the errors?
>
> As to 'terse':
>
> http://www.postgresql.org/docs/9.4/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
>
> log_error_verbosity (enum)
>
> Controls the amount of detail written in the server log for each
> message that is logged. Valid values are TERSE, DEFAULT, and VERBOSE, each
> adding more fields to displayed messages. TERSE excludes the logging of
> DETAIL, HINT, QUERY, and CONTEXT error information. VERBOSE output includes
> the SQLSTATE error code (see also Appendix A) and the source code file
> name, function name, and line number that generated the error. Only
> superusers can change this setting.
>
> You lose a lot of valuable information this way. I would go with Albe's
> suggestion and change log_min_error_statement.
>> My settings are as follows:
>>
>> postgres=# select name,setting,unit from pg_settings where name like '%log%';
>>             name             |            setting             | unit
>> -----------------------------+--------------------------------+------
>>  log_autovacuum_min_duration | -1                             | ms
>>  log_checkpoints             | off                            |
>>  log_connections             | off                            |
>>  log_destination             | stderr                         |
>>  log_directory               | pg_log                         |
>>  log_disconnections          | off                            |
>>  log_duration                | off                            |
>>  log_error_verbosity         | terse                          |
>>  log_executor_stats          | off                            |
>>  log_file_mode               | 0600                           |
>>  log_filename                | postgresql-%Y-%m-%d_%H%M%S.log |
>>  log_hostname                | off                            |
>>  log_line_prefix             | %t                             |
>>  log_lock_waits              | off                            |
>>  log_min_duration_statement  | -1                             | ms
>>  log_min_error_statement     | error                          |
>>  log_min_messages            | error                          |
>>  log_parser_stats            | off                            |
>>  log_planner_stats           | off                            |
>>  log_rotation_age            | 1440
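Given the settings above (no size-based rotation configured), one way to keep individual log files short is to rotate by size as well as age and reuse a cyclic set of file names. An illustrative postgresql.conf fragment (the values are examples, not recommendations):

```
# postgresql.conf -- rotate logs by size as well as by age
log_rotation_age = 1d               # start a new file daily
log_rotation_size = 100MB           # ...or when the current file reaches 100 MB
log_filename = 'postgresql-%a.log'  # one file per weekday, e.g. postgresql-Mon.log
log_truncate_on_rotation = on       # overwrite last week's file instead of appending
```

With a cyclic log_filename like %a, truncate-on-rotation caps total log disk usage at roughly seven files without any external cron cleanup.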
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
Peter J. Holzer wrote:
> Two messages from the same line of the same plperlu stored procedure, 68
> seconds apart. So what is this line 36?
>
> confess "impossible result '$success' (payload=$payload)";
>
> What? The first message clearly comes from interpolating $success
> (which is undef at that point) into the argument. The second from
> confess itself. What could cause a plperlu procedure to freeze for 68
> seconds between the call to confess and its output?

Try running it under strace. Maybe it's trying to resolve some name and its resolver takes a minute to discover it can't, or something like that.

--
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On Tue, Dec 1, 2015 at 8:51 AM, Peter J. Holzer wrote:
> What could cause a plperlu procedure to freeze for 68
> seconds between the call to confess and its output?
>
> Another weird thing: $success is undef because a ØMQ rpc call[1] timed
> out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem
> to have a default timeout of 60 seconds, and I don't set one).

One thing I would check is the TCP keepalive settings.

If you don't sort it out, please read this and post with more information; it's tough to guess at a cause without knowing the version of PostgreSQL or the OS involved, etc.

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
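For the server side of that check, PostgreSQL exposes the TCP keepalive knobs directly. An illustrative postgresql.conf fragment (the values are examples, not recommendations; 0 means "use the operating system default"):

```
# postgresql.conf -- detect dead client connections sooner
tcp_keepalives_idle = 60      # seconds of idleness before the first keepalive probe
tcp_keepalives_interval = 10  # seconds between unacknowledged probes
tcp_keepalives_count = 5      # lost probes before the connection is considered dead
```

Note these govern the connections PostgreSQL itself accepts; the ØMQ sockets opened from inside the stored procedure follow the OS-level settings (e.g. net.ipv4.tcp_keepalive_* on Linux) unless the application sets per-socket options.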
Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue
On Mon, 30 Nov 2015 23:07:36 -0500, "Steve Petrie, P.Eng." wrote:
> Instead of using a DELETE command to destroy the row (and a
> resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage
> space), why not instead simply mark that session management row as "free"
> (with an UPDATE command)?

Alban beat me to the answer 8-) But to expand a little:

Your plan won't work as intended because PostgreSQL does not update in place: it inserts a new, changed row and marks the old one as deleted. It does not physically overwrite the old row until the table is vacuumed. [If even then: a row which lies beyond the logical end of the table when vacuum finishes won't be overwritten until its space is recycled.]

This behavior, known as MVCC (multi-version concurrency control), is integral to transaction isolation: selects which are running concurrently with the update may already have seen the old row and must continue to see it until they complete, even if the update completes first. PostgreSQL doesn't support "dirty read" isolation. A row can't be physically dropped or its space overwritten while any transaction that can "see" it is still running.

For more:
https://devcenter.heroku.com/articles/postgresql-concurrency
https://momjian.us/main/writings/pgsql/mvcc.pdf

Actually there is lots of great stuff in the presentations section of Bruce Momjian's site: https://momjian.us

George
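The effect George describes is easy to observe from the statistics views: after an UPDATE, the old row version shows up as a dead tuple until vacuum reclaims it. A sketch (the table name is the same hypothetical session table discussed above):

```sql
-- MVCC: each UPDATE inserts a new row version and marks the old one dead.
UPDATE sessions SET in_use = false WHERE id = 42;

-- Dead tuples accumulate until (auto)vacuum processes the table:
SELECT n_live_tup, n_dead_tup, last_autovacuum, last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'sessions';
```

Watching n_dead_tup rise and then drop after a VACUUM is a quick way to confirm whether autovacuum is keeping up with the UPDATE churn.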
[GENERAL] plperlu stored procedure seems to freeze for a minute
A rather weird observation from the log files of our server (9.5 beta1):

2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 WARNING: Use of uninitialized value $success in concatenation (.) or string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36.
2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 CONTEXT: PL/Perl function "mb_timeseriesdata_zmq"
[lots of other stuff from different connections]
2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: impossible result '' (payload=) at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36.
        WDS::Macrobond::Utils::decode_result("") called at line 30
        main::__ANON__("gen_wqehur") called at -e line 0
        eval {...} called at -e line 0

Two messages from the same line of the same plperlu stored procedure, 68 seconds apart. So what is this line 36?

    confess "impossible result '$success' (payload=$payload)";

What? The first message clearly comes from interpolating $success (which is undef at that point) into the argument. The second from confess itself. What could cause a plperlu procedure to freeze for 68 seconds between the call to confess and its output? Is it possible that only the writing of the log entry is delayed?

Another weird thing: $success is undef because a ØMQ rpc call[1] timed out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem to have a default timeout of 60 seconds, and I don't set one). But at 09:24:45 (i.e. the time of the error message) the answer for that RPC call arrived. So it kind of looks like confess waited for the message to arrive (which makes no sense at all), or maybe confess waited for something which also blocked the sending of the request (because according to the server logs, the RPC request only arrived there at 09:24:45 and was answered within 1 second), but that doesn't make any sense either. (Just noticed that 60 + 68 == 128, which is also a round number.)
hp

[1] ØMQ is an IPC framework: see http://zeromq.org/ We use it to make RPC calls from stored procedures to a server process.

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"I want to forget all about both belts and suspenders; instead, I want to buy pants that actually fit." -- http://noncombatant.org/
Re: [GENERAL] 2 questions
> On Nov 30, 2015, at 12:54, anj patnaik wrote:
>
> 1) Directory listing:
>
> /opt/nfsDir/postgres/9.4/
>     /bin
>     /data
>     /etc
>     /pgAdmin3

The data directory will cause you many problems. You need one data directory that is accessed by one AND ONLY one host for each node connected. You can't run an instance on multiple machines pointing to the same 'data' directory simultaneously. Data directories cannot be shared by multiple instances simultaneously; that would be an active/active shared-disk cluster, and most databases either don't support it or require massive overhead (network/licensing, e.g. Oracle RAC) to do it.

You *can* re-use the other directories, but it can be fraught with issues, and you need to think carefully through upgrades, etc.

> 2) The way I am using PG now is that I have specified the directory above as
> the location to install it (from the graphical installer).
>
> Now, it appears that postgres places files in other directories besides the
> one specified in the installer. For instance, there are scripts in
> /etc/init.d to start the service.
>
> So in answering my own question: it appears PG places files in other dirs, so
> when I am given a new VM/different physical server with the same NFS mount I
> would either need to copy these files over or, better yet, uninstall the
> current PG and re-install from scratch.
>
> Thanks,
> ap
>
>> On Fri, Nov 27, 2015 at 8:30 PM, Adrian Klaver wrote:
>>> On 11/27/2015 01:17 PM, anj patnaik wrote:
>>>
>>> Hello,
>>> Yes, postgres is currently installed on an NFS-mounted file system. So
>>> when the graphical installer runs, there is a form which asks the location
>>> for installation. I specified the path /opt/nfsDir/Postgres, where nfsDir is
>>> an NFS mount. So currently this is where PG 9.4 lives.
>>
>> What is under /opt/nfsDir/Postgres?
>>
>>> My question is: when I am given a brand new VM on a different physical
>>> server, can I mount that same NFS FS and use the Postgres, or do I need
>>> to re-install PG on the new VM?
>> How are you using Postgres now? Please be specific; more detail is better
>> than less at this point.
>>
>>> I am not sure if PG writes to any other directories besides the one
>>> where it is installed.
>>
>>> On the issue of logging, I see a lot of log statements because client
>>> apps do upserts, and since I use Tcl I don't have a SQL proc, but rather
>>> I let it raise an exception and then do an update on the row.
>>
>> You can use Tcl in the database:
>>
>> http://www.postgresql.org/docs/9.4/interactive/pltcl.html
>>
>> That will not change things if you let the database throw an exception there
>> also.
>>
>>> So, you can limit the size of an individual log, but there is no way to
>>> tell PG to keep the log file short?
>>>
>>> If I choose FATAL, I'd lose some log, right?
>>>
>>> Thank you!
>>> ap
>>
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com