Re: [GENERAL] Alternative to Multi-Master Replication with 2 Data centers??

2014-03-31 Thread jaime soler
On Sat, 2014-03-29 at 12:50 -0700, John R Pierce wrote:
 On 3/29/2014 11:35 AM, ethode wrote:
  Currently we're considering several options, of which Magic
  replication appears to be the top option.
 
 fixed that for you, hope that helps!
 
 kidding aside, there's no magic bullet here that does what you describe 
 without introducing its own complex set of problems and performance 
 issues, not the least of which is transaction conflict resolution.

Check out xDB Replication Server v5.0 Multi-Master:
http://www.enterprisedb.com/products-services-training/products-overview/xdb-replication-server-multi-master
Maybe it fits your requirements.



 
 
 
 
 -- 
 john r pierce  37N 122W
 somewhere on the middle of the left coast
 
 
 




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [ADMIN][GENERAL] openvz and shared memory trouble

2014-03-31 Thread Willy-Bas Loos
On Sat, Mar 29, 2014 at 6:17 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 03/29/2014 08:19 AM, Willy-Bas Loos wrote:

 The error that shows up is a Bus error.
 That's on the replication slave.
 Here's the log about it:
 2014-03-29 12:41:33 CET db: ip: us: FATAL:  could not receive data from
 WAL stream: server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.

 cp: cannot stat
 `/data/postgresql/9.1/main/wal_archive/00010072000A': No
 such file or directory
 2014-03-29 12:41:33 CET db: ip: us: LOG:  unexpected pageaddr
 71/E9DA in log file 114, segment 10, offset 14286848
 cp: cannot stat
 `/data/postgresql/9.1/main/wal_archive/00010072000A': No
 such file or directory
 2014-03-29 12:41:33 CET db: ip: us: LOG:  streaming replication
 successfully connected to primary
 2014-03-29 12:41:48 CET db: ip: us: LOG:  startup process (PID 17452)
 was terminated by signal 7: Bus error
 2014-03-29 12:41:48 CET db: ip: us: LOG:  terminating any other active
 server processes
 2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos WARNING:
 terminating connection because of crash of another server process
 2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos DETAIL:  The
 postmaster has commanded this server process to roll back the current
 transaction and exit, because another server process exited abnormally
 and possibly corrupted shared memory.
 2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos HINT:  In a
 moment you should be able to reconnect to the database and repeat your
 command.


 Well what I am seeing are WAL log errors. One saying no file is
 present, the other pointing at a possible file corruption.

Those are normal notices, nothing to worry about.


 Shared memory problems are offered as a possible cause only. Right now I
 would say we are seeing only half the picture. The Postgres logs from the
 same time period for the primary server, as well as the system logs for the
 openvz container would help fill in the other half of the picture.


Here's the log from the primary postgres server:
2014-03-29 12:41:29 CET db:wbloos ip:[local] us:wbloos NOTICE:  ALTER TABLE
will create implicit sequence test_x_seq for serial column test.x
2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  SSL renegotiation failure
2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  SSL error: unexpected record
2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  could not send data to client: Connection reset by peer
2014-03-29 12:41:48 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  could not receive data from client: Connection reset by peer
2014-03-29 12:41:48 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  unexpected EOF on standby connection

(the SSL renegotiation failure happens all the time, without the crash)

And here's the syslog from the container:
Mar 29 12:41:01 mycontainer snmpd[8819]: Connection from UDP:
[xxx.xxx.xxx.xxx]:59090-[xxx.xxx.xxx.xxx]
Mar 29 12:42:30 mycontainer snmpd[8819]: Connection from UDP:
[xxx.xxx.xxx.xxx]:35949-[xxx.xxx.xxx.xxx]

The log on the host doesn't say anything interesting either.

A cursory look at memory management in openvz shows it is different from
 other virtualization software and physical machines. Whether that is a
 problem would seem to be dependent on where you are on the learning curve:)

That sounds like there is a solution to the problem, all you have to do is
find out what it is. There doesn't seem to be a variable in the
beancounters or anywhere else that can prevent the bus error from happening.
There seems to be no separate way of guaranteeing shared memory. There's
no OOM killer active either, nor is the host or server running short of memory.

I'm still worried that it's like Tom Lane said in another discussion: So
basically, you've got a broken kernel here: it claimed to give PG circa
(135MB) of memory, but what's actually there is only about (128MB). I don't
see any connection between those numbers and the shmmax/shmall settings,
either --- so I think this must be some busted implementation of a VM-level
limitation.
(here:
http://www.postgresql.org/message-id/CAK3UJREBcyVBtr8D7vMfU=uddkjxkrpngcuy8eryb0tmfke...@mail.gmail.com
)

And it makes me wonder what other issues may arise from that. But
especially, what I can do about it.

Cheers,

WBL

-- 
Quality comes from focus and clarity of purpose -- Mark Shuttleworth


[GENERAL] PSQL log file

2014-03-31 Thread Nithya Soman
Hi

Is there any option in psql version 9.2.4 to include the date and time
in the psql log files (in /var/log/postgresql)?




Re: [GENERAL] PSQL log file

2014-03-31 Thread Shaun Thomas

 Is there any option in psql version 9.2.4 to include the date and time
 in the psql log files (in /var/log/postgresql)?

If you look in postgresql.conf for your installation, you can change the log 
prefix by setting log_line_prefix. If you add %t to that string, you'll get a 
timestamp + date value with every entry. The postgresql.conf file should also 
list all of the honored prefixes, or you can use this URL for more guidance:

http://www.postgresql.org/docs/9.2/static/runtime-config-logging.html
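For example, a minimal sketch (the prefix values are illustrative, not a
recommendation; log_line_prefix only needs a config reload, not a restart):

    # postgresql.conf -- %t = timestamp, %p = backend PID, %u/%d = user/database
    log_line_prefix = '%t [%p] %u@%d '

    -- then apply it without restarting, e.g. from psql:
    SELECT pg_reload_conf();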

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email




Re: [ADMIN][GENERAL] openvz and shared memory trouble

2014-03-31 Thread Adrian Klaver

On 03/31/2014 04:12 AM, Willy-Bas Loos wrote:


On Sat, Mar 29, 2014 at 6:17 PM, Adrian Klaver
adrian.kla...@aklaver.com mailto:adrian.kla...@aklaver.com wrote:

On 03/29/2014 08:19 AM, Willy-Bas Loos wrote:

The error that shows up is a Bus error.
That's on the replication slave.
Here's the log about it:
2014-03-29 12:41:33 CET db: ip: us: FATAL:  could not receive
data from
WAL stream: server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.

cp: cannot stat
`/data/postgresql/9.1/main/wal_archive/00010072000A': No
such file or directory
2014-03-29 12:41:33 CET db: ip: us: LOG:  unexpected pageaddr
71/E9DA in log file 114, segment 10, offset 14286848
cp: cannot stat
`/data/postgresql/9.1/main/wal_archive/00010072000A': No
such file or directory
2014-03-29 12:41:33 CET db: ip: us: LOG:  streaming replication
successfully connected to primary
2014-03-29 12:41:48 CET db: ip: us: LOG:  startup process (PID
17452)
was terminated by signal 7: Bus error
2014-03-29 12:41:48 CET db: ip: us: LOG:  terminating any other
active
server processes
2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos WARNING:
terminating connection because of crash of another server process
2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos DETAIL:  The
postmaster has commanded this server process to roll back the
current
transaction and exit, because another server process exited
abnormally
and possibly corrupted shared memory.
2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos HINT:  In a
moment you should be able to reconnect to the database and
repeat your
command.


Well what I am seeing are WAL log errors. One saying no file is
present, the other pointing at a possible file corruption.

Those are normal notices, nothing to worry about.


Well, other than that they cause the standby to reconnect to the primary, 
during which a crash occurs.




Shared memory problems are offered as a possible cause only. Right
now I would say we are seeing only half the picture. The Postgres
logs from the same time period for the primary server, as well as
the system logs for the openvz container would help fill in the
other half of the picture.


Here's the log from the primary postgres server:
2014-03-29 12:41:29 CET db:wbloos ip:[local] us:wbloos NOTICE:  ALTER
TABLE will create implicit sequence test_x_seq for serial column test.x
2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  SSL renegotiation failure
2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  SSL error: unexpected record
2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  could not send data to client: Connection reset by peer
2014-03-29 12:41:48 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  could not receive data from client: Connection reset by peer
2014-03-29 12:41:48 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
LOG:  unexpected EOF on standby connection

(the SSL renegotiation failure happens all the time, without the crash)

And here's the syslog from the container:
Mar 29 12:41:01 mycontainer snmpd[8819]: Connection from UDP:
[xxx.xxx.xxx.xxx]:59090-[xxx.xxx.xxx.xxx]
Mar 29 12:42:30 mycontainer snmpd[8819]: Connection from UDP:
[xxx.xxx.xxx.xxx]:35949-[xxx.xxx.xxx.xxx]

The log on the host doesn't say anything interesting either.

A cursory look at memory management in openvz shows it is different
from other virtualization software and physical machines. Whether
that is a problem would seem to be dependent on where you are on the
learning curve:)

That sounds like there is a solution to the problem, all you have to do
is find out what it is. There doesn't seem to be a variable in the
beancounters or anywhere else that can prevent the bus error from happening.
There seems to be no separate way of guaranteeing shared memory.
There's no OOM killer active either, nor is the host or server running short
of memory.


At this point I am not sure it is even obvious what is causing the 
error, so finding a solution would be a hit or miss affair at best.




I'm still worried that it's like Tom Lane said in another discussion: So
basically, you've got a broken kernel here: it claimed to give PG circa
(135MB) of memory, but what's actually there is only about (128MB). I
don't see any connection between those numbers and the shmmax/shmall
settings, either --- so I think this must be some busted implementation
of a VM-level limitation.
(here:
http://www.postgresql.org/message-id/CAK3UJREBcyVBtr8D7vMfU=uddkjxkrpngcuy8eryb0tmfke...@mail.gmail.com)
And it makes me wonder what other issues may arise from that. But
especially, what I can do about it.


Had to go through it a couple of times, and look up the acronyms, but 
the thread below seems to deal with the whole shared memory allocation 
and setting process in openvz quite well:


http://forum.openvz.org/index.php?t=msg&goto=12061&srch=postgresql+shared#msg_12061

In particular, where UBC is User Beancounters and the sysctls are the ones 
in the container:


The UBC shmpages parameter controls all the shared memory which it is
possible to allocate, either via IPC or shmem (e.g. tmpfs).

The sysctls above are only for SysV IPC shared memory.

combine that with, where VE is Virtual Environment (container):

check the same in VE please.
AFAICS, by default kernel sets:
#define SHMMAX 0x2000000
which is 32Mb.
So you have to increase it in VE /etc/sysctl.conf file or in /proc.

and there may be a path to a solution.
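For example, a sketch of that change inside the container (the numbers here
are purely illustrative, not recommendations; shmpages itself is a host-side
beancounter, counted in pages):

    # inside the VE, /etc/sysctl.conf
    kernel.shmmax = 1073741824    # max size of a single segment, in bytes
    kernel.shmall = 262144        # total shared memory, in 4 kB pages

    # apply without a reboot:
    #   sysctl -p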



Cheers,

WBL

--
Quality comes from focus and clarity of purpose -- Mark Shuttleworth



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [ADMIN][GENERAL] openvz and shared memory trouble

2014-03-31 Thread Tom Lane
Adrian Klaver adrian.kla...@aklaver.com writes:
 On 03/31/2014 04:12 AM, Willy-Bas Loos wrote:
 I'm still worried that it's like Tom Lane said in another discussion:So
 basically, you've got a broken kernel here: it claimed to give PG circa
 (135MB) of memory, but what's actually there is only about (128MB). I
 don't see any connection between those numbers and the shmmax/shmall
 settings, either --- so I think this must be some busted implementation
 of a VM-level limitation.
 (here:
 http://www.postgresql.org/message-id/CAK3UJREBcyVBtr8D7vMfU=uddkjxkrpngcuy8eryb0tmfke...@mail.gmail.com)
 
 And it makes me wonder what else may be issues that arise from that. But
 especially, what i can do about it.

FWIW, I went back and re-read that message while perusing this thread,
and this time it struck me that there was a significant bit of evidence
I'd overlooked: namely, that the buffer block array is by no means the
last thing in Postgres' shared memory segment.  There are a bunch of
other shared data structures allocated after it, some of which almost
certainly had to have been touched by the startup subprocess.  The gdb
output makes it clear that the kernel stopped providing memory at
0xb6c4b000; but either it resumed doing so further on, or the whole shared
memory segment *had* been provisioned originally, and then part of it
got unmapped again while the startup process was running.

So it's still clearly a kernel bug, but it seems less likely that it is
triggered by some static limit on shared memory size.  Perhaps instead,
the kernel had been filling in pages for the shared segment on-demand,
and then when it got to some limit it refused to do so anymore and allowed
a SIGBUS to happen instead.

 I do not use openvz so I do not have a test bed to try out, but this 
 page seems to be related to your problem:
 http://openvz.org/Resource_shortage
 or if you want more detail and a link to what looks to a replacement for 
 beancounters:
 http://openvz.org/Setting_UBC_parameters

If this software's idea of resource management is to allow SIGBUS to
happen upon attempting to use memory that had been successfully granted,
then it's a piece of junk that you should get rid of ASAP.  (No, I
don't like Linux's OOM-kill solution to resource overcommit either.)

regards, tom lane




[GENERAL] char array overhead

2014-03-31 Thread Rob Sargent
I'm angling toward using a very wide char(1) array.  Is the one-byte 
overhead for char(n<126) applied to each element or to the array?


Re: [ADMIN][GENERAL] openvz and shared memory trouble

2014-03-31 Thread Adrian Klaver

On 03/31/2014 08:01 AM, Tom Lane wrote:

Adrian Klaver adrian.kla...@aklaver.com writes:

On 03/31/2014 04:12 AM, Willy-Bas Loos wrote:

I'm still worried that it's like Tom Lane said in another discussion:So
basically, you've got a broken kernel here: it claimed to give PG circa
(135MB) of memory, but what's actually there is only about (128MB). I
don't see any connection between those numbers and the shmmax/shmall
settings, either --- so I think this must be some busted implementation
of a VM-level limitation.
(here:
http://www.postgresql.org/message-id/CAK3UJREBcyVBtr8D7vMfU=uddkjxkrpngcuy8eryb0tmfke...@mail.gmail.com)

And it makes me wonder what else may be issues that arise from that. But
especially, what i can do about it.


FWIW, I went back and re-read that message while perusing this thread,
and this time it struck me that there was a significant bit of evidence
I'd overlooked: namely, that the buffer block array is by no means the
last thing in Postgres' shared memory segment.  There are a bunch of
other shared data structures allocated after it, some of which almost
certainly had to have been touched by the startup subprocess.  The gdb
output makes it clear that the kernel stopped providing memory at
0xb6c4b000; but either it resumed doing so further on, or the whole shared
memory segment *had* been provisioned originally, and then part of it
got unmapped again while the startup process was running.

So it's still clearly a kernel bug, but it seems less likely that it is
triggered by some static limit on shared memory size.  Perhaps instead,
the kernel had been filling in pages for the shared segment on-demand,
and then when it got to some limit it refused to do so anymore and allowed
a SIGBUS to happen instead.


I do not use openvz so I do not have a test bed to try out, but this
page seems to be related to your problem:
http://openvz.org/Resource_shortage
or if you want more detail and a link to what looks to a replacement for
beancounters:
http://openvz.org/Setting_UBC_parameters


If this software's idea of resource management is to allow SIGBUS to
happen upon attempting to use memory that had been successfully granted,
then it's a piece of junk that you should get rid of ASAP.  (No, I
don't like Linux's OOM-kill solution to resource overcommit either.)


At this point the memory allocation as a problem is as much conjecture 
as anything else, at least to me. So what is causing SIGBUS is an open 
question in my mind.





regards, tom lane





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [ADMIN][GENERAL] openvz and shared memory trouble

2014-03-31 Thread Tom Lane
Adrian Klaver adrian.kla...@aklaver.com writes:
 At this point the memory allocation as a problem is as much conjecture 
 as anything else, at least to me. So what is causing SIGBUS is an open 
 question in my mind.

Agreed, it's unproven what's causing the SIGBUS in the case at hand.
However, in the case I investigated for Josh Kupershmidt, it was provably
the fault of the kernel for failing to supply memory to back the shared
memory segment that it had previously agreed to allocate.  The fact that
both users are trying to run their standby server under OpenVZ is, shall
we say, merely suggestive.

regards, tom lane




Re: [GENERAL] char array overhead

2014-03-31 Thread Steve Atkins

On Mar 31, 2014, at 8:08 AM, Rob Sargent robjsarg...@gmail.com wrote:

 I'm angling toward using a very wide char(1) array.  Is the one-byte overhead 
 for char(n<126) applied to each element or to the array?

Each element, it's a variable length type.

There's probably a better way of storing your data, but if you end up really 
needing a one-byte long character type, there is "char" (with the quotes).

Cheers,
  Steve





Re: [GENERAL] Alternative to Multi-Master Replication with 2 Data centers??

2014-03-31 Thread Paul Jungwirth
 We are load balancing 2 data centers.

Chapter 8 of Scalable Internet Architectures has a good discussion of
running master-master setups in separate data centers. I'd read that
whole chapter for some of the challenges you'll face.

 If DC1 goes down our LB is failing over to DC2.

This sounds like it will bring down both databases. In general using
the same machine for both load balancing and failover means that in
practice you have no failover, because if one box goes down doubling
the traffic will overwhelm the other one. If you want high
availability you should have a separate warm standby in each
datacenter, for four machines total. Otherwise you're just spending
lots of time and money for the appearance of failover but not the
reality. Or at least test it and make sure one failure won't cascade
to the whole system.

Good luck!

Paul



On Sat, Mar 29, 2014 at 11:35 AM, ethode jos...@ethode.com wrote:
 We are load balancing 2 data centers.

 Our current approach was using a software layer in our CMS to send data
 between data centers, but write/update frequency made this approach
 difficult and bug laden.

 Currently we're considering several options, of which Multi-master
 replication appears to be the top option.

 BOTH data centers need to be writable, otherwise we could use Master/Slave.
 If DC1 goes down our LB is failing over to DC2.  The failure causing
 failover could be DB related OR be web server related.

 It doesn't appear to be realistic to keep both DC's updated on inserts
 and/or updates without using Multi-master or some other 3rd party software
 that appear to do the same thing as Multi-master.

 Any other solutions I should be considering?



 --
 View this message in context: 
 http://postgresql.1045698.n5.nabble.com/Alternative-to-Multi-Master-Replication-with-2-Data-centers-tp5797886.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



-- 
_
Pulchritudo splendor veritatis.




Re: [GENERAL] char array overhead

2014-03-31 Thread Rob Sargent

On 03/31/2014 09:48 AM, Steve Atkins wrote:

On Mar 31, 2014, at 8:08 AM, Rob Sargent robjsarg...@gmail.com wrote:


I'm angling toward using a very wide char(1) array.  Is the one-byte overhead for 
char(n<126) applied to each element or to the array?

Each element, it's a variable length type.

There's probably a better way of storing your data, but if you end up really needing a 
one-byte long character type, there is "char" (with the quotes).

Cheers,
   Steve



Thank you! First for the confirmation on the overhead (how 
disappointing) and secondly for "char".  That had not crossed my radar.


Jsyk, I'm toying with a rather large number of small-valued datapoints 
per sample.  I'm trying text and smallint as well.


Thanks again,

rjs



Re: [GENERAL] Complex query

2014-03-31 Thread Igor Neyman
 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 ow...@postgresql.org] On Behalf Of Leonardo M. Ramé
 Sent: Monday, March 31, 2014 2:38 PM
 To: PostgreSql-general
 Subject: [GENERAL] Complex query
 
 Hi, I'm looking for help with this query.
 
 Table Tasks:
 
 IdTask  StatusCode  StatusName
 --
 1   R   Registered
 1   S   Started
 1   D   Dictated
 1   F   Finished
 1   T   Transcribed
 --
 2   R   Registered
 2   S   Started
 2   T   Transcribed
 2   F   Finished
 
 As you can see, I have a table containing tasks and statuses. What I would 
 like
 to get is the list of tasks, including all of its steps, for only those tasks 
 where
 the StatusCode sequence was S followed by T.
 
 In this example, the query should only return task Nº 2:
 
 2   R   Registered
 2   S   Started
 2   T   Transcribed
 2   F   Finished
 
 Can anybody help me with this?.
 
 Regards,
 --
 Leonardo M. Ramé
 Medical IT - Griensu S.A.
 Av. Colón 636 - Piso 8 Of. A
 X5000EPT -- Córdoba
 Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
 Cel.: +54 9 (011) 40871877
 
 
 
 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

Leonardo,

Unless you add one more column to your Tasks table, specifically: 
StatusTimestamp as in:

IdTask  StatusCode  StatusName StatusTimestamp

You cannot find which record in the table follows which, because the order in 
which records are returned from the database is not guaranteed unless you add 
an ORDER BY clause to your SELECT statement.

Regards,
Igor Neyman
 





Re: [GENERAL] Complex query

2014-03-31 Thread Raymond O'Donnell
On 31/03/2014 19:38, Leonardo M. Ramé wrote:
 Hi, I'm looking for help with this query. 
 
 Table Tasks:
 
 IdTask  StatusCode  StatusName
 --
 1   R   Registered
 1   S   Started
 1   D   Dictated
 1   F   Finished
 1   T   Transcribed
 --
 2   R   Registered
 2   S   Started
 2   T   Transcribed
 2   F   Finished
 
 As you can see, I have a table containing tasks and statuses. What I
 would like to get is the list of tasks, including all of its steps, for
 only those tasks where the StatusCode sequence was S followed by T.

How do you know the sequence in which the statuses occurred? Is there
another column with a timestamp or something?

Ray.


-- 
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie




[GENERAL] Wanted: ALTER TRIGGER ... OWNED BY EXTENSION

2014-03-31 Thread Moshe Jacobson
An extension I'm working on (http://pgxn.org/dist/cyanaudit) dynamically
creates trigger functions and installs them on tables in public. The
triggers are automatically created when one of the extension's config
tables is populated.

Even though I mark the trigger *functions* as owned by my extension, the
trigger definitions themselves are still dumped by pg_dump and restored by
pg_restore. This is a problem when pg_restore is using parallelism (-j),
and one thread starts issuing CREATE TRIGGER commands before the other
thread has finished populating the config table.

What ends up happening is pg_restore throws a whole lot of errors saying
that the function referenced by the CREATE TRIGGER command does not exist.
However, the function and trigger are actually created later on once the
config table is populated.

Using pg_restore without -j is a workaround, but I'd really like to be able
to mark my extension's triggers as owned by the extension, so that these
errors will not show up when restoring.

Is there a better workaround that I'm not aware of?

Thanks.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. http://www.neadwerx.com
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

Quality is not an act, it is a habit. -- Aristotle


Re: [GENERAL] Complex query

2014-03-31 Thread David Johnston
Leonardo M. Ramé-2 wrote
 Hi, I'm looking for help with this query. 
 
 Table Tasks:
 
 IdTask  StatusCode  StatusName
 --
 1   R   Registered
 1   S   Started
 1   D   Dictated
 1   F   Finished
 1   T   Transcribed
 --
 2   R   Registered
 2   S   Started
 2   T   Transcribed
 2   F   Finished
 
 As you can see, I have a table containing tasks and statuses. What I
 would like to get is the list of tasks, including all of its steps, for
 only those tasks where the StatusCode sequence was S followed by T.
 
 In this example, the query should only return task Nº 2:
 
 2   R   Registered
 2   S   Started
 2   T   Transcribed
 2   F   Finished
 
 Can anybody help me with this?.

First you need to decide how to tell the database that R-S-T-F is ordered, and
then maybe you can use window functions, specifically lag(col, -1) over
(...), to determine what the prior row's code is and act accordingly.

Put that into a sub-query and return the IdTask to the outer query's where
clause.
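A sketch of that shape, assuming an ordering column (here called "id", which
is not in the table as posted) and using lead() to look at the following row:

    -- tasks where an 'S' row is immediately followed by a 'T' row
    SELECT t.*
    FROM tasks t
    WHERE t.idtask IN (
        SELECT idtask
        FROM (
            SELECT idtask, statuscode,
                   lead(statuscode) OVER (PARTITION BY idtask
                                          ORDER BY id) AS next_code
            FROM tasks
        ) s
        WHERE statuscode = 'S' AND next_code = 'T'
    );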

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Complex-query-tp5798061p5798068.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.




Re: [GENERAL] Complex query

2014-03-31 Thread Leonardo M . Ramé
On 2014-03-31 18:48:58 +, Igor Neyman wrote:
  -Original Message-
  From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
  ow...@postgresql.org] On Behalf Of Leonardo M. Ramé
  Sent: Monday, March 31, 2014 2:38 PM
  To: PostgreSql-general
  Subject: [GENERAL] Complex query
  
  Hi, I'm looking for help with this query.
  
 
 Leonardo,
 
 Unless you add one more column to your Tasks table, specifically: 
 StatusTimestamp as in:
 
 IdTask  StatusCode  StatusName StatusTimestamp
 
 You cannot find which record in the table follows which, because the order in 
 which records are returned from the database is not guaranteed unless you add 
 an ORDER BY clause to your SELECT statement.
 
 Regards,
 Igor Neyman
  
 

You are right, let's add the Id column. This is just an example, the
real table (a view) contains both, the Id and a timestamp:

Id IdTask  StatusCode  StatusName
--
1  1   R   Registered
2  1   S   Started
3  1   D   Dictated
4  1   F   Finished
5  1   T   Transcribed
--
6  2   R   Registered
7  2   S   Started
8  2   T   Transcribed
9  2   F   Finished

After adding the Id column, can I use a window function to get what I
need?.

Regards,
-- 
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877





[GENERAL] getting the current query from pg_stat_activity

2014-03-31 Thread Si Chen
Hello,

I have two different postgresql servers running slightly versions.  On one
them, if I try to use pg_stat_activity to get the current queries, I get

1$ psql
psql (9.0.13)
Type "help" for help.

postgres=# select procpid, query_start, waiting, current_query from
pg_stat_activity;
 procpid |  query_start  | waiting |
current_query
-+---+-+
 673 | 2014-03-31 11:45:45.38988-07  | f   | <IDLE>
 855 | 2014-03-31 11:45:45.478935-07 | f   | <IDLE>
...

This agrees with the results of
$ ps auxw | grep postgres

postgres   673  0.3  0.3 243028 55348 ?Ss   Mar30   2:25 postgres:
pguser databasename 127.0.0.1(53931) idle
postgres   855  0.3  0.3 243304 57584 ?Ss   Mar30   2:49 postgres:
pguser databasename 127.0.0.1(53981) idle

which shows that the processes are idle.

On the other one, though,
$ psql
psql (9.2.6)
Type "help" for help.

postgres=# select pid, query_start, waiting, query from pg_stat_activity;
  pid  |          query_start          | waiting | query
-------+-------------------------------+---------+----------
 12333 | 2014-03-31 14:32:30.810934-04 | f   | SELECT...

 12376 | 2014-03-31 14:48:08.338419-04 | f   | COMMIT
 12405 | 2014-03-31 14:52:22.903848-04 | f   | COMMIT
 12406 | 2014-03-31 14:32:48.150378-04 | f   | SELECT   



which is strange, because the processes show they are idle,
postgres 12333  0.0  1.8 3437696 279736 ?  Ss   14:31   0:00 postgres:
opentaps databasename 127.0.0.1(37969) idle
postgres 12376  5.0  9.7 3473184 1491196 ? Ss   14:32   1:05 postgres:
opentaps databasename 127.0.0.1(38025) idle
postgres 12405  1.5  6.5 3467624 1007160 ? Ss   14:32   0:19 postgres:
opentaps databasename 127.0.0.1(38085) idle
postgres 12406  0.0  0.0 3432512 13024 ?   Ss   14:32   0:00 postgres:
opentaps databasename 127.0.0.1(38100) idle

it seems that there is also a difference between the pg_stat_activity table
of version 9.0.13:

 \d pg_stat_activity;
   View pg_catalog.pg_stat_activity
  Column  |   Type   | Modifiers
--+--+---
 datid| oid  |
 datname  | name |
 procpid  | integer  |
 usesysid | oid  |
 usename  | name |
 application_name | text |
 client_addr  | inet |
 client_port  | integer  |
 backend_start| timestamp with time zone |
 xact_start   | timestamp with time zone |
 query_start  | timestamp with time zone |
 waiting  | boolean  |
 current_query| text |


vs 9.2.6:

   View pg_catalog.pg_stat_activity
  Column  |   Type   | Modifiers
--+--+---
 datid| oid  |
 datname  | name |
 pid  | integer  |
 usesysid | oid  |
 usename  | name |
 application_name | text |
 client_addr  | inet |
 client_hostname  | text |
 client_port  | integer  |
 backend_start| timestamp with time zone |
 xact_start   | timestamp with time zone |
 query_start  | timestamp with time zone |
 state_change | timestamp with time zone |
 waiting  | boolean  |
 state| text |
 query| text |

So which one is correct?  Why does 9.0.13 show the processes as idle, and
9.2.6 show a query, even though the process shows them as idle?

-- 
Si Chen
Open Source Strategies, Inc.
sic...@opensourcestrategies.com
http://www.OpenSourceStrategies.com
LinkedIn: http://www.linkedin.com/in/opentaps
Twitter: http://twitter.com/opentaps


Re: [GENERAL] Complex query

2014-03-31 Thread Igor Neyman
 -Original Message-
 From: Leonardo M. Ramé [mailto:l.r...@griensu.com]
 Sent: Monday, March 31, 2014 2:56 PM
 To: Igor Neyman
 Cc: PostgreSql-general
 Subject: Re: [GENERAL] Complex query
 
 On 2014-03-31 18:48:58 +, Igor Neyman wrote:
   -Original Message-
   From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
   ow...@postgresql.org] On Behalf Of Leonardo M. Ramé
   Sent: Monday, March 31, 2014 2:38 PM
   To: PostgreSql-general
   Subject: [GENERAL] Complex query
  
   Hi, I'm looking for help with this query.
  
 
  Leonardo,
 
  Unless you add one more column to your Tasks table, specifically:
 StatusTimestamp as in:
 
  IdTask  StatusCode  StatusName StatusTimestamp
 
  You cannot find which record in the table follows which, because the order in
 which records are returned from the database is not guaranteed until you add an
 ORDER BY clause to your SELECT statement.
 
  Regards,
  Igor Neyman
 
 
 
 You are right, let's add the Id column. This is just an example; the real
 table (a view) contains both the Id and a timestamp:
 
 Id IdTask  StatusCode  StatusName
 --
 1  1   R   Registered
 2  1   S   Started
 3  1   D   Dictated
 4  1   F   Finished
 5  1   T   Transcribed
 --
 6  2   R   Registered
 7  2   S   Started
 8  2   T   Transcribed
 9  2   F   Finished
 
 After adding the Id column, can I use a window function to get what I need?
 
 Regards,
 --
 Leonardo M. Ramé
 Medical IT - Griensu S.A.
 Av. Colón 636 - Piso 8 Of. A
 X5000EPT -- Córdoba
 Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
 Cel.: +54 9 (011) 40871877

D. Johnston showed how to use window functions in this case.

Regards,
Igor Neyman


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] char array overhead

2014-03-31 Thread Alvaro Herrera
Rob Sargent wrote:

 Jsyk, I'm toying with a rather large number of small valued
 datapoints per sample.  I'm trying text and smallint as well.

You could try char[] ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] getting the current query from pg_stat_activity

2014-03-31 Thread Igor Neyman


From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Si Chen
Sent: Monday, March 31, 2014 2:57 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] getting the current query from pg_stat_activity

Hello,

I have two different postgresql servers running slightly different versions.  On one of 
them, if I try to use pg_stat_activity to get the current queries, I get

$ psql
psql (9.0.13)
Type "help" for help.

postgres=# select procpid, query_start, waiting, current_query from 
pg_stat_activity;
 procpid |          query_start          | waiting |                            
   current_query                                
-+---+-+
     673 | 2014-03-31 11:45:45.38988-07  | f       | <IDLE>
     855 | 2014-03-31 11:45:45.478935-07 | f       | <IDLE>
...

This agrees with the results of 
$ ps auxw | grep postgres

postgres   673  0.3  0.3 243028 55348 ?        Ss   Mar30   2:25 postgres: 
pguser databasename 127.0.0.1(53931) idle
postgres   855  0.3  0.3 243304 57584 ?        Ss   Mar30   2:49 postgres: 
pguser databasename 127.0.0.1(53981) idle

which shows that the processes are idle.

On the other one, though,
$ psql
psql (9.2.6)
Type "help" for help.

postgres=# select pid, query_start, waiting, query from pg_stat_activity;
   pid  |          query_start          | waiting |                            
   query                                                                        
   
 12333 | 2014-03-31 14:32:30.810934-04 | f       | SELECT...                    
                                                  
 12376 | 2014-03-31 14:48:08.338419-04 | f       | COMMIT
 12405 | 2014-03-31 14:52:22.903848-04 | f       | COMMIT
 12406 | 2014-03-31 14:32:48.150378-04 | f       | SELECT                   
                                                                                
                                                                               
which is strange, because the processes show they are idle, 
postgres 12333  0.0  1.8 3437696 279736 ?      Ss   14:31   0:00 postgres: 
opentaps databasename 127.0.0.1(37969) idle
postgres 12376  5.0  9.7 3473184 1491196 ?     Ss   14:32   1:05 postgres: 
opentaps databasename 127.0.0.1(38025) idle
postgres 12405  1.5  6.5 3467624 1007160 ?     Ss   14:32   0:19 postgres: 
opentaps databasename 127.0.0.1(38085) idle
postgres 12406  0.0  0.0 3432512 13024 ?       Ss   14:32   0:00 postgres: 
opentaps databasename 127.0.0.1(38100) idle

it seems that there is also a difference between the pg_stat_activity table of 
version 9.0.13:

 \d pg_stat_activity;
           View pg_catalog.pg_stat_activity
      Column      |           Type           | Modifiers 
--+--+---
 datid            | oid                      | 
 datname          | name                     | 
 procpid          | integer                  | 
 usesysid         | oid                      | 
 usename          | name                     | 
 application_name | text                     | 
 client_addr      | inet                     | 
 client_port      | integer                  | 
 backend_start    | timestamp with time zone | 
 xact_start       | timestamp with time zone | 
 query_start      | timestamp with time zone | 
 waiting          | boolean                  | 
 current_query    | text                     | 


vs 9.2.6:

           View pg_catalog.pg_stat_activity
      Column      |           Type           | Modifiers 
--+--+---
 datid            | oid                      | 
 datname          | name                     | 
 pid              | integer                  | 
 usesysid         | oid                      | 
 usename          | name                     | 
 application_name | text                     | 
 client_addr      | inet                     | 
 client_hostname  | text                     | 
 client_port      | integer                  | 
 backend_start    | timestamp with time zone | 
 xact_start       | timestamp with time zone | 
 query_start      | timestamp with time zone | 
 state_change     | timestamp with time zone | 
 waiting          | boolean                  | 
 state            | text                     | 
 query            | text                     | 

So which one is correct?  Why does 9.0.13 show the processes as idle, and 9.2.6 
show a query, even though the process shows them as idle?

-- 
Si Chen
Open Source Strategies, Inc.
sic...@opensourcestrategies.com
http://www.OpenSourceStrategies.com
LinkedIn: http://www.linkedin.com/in/opentaps
Twitter: http://twitter.com/opentaps

First, 9.0 and 9.2 are not slightly different; they are two different major 
releases.

Second, both are right.
It's just that for idle processes (see the state column) 9.2 shows the last 
query executed before the process became idle, whereas 9.0 showed <IDLE> in 
the current_query column instead.

[GENERAL] Complex query

2014-03-31 Thread Leonardo M . Ramé
Hi, I'm looking for help with this query. 

Table Tasks:

IdTask  StatusCode  StatusName
--
1   R   Registered
1   S   Started
1   D   Dictated
1   F   Finished
1   T   Transcribed
--
2   R   Registered
2   S   Started
2   T   Transcribed
2   F   Finished

As you can see, I have a table containing tasks and statuses. What I
would like to get is the list of tasks, including all of its steps, for
only those tasks where the StatusCode sequence was S followed by T.

In this example, the query should only return task Nº 2:

2   R   Registered
2   S   Started
2   T   Transcribed
2   F   Finished

Can anybody help me with this?

Regards,
-- 
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Wanted: ALTER TRIGGER ... OWNED BY EXTENSION

2014-03-31 Thread Alvaro Herrera
Moshe Jacobson wrote:

 Even though I mark the trigger *functions *as owned by my extension, the
 trigger definitions themselves are still dumped by pg_dump and restored by
 pg_restore. This is a problem when pg_restore is using parallelism (-j),
 and one thread starts issuing CREATE TRIGGER commands before the other
 thread has finished populating the config table.

ISTM that the usual locution for this is ALTER EXTENSION .. ADD.  You
could test whether this is going to work by manually inserting rows in
pg_depend.

It seems strange to me that the created trigger is part of the extension,
however.  Maybe it's the right fix, but hmm.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Complex query

2014-03-31 Thread Leonardo M . Ramé
On 2014-03-31 11:46:28 -0700, David Johnston wrote:
 Leonardo M. Ramé-2 wrote
  Hi, I'm looking for help with this query. 
  
  Table Tasks:
  
  IdTask  StatusCode  StatusName
  --
  1   R   Registered
  1   S   Started
  1   D   Dictated
  1   F   Finished
  1   T   Transcribed
  --
  2   R   Registered
  2   S   Started
  2   T   Transcribed
  2   F   Finished
  
  As you can see, I have a table containing tasks and statuses. What I
  would like to get is the list of tasks, including all of its steps, for
  only those tasks where the StatusCode sequence was S followed by T.
  
  In this example, the query should only return task Nº 2:
  
  2   R   Registered
  2   S   Started
  2   T   Transcribed
  2   F   Finished
  
  Can anybody help me with this?
 
 First you need to decide how to tell the database that R-S-T-F is ordered and
 then maybe you can use window functions, specifically lag(col, -1) over
 (...), to determine what the following row's code is and act accordingly.
 
 Put that into a sub-query and return the IdTask to the outer query's where
 clause.
 
 David J.
 
 
Thanks David, I hope I understood what you mean.

After adding the Id column, I came up with this query:

ris=# select lag.id, lag.idtask, lag.code, lag.lg from (select idtask, code, 
id, lag(code, -1) over () as lg  from tasks_test) as lag;
 id | idtask | code | lg 
++--+
  1 |  1 | R| S
  2 |  1 | S| D
  3 |  1 | D| F
  4 |  1 | F| T
  5 |  1 | T| R
  6 |  2 | R| S
  7 |  2 | S| T
  8 |  2 | T| F
  9 |  2 | F| 
(9 rows)

Row nº 7 meets the condition, but I don't want to show only that row, I
would like to show this:

  6 |  2 | R| S
  7 |  2 | S| T
  8 |  2 | T| F
  9 |  2 | F| 

Any hint?

-- 
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Complex query

2014-03-31 Thread David Johnston
Leonardo M. Ramé-2 wrote
 select lag.id, lag.idtask, lag.code, lag.lg from (select idtask, code, id,
 lag(code, -1) over () as lg  from tasks_test) as lag

First you want to include an ORDER BY in the OVER(...) clause, and probably
a PARTITION BY as well.

Then you move that to a sub-query (for example):

SELECT * 
FROM tbl
WHERE tbl.idtask IN (
SELECT lag.idtask FROM ( lag_query_here ) lag WHERE lag.code = 'S' and
lag.lg = 'T'
);
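A runnable sketch of this recipe, using Python's sqlite3 module as a stand-in for PostgreSQL (SQLite >= 3.25 supports the same window functions; the table name tasks_test and sample rows mirror the thread, and lead(code) plays the role of lag(code, -1)):

```python
import sqlite3  # stand-in engine; window functions need SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks_test (id INTEGER PRIMARY KEY, idtask INT, code TEXT);
    INSERT INTO tasks_test (idtask, code) VALUES
        (1,'R'),(1,'S'),(1,'D'),(1,'F'),(1,'T'),
        (2,'R'),(2,'S'),(2,'T'),(2,'F');
""")

# Inner query: for each row, fetch the next status within the same task.
# Outer query: keep every row of any task where 'S' is immediately
# followed by 'T'.
rows = conn.execute("""
    SELECT id, idtask, code FROM tasks_test
    WHERE idtask IN (
        SELECT w.idtask FROM (
            SELECT idtask, code,
                   lead(code) OVER (PARTITION BY idtask ORDER BY id) AS nxt
            FROM tasks_test
        ) AS w
        WHERE w.code = 'S' AND w.nxt = 'T'
    )
    ORDER BY id
""").fetchall()
for r in rows:
    print(r)   # only task 2 qualifies
```

Against PostgreSQL the same statement works unchanged; the PARTITION BY idtask ORDER BY id clause is what makes the "next status" well-defined per task.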

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Complex-query-tp5798061p5798087.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [ADMIN][GENERAL] openvz and shared memory trouble

2014-03-31 Thread Adrian Klaver

On 03/31/2014 08:28 AM, Tom Lane wrote:

Adrian Klaver adrian.kla...@aklaver.com writes:

At this point the memory allocation as a problem is as much conjecture
as anything else, at least to me. So what is causing SIGBUS is an open
question in my mind.


Agreed, it's unproven what's causing the SIGBUS in the case at hand.
However, in the case I investigated for Josh Kupershmidt, it was provably
the fault of the kernel for failing to supply memory to back the shared
memory segment that it had previously agreed to allocate.  The fact that
both users are trying to run their standby server under OpenVZ is, shall
we say, merely suggestive.


Yes, the fact that the problem seems to go away with a change in 
shared_buffers seems to point in that direction also. To get through the 
smoke to the fire, there are more questions to be answered:


1) The OP says the shared_buffers setting that failed was 4GB and that 
'tuning' down the setting solved the problem. So it would be nice to 
know what setting worked.


2) In hand with 1), the memory settings for the virtualization host and 
the container(s) involved are?


3) What is the host OS?

4) What version of OpenVZ?

5) What is the layout?
Are the primary and standby on the same container, different containers 
or some other arrangement?
In other words could there be some other resource conflict going on, say 
network?


6) Are the crashes random or do they follow some pattern?



regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Complex query

2014-03-31 Thread Leonardo M . Ramé
On 2014-03-31 12:16:53 -0700, David Johnston wrote:
 Leonardo M. Ramé-2 wrote
  select lag.id, lag.idtask, lag.code, lag.lg from (select idtask, code, id,
  lag(code, -1) over () as lg  from tasks_test) as lag
 
 First you want to include an ORDER BY in the OVER(...) clause, and probably
 a PARTITION BY as well.
 
 Then you move that to a sub-query (for example):
 
 SELECT * 
 FROM tbl
 WHERE tbl.idtask IN (
 SELECT lag.idtask FROM ( lag_query_here ) lag WHERE lag.code = 'S' and
 lag.lg = 'T'
 );
 
 David J.
 

Great, that's what I needed, thank you!

-- 
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Getting sequence-generated IDs from multiple row insert

2014-03-31 Thread Ben Hoyt
Hi folks,

I've just run into a subtle but fairly serious race condition while using
web.py's SQL library to insert multiple rows into the database and return
their IDs (a serial primary key column). Specifically I'm using the
multiple_insert() function that web.py defines here:

https://github.com/webpy/webpy/blob/master/web/db.py#L793

This function runs a query like this:

INSERT INTO table (col1, col2) VALUES (col1_1, col2_1), (col1_2,
col2_2), ...; SELECT currval('table_id_seq');

Using the output of the currval(), web.py tries to build a list of the most
recent IDs by creating a range from currval - num_rows_inserted + 1
through currval. In Python:

out = range(out-len(values)+1, out+1)

This *looks* nice, and must have seemed fine to the developers who
implemented it, but I've just hit a case where two sessions each doing a
multiple insert don't use sequential IDs. For example, the range code above
for the first insert gave 2117552...2117829. And the second insert gave
2117625...2117818. Which are obviously overlapping and is a nasty bug
waiting to happen. Thankfully it caused an IntegrityError further down in
my code so I didn't screw things up.

First of all, I presume this is expected, and is how the sequence with a
multi-row insert is supposed to work? In other words, the sequence
guarantees the IDs will be unique, but with multi-row insert, they won't
necessarily be consecutive? If so, it's a fairly serious bug in web.py's
multiple_insert(), which probably shouldn't return anything due to this
issue.
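The overlap can be reproduced without a database at all. The toy sketch below simulates one shared sequence drawn from by two interleaved sessions, each of which then applies web.py's range(out - len(values) + 1, out + 1) arithmetic; the two reported ranges collide (all names here are illustrative, not web.py internals):

```python
from itertools import count

seq = count(1)                  # stands in for the shared Postgres sequence
a_ids, b_ids = [], []
for _ in range(3):              # two sessions' nextval() calls interleave
    a_ids.append(next(seq))
    b_ids.append(next(seq))

# Each session assumes its ids are the contiguous block ending at the
# last value it saw -- exactly web.py's post-insert arithmetic.
a_guess = list(range(a_ids[-1] - 2, a_ids[-1] + 1))
b_guess = list(range(b_ids[-1] - 2, b_ids[-1] + 1))

print(a_ids, a_guess)           # actual ids vs. guessed range
print(b_ids, b_guess)
print(sorted(set(a_guess) & set(b_guess)))   # non-empty overlap
```

The actual id sets are disjoint (the sequence guarantees uniqueness), but the guessed contiguous ranges overlap, which is the IntegrityError scenario described above.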

Second, what's the right thing to do here? The first thing I found was
PostgreSQL's RETURNING clause, but somewhat frustratingly for this use
case, even that's not guaranteed to return the results in the order you
specified. I need the IDs in insertion order so I can do further
processing. Tom Lane and others in this thread indicate that this is not a
guarantee of the RETURNING clause, for future optimization reasons and due
to how SQL handles sets:

http://postgresql.1045698.n5.nabble.com/PATCH-Enforce-that-INSERT-RETURNING-preserves-the-order-of-multi-rows-td5728579.html

So currently I've changed my code to use RETURNING and then I'm ordering
the results based on a secondary column that I know the order of. This
works, but seems clunky, so I'm wondering if there's a nicer way.

Thanks,
Ben


Re: [GENERAL] Why does checkpointer is consumig ~1.2Gb of RAM?

2014-03-31 Thread Edson Richter

  
  
On 31/03/2014 00:38, Scott Marlowe wrote:

 On Sun, Mar 30, 2014 at 8:43 PM, Edson Richter edsonrich...@hotmail.com wrote:

  I'm curious about the "checkpointer" process and its configuration.
  What are the configuration options that affect the checkpointer process?
  Currently, under load, this process goes up to about 1.2GB of RAM:

 No it really doesn't. VIRT is everything it touches, whether it uses it
 individually or with other processes. RES is what THIS process is using
 all by itself. SHR is what it's accessing of shared memory. Here's a
 short explanation of what those three values mean:
 http://linuxpoison.blogspot.com/2009/10/what-is-difference-among-virt-res-and.html

 Also, for a more technical one, run "man top" and search for RES, SHR, and VIRT.

Yes, I understand that. That's why I've asked: why is the checkpointer
process consuming 1.2GB of RAM (1215M RES, more precisely), and which
parameter affects its memory consumption?

Thanks,

Edson

-- 
Edson Carlos Ericksson Richter
Systems for Libraries and Preschools
Custom projects for your company
Mobile: (51) 9318-9766 / (51) 8585-0796
"A mind that opens to a new idea never returns to its original size"
- Albert Einstein


Re: [GENERAL] Getting sequence-generated IDs from multiple row insert

2014-03-31 Thread Andrew Sullivan
On Mon, Mar 31, 2014 at 03:28:14PM -0400, Ben Hoyt wrote:

 , but I've just hit a case where two sessions each doing a
 multiple insert don't use sequential IDs. For example, the range code above
 for the first insert gave 2117552...2117829. And the second insert gave
 2117625...2117818. Which are obviously overlapping and is a nasty bug
 waiting to happen. Thankfully it caused an IntegrityError further down in
 my code so I didn't screw things up.

Good thing you caught it.  But yes, just from the description it
looked like an obvious race to me.  Concurrency is hard.

 Second, what's the right thing to do here? The first thing I found was
 PostgreSQL's RETURNING clause, but somewhat frustratingly for this use
 case, even that's not guaranteed to return the results in the order you
 specified. 

In SQL, _nothing_ is guaranteed to return in the order you specified.
This isn't really a Postgres thing; unless you use ORDER BY, SQL's
sets are not ordered.

 I need the IDs in insertion order so I can do further processing.

This sets off alarm bells for me.  What further processing are you
doing?  Is it possible that you could move that into a single step in
the database (maybe with a function or even a trigger) so that the
result of your RETURNING really would provide you with what you need?

 So currently I've changed my code to use RETURNING and then I'm ordering
 the results based on a secondary column that I know the order of. This
 works, but seems clunky, so I'm wondering if there's a nicer way.

This is probably what I'd do, assuming that further processing isn't
more data transformation.  If it _is_, then I'd do the whole thing in
a single step (in the database, once I inserted).

A

-- 
Andrew Sullivan
a...@crankycanuck.ca


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] getting the current query from pg_stat_activity

2014-03-31 Thread David Johnston
Si Chen-2 wrote
 I have two different postgresql servers running slightly [different]
 versions. 

Versions 9.0 and 9.2 are NOT slightly different.  These are two MAJOR
RELEASES (which allow for API changes) apart (i.e., one major release in
between - 9.1)

The release notes for 9.2 note this particular change explicitly:

http://www.postgresql.org/docs/9.2/interactive/release-9-2.html

Section E.9.2.6

Note the presence of the state column in the 9.2 schema - you use this to
determine if a connection is idle instead of looking for <IDLE> in the
query column, which then allows the query column to report the last known
query at all times.

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/getting-the-current-query-from-pg-stat-activity-tp5798076p5798098.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does checkpointer is consumig ~1.2Gb of RAM?

2014-03-31 Thread Scott Marlowe
It's not. If the RES shows 1215M and the SHR is 1214M, then checkpointer is
only using 1M. The difference between the two is the shared memory. RES
includes SHR.
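The RES/SHR relationship described here can be checked straight from /proc on Linux (a sketch only; statm's page-count fields are size, resident, shared, text, lib, data, dirty per proc(5)):

```python
import os

# Linux-only sketch: RES includes SHR, so a process's private resident
# footprint is roughly resident - shared. /proc/<pid>/statm reports
# page counts: size resident shared text lib data dirty.
page_kb = os.sysconf("SC_PAGE_SIZE") // 1024
with open(f"/proc/{os.getpid()}/statm") as f:
    size, resident, shared, *_ = (int(n) for n in f.read().split())

print(f"RES={resident * page_kb} kB  SHR={shared * page_kb} kB  "
      f"private={(resident - shared) * page_kb} kB")
```

Run against a checkpointer's pid this shows the same arithmetic top performs: the private figure is RES minus SHR, not RES itself.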


On Mon, Mar 31, 2014 at 1:44 PM, Edson Richter edsonrich...@hotmail.comwrote:

  On 31/03/2014 00:38, Scott Marlowe wrote:


 On Sun, Mar 30, 2014 at 8:43 PM, Edson Richter 
 edsonrich...@hotmail.comwrote:

  I'm curious about the "checkpointer" process and its configuration.
 What are the configuration options that affect the checkpointer process?
 Currently, under load, this process goes up to about 1.2GB of RAM:




 --


  No it really doesn't. VIRT is everything it touches, whether it uses it
 individually or with other processes. RES is what THIS process is using all
 by itself. SHR is what it's accessing of shared memory. Here's a short
 explanation of what those three values mean:
 http://linuxpoison.blogspot.com/2009/10/what-is-difference-among-virt-res-and.html

  Also, for a more technical one, run "man top" and search for RES, SHR, and
 VIRT


 Yes, I understand that. That's why I've asked: why is checkpointer process
 consuming 1.2Gb of RAM (1215M RES more precisely), and which parameter
 affects its memory consumption?

 Thanks,

 Edson


 --

   Edson Carlos Ericksson Richter
 Systems for Libraries and Preschools
 Custom projects for your company
 Mobile: (51) 9318-9766 / (51) 8585-0796
 "A mind that opens to a new idea never returns to its original size"
 - Albert Einstein




-- 
To understand recursion, one must first understand recursion.

Re: [GENERAL] Wanted: ALTER TRIGGER ... OWNED BY EXTENSION

2014-03-31 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Moshe Jacobson wrote:
 Even though I mark the trigger *functions *as owned by my extension, the
 trigger definitions themselves are still dumped by pg_dump and restored by
 pg_restore. This is a problem when pg_restore is using parallelism (-j),
 and one thread starts issuing CREATE TRIGGER commands before the other
 thread has finished populating the config table.

 ISTM that the usual locution for this is ALTER EXTENSION .. ADD.  You
 could test whether this is going to work by manually inserting rows in
 pg_depend.

 It seems strange to me that the created trigger is part of the extension,
 however.  Maybe it's the right fix, but hmm.

I'm suspicious that the problem is exactly lack of pg_depend records ---
pg_dump/pg_restore relies on those for correct ordering of parallel
operations.  What method are you using to create these triggers (not
the functions, the per-table pg_trigger records)?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] getting the current query from pg_stat_activity

2014-03-31 Thread Si Chen
Thanks!  That's very helpful and answers my question.


On Mon, Mar 31, 2014 at 12:52 PM, David Johnston pol...@yahoo.com wrote:

 Si Chen-2 wrote
  I have two different postgresql servers running slightly [different]
  versions.

 Versions 9.0 and 9.2 are NOT slightly different.  These are two MAJOR
 RELEASES (which allow for API changes) apart (i.e., one major release in
 between - 9.1)

 The release notes for 9.2 note this particular change explicitly:

 http://www.postgresql.org/docs/9.2/interactive/release-9-2.html

 Section E.9.2.6

 Note the presence of the state column in the 9.2 schema - you use this to
 determine if a connection is idle instead of looking for <IDLE> in the
 query column, which then allows the query column to report the last known
 query at all times.

 David J.






 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/getting-the-current-query-from-pg-stat-activity-tp5798076p5798098.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general




-- 
Si Chen
Open Source Strategies, Inc.
sic...@opensourcestrategies.com
http://www.OpenSourceStrategies.com
LinkedIn: http://www.linkedin.com/in/opentaps
Twitter: http://twitter.com/opentaps


Re: [GENERAL] Wanted: ALTER TRIGGER ... OWNED BY EXTENSION

2014-03-31 Thread Moshe Jacobson
On Mon, Mar 31, 2014 at 4:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I'm suspicious that the problem is exactly lack of pg_depend records ---
 pg_dump/pg_restore relies on those for correct ordering of parallel
 operations.  What method are you using to create these triggers (not
 the functions, the per-table pg_trigger records)?


There is a trigger function on the extension's config table that creates
the trigger functions and installs them (both from the same function). I am
about to try playing with pg_depend to see if it has the desired effect.


Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. http://www.neadwerx.com
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

Quality is not an act, it is a habit. -- Aristotle


Re: [GENERAL] Getting sequence-generated IDs from multiple row insert

2014-03-31 Thread David Johnston
Andrew Sullivan-8 wrote
 So currently I've changed my code to use RETURNING and then I'm ordering
 the results based on a secondary column that I know the order of. This
 works, but seems clunky, so I'm wondering if there's a nicer way.
 
 This is probably what I'd do, assuming that further processing isn't
 more data transformation.  If it _is_, then I'd do the whole thing in
 a single step (in the database, once I inserted).

If order is an implicit property of the source data then you need to
explicitly encode that order during (or before) import.  There are numerous
ways to implement such but except for extremely simple cases PostgreSQL will
not do the appropriate thing automatically in the face of concurrency.

Also, do you need sequential IDs or just IDs that are ever increasing?  And
if the latter, then tagging the input source will let you distinguish between
two different datasets even if their sequences are overlapping.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Getting-sequence-generated-IDs-from-multiple-row-insert-tp5798092p5798107.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Wanted: ALTER TRIGGER ... OWNED BY EXTENSION

2014-03-31 Thread Moshe Jacobson
On Mon, Mar 31, 2014 at 4:32 PM, Moshe Jacobson mo...@neadwerx.com wrote:

 There is a trigger function on the extension's config table that creates
 the trigger functions and installs them (both from the same function). I am
 about to try playing with pg_depend to see if it has the desired effect.


I've added the requisite rows to pg_depend, and I know it was correct
because \dx+ cyanaudit (my extension) now shows all of the triggers.

However, pg_dump still dumps them, and pg_restore still restores them,
causing the same errors as I had before.


Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. http://www.neadwerx.com
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

Quality is not an act, it is a habit. -- Aristotle


Re: [GENERAL] Why does checkpointer is consumig ~1.2Gb of RAM?

2014-03-31 Thread Edson Richter

  
  
On 31/03/2014 17:06, Scott Marlowe wrote:

 It's not. If the RES shows 1215M and the SHR is 1214M, then checkpointer
 is only using 1M. The difference between the two is the shared memory.
 RES includes SHR.

Thanks, now I understand.

Regards,

Edson

-- 
Edson Carlos Ericksson Richter
Systems for Libraries and Preschools
Custom projects for your company
Mobile: (51) 9318-9766 / (51) 8585-0796
"A mind that opens to a new idea never returns to its original size"
- Albert Einstein


Re: [GENERAL] Why does checkpointer is consumig ~1.2Gb of RAM?

2014-03-31 Thread Scott Marlowe
And yet I still got it wrong. Bad day for me. Should be:

The difference between the two is the memory it's using.

Bad day apparently.

On Mon, Mar 31, 2014 at 2:46 PM, Edson Richter edsonrich...@hotmail.com wrote:

  On 31/03/2014 17:06, Scott Marlowe wrote:

 It's not. If the RES shows 1215M And the SHR is 1214M, then checkpointer
 is only using 1M. The difference between the two is the shared memory. RES
 includes SHR.


 Thanks, now I understand.

 Regards,

 Edson





 On Mon, Mar 31, 2014 at 1:44 PM, Edson Richter 
 edsonrich...@hotmail.com wrote:

  On 31/03/2014 00:38, Scott Marlowe wrote:


 On Sun, Mar 30, 2014 at 8:43 PM, Edson Richter 
 edsonrich...@hotmail.com wrote:

  I'm curious about the checkpointer process and its configuration.
 What are the configuration options that affects the checkpointer process?
 Currently, under load, this process goes up to about 1.2GB of RAM:




 --


  No it really doesn't. VIRT is every thing it touches whether it uses it
 individually or with other processes. RES is what THIS process is using all
 by itself. SHR is what it's accessing of shared memory. Here's a short
 explanation of what those three values mean:
 http://linuxpoison.blogspot.com/2009/10/what-is-difference-among-virt-res-and.html

  Also for a more technical one run man top and search for RES, SHR,
 and VIRT


  Yes, I understand that. That's why I've asked: why is checkpointer
 process consuming 1.2Gb of RAM (1215M RES more precisely), and which
 parameter affects its memory consumption?

 Thanks,

 Edson


 --

   Edson Carlos Ericksson Richter

 Sistemas para Bibliotecas, Escolinhas Infantis
 Projetos sob medida para sua empresa
 Celular: (51) 9318-9766 / (51) 8585-0796
 "A mind that opens itself to a new idea will never return to its
 original size" - Albert Einstein




 --
 To understand recursion, one must first understand recursion.



 --

   Edson Carlos Ericksson Richter

 Sistemas para Bibliotecas, Escolinhas Infantis
 Projetos sob medida para sua empresa
 Celular: (51) 9318-9766 / (51) 8585-0796
 "A mind that opens itself to a new idea will never return to its
 original size" - Albert Einstein




-- 
To understand recursion, one must first understand recursion.

Re: [GENERAL] Getting sequence-generated IDs from multiple row insert

2014-03-31 Thread Andrew Sullivan
On Mon, Mar 31, 2014 at 01:34:04PM -0700, David Johnston wrote:
 
 If order is an implicit property of the source data then you need to
 explicitly encode that order during (or before) import. 

Sure, but the problem the OP had I thought was that the RETURNING
clause doesn't guarantee that the rows coming back are in the order
they were inserted.  This is just a SQL thing.  (I guess you could
ORDER BY the RETURNING clause, right?)

 There are numerous
 ways to implement such but except for extremely simple cases PostgreSQL will
 not do the appropriate thing automatically in the face of concurrency.

It _is_ doing the appropriate thing, though: this is SQL.  The rows
aren't ordered unless you tell them to be.
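As a sketch of that wrap-in-a-CTE approach (table and column names here are hypothetical, and an explicit ordering column is assumed, per David's point):

```sql
-- Hypothetical table: CREATE TABLE items (id serial PRIMARY KEY,
--                                         seq int, payload text);
-- RETURNING itself gives no ordering guarantee; the outer ORDER BY does.
WITH ins AS (
    INSERT INTO items (seq, payload)
    VALUES (1, 'first'), (2, 'second'), (3, 'third')
    RETURNING id, seq
)
SELECT id
FROM ins
ORDER BY seq;
```

Because each RETURNING row carries both the generated id and the source-order column, sorting on seq yields the ids in insertion order regardless of the order the rows come back in.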

A

-- 
Andrew Sullivan
a...@crankycanuck.ca




Re: [GENERAL] Wanted: ALTER TRIGGER ... OWNED BY EXTENSION

2014-03-31 Thread Tom Lane
Moshe Jacobson mo...@neadwerx.com writes:
 I've added the requisite rows to pg_depend, and I know it was correct
 because \dx+ cyanaudit (my extension) now shows all of the triggers.

 However, pg_dump still dumps them, and pg_restore still restores them,
 causing the same errors as I had before.

I don't think pg_dump believes that pg_trigger rows can belong to an
extension; and I'm dubious of the concept too.  What I was wondering about
was whether those rows had proper dependencies on (a) the functions and
(b) their owning tables.

Basically what you need to end up with is

* trigger function has a membership dependency on the extension

* pg_trigger row has a normal dependency on the trigger function it uses

* pg_trigger row has an auto dependency on the table it's for

If you're using SQL commands to create the trigger then I'd expect the
latter two to be handled automatically; but it sorta sounds like you're
doing something pretty low-level and perhaps omitting these steps.

Note: depending on what it is you're trying to accomplish, it might be
saner for the pg_trigger rows to have auto dependencies on their trigger
functions.  Depends whether you'd like DROP EXTENSION to complain or just
shut up and drop the triggers.
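For reference, the dependencies recorded for a given trigger can be inspected directly; a sketch (the trigger name is hypothetical):

```sql
-- List pg_depend entries for a trigger named my_trigger.
-- deptype: 'n' = normal, 'a' = auto, 'i' = internal, 'e' = extension member.
SELECT classid::regclass    AS dependent_catalog,
       objid,
       refclassid::regclass AS referenced_catalog,
       refobjid,
       deptype
FROM pg_depend
WHERE classid = 'pg_trigger'::regclass
  AND objid IN (SELECT oid FROM pg_trigger WHERE tgname = 'my_trigger');
```

Checking the output against the three bullets above shows which dependency is missing.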

regards, tom lane




Re: [GENERAL] Wanted: ALTER TRIGGER ... OWNED BY EXTENSION

2014-03-31 Thread Moshe Jacobson
On Mon, Mar 31, 2014 at 5:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Basically what you need to end up with is

 * trigger function has a membership dependency on the extension


Yes, the dependency is set up when the trigger function is dynamically
created, by using ALTER EXTENSION ... ADD FUNCTION.

* pg_trigger row has a normal dependency on the trigger function it uses


Aha, this is not present. I think it is due to some migration magic I did a
while ago.
Is there a way to clean up the dependencies, or at least list out the
dependencies that seem suspicious?
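One sketch of a "suspicious dependencies" listing, along the lines of what I'm after (assuming the problem is triggers with no recorded dependency on their trigger function):

```sql
-- Find user triggers whose pg_depend row pointing at their trigger
-- function (tgfoid) is missing -- these would be the suspicious ones.
SELECT t.oid AS trigger_oid, t.tgname, p.proname AS trigger_function
FROM pg_trigger t
JOIN pg_proc p ON p.oid = t.tgfoid
WHERE NOT t.tgisinternal
  AND NOT EXISTS (
      SELECT 1
      FROM pg_depend d
      WHERE d.classid    = 'pg_trigger'::regclass
        AND d.objid      = t.oid
        AND d.refclassid = 'pg_proc'::regclass
        AND d.refobjid   = t.tgfoid
  );
```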

* pg_trigger row has an auto dependency on the table it's for


This is present.

Note: depending on what it is you're trying to accomplish, it might be
 saner for the pg_trigger rows to have auto dependencies on their trigger
 functions.  Depends whether you'd like DROP EXTENSION to complain or just
 shut up and drop the triggers.


I would love for DROP EXTENSION to shut up and drop the triggers, but I'm
not sure how to accomplish that without manually manipulating pg_depend
(and even then I'm not sure if that'll do it). Suggestions welcome.

Thanks.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. http://www.neadwerx.com
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

Quality is not an act, it is a habit. -- Aristotle


Re: [GENERAL] Postgres as In-Memory Database?

2014-03-31 Thread Jeff Janes
On Sunday, March 30, 2014, Stefan Keller sfkel...@gmail.com wrote:

 Hi Jeff

 2013/11/20 Jeff Janes jeff.ja...@gmail.com
 


 I don't know what you mean about enhancements in the buffer pool.  For an
 in-memory database, there shouldn't be a buffer pool in the first place, as
 it is *all* in memory.


 You are right: In-memory DBs are making buffer-pooling obsolete - except
 for making data persistent (see below).



I would be very reluctant to use any database engine which considered disk
access obsolete.  There will always be a risk where data grows to exceed
RAM, and where it would be inconvenient to expand RAM fast enough to
accommodate it.  I've played those games enough with Perl and C in-memory
systems.  You fight and squeeze to fit the data into RAM, then the data
size grows 3% and all of your work is for naught.  You can buy more RAM, if
you have the budget, and the RAM isn't back-ordered for 3 months because
the factory that makes it had a fire, and if more RAM fits on your
motherboard, and


 
  Do you know why it is slow?  I'd give high odds that it would be a
  specific implementation detail in the code that is suboptimal, or maybe
  a design decision of PostGIS, rather than some high level architectural
  decision of PostgreSQL.

 Referring to the application is something you can always say - but that
 shouldn't prevent enhancing Postgres.


 Postgres has been enhanced.  Now we need to change osm2pgsql to take
 advantage of them.  It defines indexes on the tables that are going to be
 bulk loaded with COPY, which defeats some recent optimizations made to
 COPY.  The creation of the indexes should be delayed until after the bulk
 load is done.

 A further enhancement to Postgres would be to automatically defer
 creation of the indexes when a table is truncated or created within a
 transaction, so that users get the benefit of the improvement.

 These enhancements to osm2pgsql seem reasonable to me. I hope somebody
 has time to take care of them.


I have a fork of osm2pgsql on github which delays the index build until the
COPY is done.  I'm not really motivated to convince anyone to merge it (as
my interest is postgresql not osm itself), but if someone wants to pick it
up, that is fine with me.  It helps somewhat, but it is not a game-changer
because there are other bigger bottlenecks, at least for HDD based systems.

One of the bigger bottlenecks is building the GIN indexes on the way table
at the end.  Setting maintenance_work_mem to huge values helps a lot, if
you can find a safe setting given the multiple index builds that may be
running at once (at that point in the load, osm2pgsql's node cache has been
released, so there is substantial RAM to re-purpose).  It would be better
for this use if PostgreSQL built the index by using an external sort,
rather than iterating over the table building maintenance_work_mem sized
chunks of red-black trees.  The problem there is that osm uses the gin
index in an odd way (the vast majority of nodes occur in exactly one way,
with a minority occurring in more than one), and using a disk sort might
not be ideal for the more common use cases where GIN is used, where a given
token usually occurs in far more than one document.  So an improvement that
only improves osm2pgsql and degrades other uses is unlikely to be adopted.
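As a sketch of the maintenance_work_mem tactic mentioned above (the table and index names are loosely modeled on osm2pgsql's way table, and the value is illustrative only; a safe setting depends on available RAM and concurrent builds):

```sql
-- Illustrative only: raise maintenance_work_mem for this session
-- before building the big GIN index, then put it back.
SET maintenance_work_mem = '4GB';
CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes);
RESET maintenance_work_mem;
```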

Another bottleneck is just the raw COPY into the node table.  When that is
running against an unindexed table which was created in the same
transaction, I see that osm2pgsql takes about 50% of a CPU to print a
copy-stream, and postgresql uses about 50% of a CPU to parse that stream
and insert into the table.  So they add up to about 1 CPU despite the fact
this is a multiple-CPU machine.  So they seem to be playing ping-pong with
the pipe buffer when in theory they should each be able to run at almost
full speed.  I don't know how to get them to stop playing ping-pong, but I
have other use cases where this shows up, so a trade-off-free solution would
be nifty.  I suspect that is more of a kernel issue than either
postgresql or osm2pgsql.

You could do the COPY in parallel in multiple threads, but the problem
there is you can't use the "created in same transaction" optimization to
avoid WAL overhead.  There is no fix for this without changing PostgreSQL to
accommodate it, but I have no clear idea how one would do that.  Importing
a snapshot doesn't seem like it would be enough, as you can only import
snapshots for reading, not for writing.
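For readers unfamiliar with the optimization being discussed, a minimal sketch (table and file path are hypothetical; the WAL skip applies when wal_level = minimal and the table is created or truncated in the same transaction as the COPY):

```sql
-- Because nodes_load is created in the same transaction as the COPY,
-- PostgreSQL can skip WAL-logging the loaded data and just fsync the
-- table at commit (wal_level = minimal, no archiving/replication).
BEGIN;
CREATE TABLE nodes_load (id bigint, lat int, lon int);
COPY nodes_load FROM '/tmp/nodes.copy';
COMMIT;
```

Splitting the COPY across several sessions breaks this, since each session would need its own transaction, and only one of them can be the one that created the table.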

Also, I'm not too sure how seriously to take the goal of optimizing
osm2pgsql.  Development on it seems to be less than vigorous.  And its
purpose is to create a database to be used, so wouldn't it make more sense
to optimize the use, not the creation?  And if you do want to optimize the
creation, the obvious way to do it is to create the export in a way more
closely aligned to that need, rather than a generic export.


 In the meantime I 

Re: [GENERAL] Doubts on startup costs

2014-03-31 Thread Rajeev rastogi
Start-up cost is the cost required to fetch the first tuple. So yes, it is 
possible for the start-up cost of an innermost/leaf node to be non-zero, as 
shown in the example below:

postgres=# explain select * from tbl, tbl2 where tbl2.id = tbl.id order by tbl.id;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Merge Join  (cost=809.81..1636.50 rows=11571 width=8)
   Merge Cond: (tbl.id = tbl2.id)
   ->  Index Only Scan using idx on tbl  (cost=0.42..3369.01 rows=11 width=4)
   ->  Sort  (cost=809.39..834.39 rows=1 width=4)
         Sort Key: tbl2.id
         ->  Seq Scan on tbl2  (cost=0.00..145.00 rows=1 width=4)
 Planning time: 0.672 ms

postgres=# explain select * from tbl order by id;
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Only Scan using idx on tbl  (cost=0.42..3369.01 rows=11 width=4)
 Planning time: 0.305 ms

Also, the start-up cost of an outer node need not be more than the total cost
of its inner nodes. If the outer node can emit one tuple without waiting for
the inner nodes to complete, its start-up cost will be much less than the
inner nodes' total cost. But the start-up cost of an outer node cannot be
less than the start-up cost of its inner nodes.

For example, in the plan above, a merge join can emit one tuple as soon as it
finds one matching row, so its start-up cost is low.
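Start-up cost matters most when only a few rows are needed; a quick sketch, reusing the tbl example above:

```sql
-- With a LIMIT, the planner favours plans with low start-up cost
-- (e.g. an index scan that streams sorted rows immediately) over plans
-- whose total cost is lower but which must sort everything up front.
EXPLAIN SELECT * FROM tbl ORDER BY id LIMIT 10;
```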

Thanks and Regards,
Kumar Rajeev Rastogi
--
This e-mail and its attachments contain confidential information from HUAWEI, 
which
is intended only for the person or entity whose address is listed above. Any 
use of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender by
phone or email immediately and delete it!

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Jayadevan M
Sent: 30 March 2014 20:45
To: pgsql-general@postgresql.org
Subject: [GENERAL] Doubts on startup costs

Hi,
A few academic questions related to the PostgreSQL query planner and its
output:

1. In the output of EXPLAIN for the SQL statements I have seen so far, the
start-up cost for the innermost node/leaf has been 0. Are there situations
where it may be non-zero?

2. Will the start-up costs for outer nodes always be equal to or greater
than the total cost of the inner nodes? (My guess is NO; there may be cases
where the outer node can start processing before the inner node is
completely done.)
Regards,
Jayadevan