Re: [GENERAL] Trust intermediate CA for client certificates
On 03/07/2013 12:42 PM, Ray Stell wrote:
> What Tom said works for me. Here is a page that gives an example and I think it demonstrates that the root CA does not allow everybody in the gate, the chain has to be in place: http://stackoverflow.com/questions/1456034/trouble-understanding-ssl-certificate-chain-verification

That page doesn't even mention PostgreSQL.

> You can use the openssl verify command to test that the root is not wide open on its own.

The issue is the behavior of the PostgreSQL server. openssl verify is germane only in that it points to the source of the problem -- OpenSSL's insistence on ultimately validating all certificates against a self-signed root CA. This requires that the root CA certificate be present in root.crt, which causes the server to accept connections from all clients that can present a certificate chain leading to that root CA.

If you don't believe me, test with the attached files, which implement the following hierarchy:

                 +---------+
                 | Root CA |
                 +---------+
                  /       \
        +-----------+   +-----------+
        | Server CA |   | Client CA |
        +-----------+   +-----------+
         /         \             \
  +----------+  +--------+   +--------+
  | postgres |  |  Bad   |   |  Good  |
  | (server) |  | client |   | client |
  +----------+  +--------+   +--------+

The goal is to configure the server such that the good client will be allowed to connect (because its certificate is signed by the Client CA), but the bad client will not be allowed to connect (because its certificate is not signed by the Client CA). You will find the following:

1. You cannot simply use client-ca.crt as $PGDATA/root.crt. OpenSSL will not validate a client certificate without access to the root CA certificate.

2. To enable client connections, you must add the root CA certificate to $PGDATA/root.crt -- cat client-ca.crt root-ca.crt > root.crt.

3. Once the root CA certificate is trusted, however, the bad client can also connect by using a certificate chain that includes the Server CA certificate -- cat bad-client.crt server-ca.crt > ~/.postgresql/postgresql.crt.

After looking at be-secure.c and investigating the way that OpenSSL validates certificates, I do not believe that there is any way of achieving the desired behavior with the current codebase. Adding pgsql-hackers to see if there is any interest in a patch to add this functionality.

--
Ian Pilcher  arequip...@gmail.com
Sometimes there's nothing left to do but crash and burn...or die trying.
[Attachments: root-ca.crt, server-ca.crt, client-ca.crt, bad-client.crt (application/pkix-cert), plus an RSA private key (truncated in the archive).]
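The asymmetry is easy to reproduce outside the server with openssl verify; a sketch using the attached file names (good-client.crt is an assumed name for the good client's certificate, which is not among the attachments):

# fails: OpenSSL will not treat the intermediate alone as a trust anchor
openssl verify -CAfile client-ca.crt good-client.crt

# succeeds once the self-signed root is trusted
openssl verify -CAfile root-ca.crt -untrusted client-ca.crt good-client.crt

# but a chain through the Server CA now verifies too -- the problem described above
openssl verify -CAfile root-ca.crt -untrusted server-ca.crt bad-client.crt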
[GENERAL] postgres 9.0.2 replicated database is crashing
Hi all - I am in desperate need of your help. The replication/streaming stopped working on March 5th. I followed the procedure below to restart the streaming. After running it for a couple of hours, the database crashes on the slave. This is on our production server. Thanks for your help.

Steps that I followed:
1. SELECT pg_start_backup('label', true); on the primary
2. Created a tar of the data directory.
3. Untarred the data on the slave.
4. Deleted the pid file on the slave.
5. Created recovery.conf.
6. SELECT pg_stop_backup(); on the primary.
7. Started postgres on the slave.
8. After 2 hours, the slave server crashes.

Please help. Regards
Re: [GENERAL] Replication stopped on 9.0.2 after making change to conf file
It sounds like all you did was set up the slave from scratch with a fresh base backup, without understanding or debugging what caused everything to break. Clearly whatever was wrong on March 5 is still wrong, and nothing has been fixed. The first step in debugging this problem is to look at and/or post the log content (from both the master and the slave) from the time when this stopped working (March 5).

On Fri, Mar 8, 2013 at 11:06 PM, akp geek akpg...@gmail.com wrote:
What I noticed is... our streaming stopped on March 5th. I restarted the process today. However the new data is syncing correctly, but the data for the dates between the 5th and today is not syncing. Is there something wrong that I did? Here is what I did: 1. created base backup 2. took the data directory on to the slave 3. stopped the base backup on master 4. started the db on slave. Appreciate your help. Regards

On Sat, Mar 9, 2013 at 12:52 AM, akp geek akpg...@gmail.com wrote:
Thanks a lot. I started the replication. It became very slow. It is taking a long time to sync the master's data onto the slave. Is there a way to find what's causing the issue? Regards

On Fri, Mar 8, 2013 at 12:06 PM, John Laing john.la...@gmail.com wrote:
I'm not sure about the existence of any standard scripts, but we have a pair of checks running periodically on the backup server. This shouldn't return anything:

tail -3 /var/log/postgresql/postgresql-9.1-main.log | grep FATAL

And this should return something:

ps -u postgres -o cmd | grep "postgres: wal receiver process streaming"

These have worked very reliably for many months. -John

On Fri, Mar 8, 2013 at 11:53 AM, akp geek akpg...@gmail.com wrote:
I got it fixed. What I did was:

$ psql -c "SELECT pg_start_backup('label', true)"
$ rsync -a ${PGDATA}/ standby:/srv/pgsql/standby/ --exclude postmaster.pid
$ psql -c "SELECT pg_stop_backup()"

It took a while to catch up the data. One question I have: are there any scripts to monitor the status of the replication, so that I can be a little proactive? Regards

On Thu, Mar 7, 2013 at 9:25 PM, Scott Marlowe scott.marl...@gmail.com wrote:
On Thu, Mar 7, 2013 at 5:28 PM, akp geek akpg...@gmail.com wrote:
Hi all - Recently made a change on our primary database: default_text_search_config = 'pg_catalog.simple'. After that the replication stopped. Can you please help me fix the issue? I am sure I made the change on the slave also. How can I start the replication and catch up the data? Thanks for your time.

What are you seeing in your slony and/or postgresql logs, if anything?

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
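Beyond log checks, the streaming positions themselves can be polled; a minimal sketch using the built-in functions of the 9.0/9.1 era:

# on the master: where WAL generation currently stands
psql -c "SELECT pg_current_xlog_location();"

# on the standby: if these stop advancing while the master's location moves,
# replication is stuck
psql -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"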
Re: [GENERAL] postgres 9.0.2 replicated database is crashing
That process merely sets up a new server; it doesn't start streaming unless the server has been configured correctly. You state that the slave crashed after two hours. How did you make this determination? All you seem to be doing is setting up the slave from scratch repeatedly, and assuming that it will magically just work, rather than understanding and debugging why it's not working. Where is the log output from your servers that shows what is really transpiring?

On Sat, Mar 9, 2013 at 6:51 AM, akp geek akpg...@gmail.com wrote:
Hi all - I am in desperate need of your help. The replication/streaming stopped working on March 5th. I followed the following procedure to restart the streaming. After running it for a couple of hours, the database crashes on the slave. This is on our production server. Thanks for your help. Steps that I followed: 1. SELECT pg_start_backup('label', true); 2. created a tar of the data directory 3. untarred the data on the slave 4. deleted the pid file on the slave 5. created recovery.conf 6. SELECT pg_stop_backup(); on the primary 7. started postgres on the slave 8. after 2 hours, the slave server crashes. Please help. Regards

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] table spaces
Performance related question. With Linux (CentOS 6.3+), 64-bit, ext4 in mind, how would you guys go about distributing write load across disks? Let's say I have quite a few disks, and I can partition them the way I want, in mirror configuration (to get some hardware failure resilience). Should I separate tables from indexes onto separate RAIDs? I know WAL has to go on a separate disk, for added performance. I'm looking for your experiences, and most importantly how you go about deciding which way is best, i.e. which combinations make sense to try out first, short of all permutations :-) Thanks. -- GJ
Re: [GENERAL] table spaces
Hi Gregg, yes, keep the indexes on a separate channel. Much depends on how the data is mapped and accessed; sometimes even distributing the data itself onto different spaces may help. If you do a lot of logging (say you feed a massive pgFouine activity), you would want to have that on yet another separate channel, too. There is no silver bullet for this: iostat will eventually tell you whether your load distribution is good enough, or not. Cheers Bèrto

On 9 March 2013 17:51, Gregg Jaskiewicz gryz...@gmail.com wrote:
Performance related question. With Linux (CentOS 6.3+), 64-bit, ext4 in mind, how would you guys go about distributing write load across disks? Let's say I have quite a few disks, and I can partition them the way I want, in mirror configuration (to get some hardware failure resilience). Should I separate tables from indexes onto separate RAIDs? I know WAL has to go on a separate disk, for added performance. I'm looking for your experiences, and most importantly how you go about deciding which way is best, i.e. which combinations make sense to try out first, short of all permutations :-) Thanks. -- GJ

-- == If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
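A minimal sketch of splitting indexes onto their own spindle with a tablespace; the mount point, tablespace name, and index are hypothetical:

# hand a directory on the dedicated mirror to postgres
mkdir -p /mnt/raid_idx/pg_indexes && chown postgres:postgres /mnt/raid_idx/pg_indexes
psql -c "CREATE TABLESPACE idx_space LOCATION '/mnt/raid_idx/pg_indexes';"
psql -c "CREATE INDEX orders_created_idx ON orders (created_at) TABLESPACE idx_space;"

# then watch per-device utilization under load to see whether the split paid off
iostat -x 5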
[GENERAL] Splitting Postgres into Separate Clusters?
Hello, I'm running a specialized search engine that indexes a few tens of millions of web pages, keeping everything in Postgres, and one problem I'm starting to see is poor cache hit rates.

My database has two or three tables just for the text of the scraped pages, with one row every time a page is scraped and a `text` column for the HTML. These tables are almost-but-not-quite write-only. They are only read by one job, which uses them to create other objects in the system. I'd like the rest of the database to be in-memory all the time, but I don't really care if I have to read these tables from disk.

To keep my problem tables from dragging down performance on the rest of the system, I'm thinking of splitting them out. I suppose I could save the HTML outside of Postgres entirely, but I'm wondering if a nicer solution would be to keep these tables in a separate cluster (i.e. /var/lib/postgresql/9.1/something_else -- I hope this is the right terminology). Then I could tune that cluster differently from the main cluster, or even put it on a different machine. And I could still use dblink to query both clusters conveniently (I think; this isn't really that important).

Does this seem like a worthwhile approach? Is there a better way to deal with a few out-sized tables?

Thanks, Paul -- _ Pulchritudo splendor veritatis.
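A sketch of what that split could look like, assuming the dblink module is installed and using made-up paths, port, and table names:

# initialize and start a second cluster on its own filesystem and port
initdb -D /var/lib/postgresql/9.1/scrapes
pg_ctl -D /var/lib/postgresql/9.1/scrapes -o "-p 5433" start

# query it from the main cluster via dblink
psql -c "SELECT * FROM dblink('host=localhost port=5433 dbname=scrapes',
                              'SELECT id, length(html) FROM pages LIMIT 5')
              AS t(id int, html_len int);"

Each cluster then gets its own shared_buffers and checkpoint settings, so the bulky HTML tables cannot evict the hot working set of the main cluster.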
[GENERAL] Avoiding a deadlock
I have a long-running multi-row UPDATE that is deadlocking with a single-row UPDATE:

2013-03-09 11:07:51 CST ERROR: deadlock detected
2013-03-09 11:07:51 CST DETAIL: Process 18851 waits for ShareLock on transaction 10307138; blocked by process 24203. Process 24203 waits for ShareLock on transaction 10306996; blocked by process 18851.

Process 18851:

UPDATE taggings tg
SET score_tier = COALESCE(x.perc, 0)
FROM (SELECT tg2.id,
             percent_rank() OVER (PARTITION BY tg2.tag_id ORDER BY tg2.score ASC) AS perc
      FROM taggings tg2, tags t
      WHERE tg2.score IS NOT NULL
        AND tg2.tag_id = t.id
        AND t.tier = 2) AS x
WHERE tg.id = x.id
  AND tg.score IS NOT NULL;

Process 24203:

UPDATE taggings SET score = 2 WHERE taggings.id = 29105523

Note that these two queries are actually updating different columns, albeit apparently in the same row. Is there anything I can do to avoid a deadlock here? The big query does nothing else in its transaction; the little query's transaction might update several rows from `taggings`, which I guess is the real reason for the deadlock.

I'd be pretty satisfied with approximate values for the big query. As you can see, it is just taking the `score` of each `tagging` and computing the percentage of times it beats other taggings of the same tag. Is there something I can do with transaction isolation levels here? I don't care if the big query operates on slightly-out-of-date values. Since each query updates different columns, I think there should be no issue with them overwriting each other, right?

Thanks, Paul -- _ Pulchritudo splendor veritatis.
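One way to shrink the big transaction's lock footprint is to snapshot the ranks into a temp table and then apply them in small, short-lived transactions; a sketch (the batch bounds are illustrative, and the slightly stale snapshot fits the approximate-values requirement above):

psql <<'SQL'
-- snapshot the percentiles once, without taking any row locks on taggings
CREATE TEMP TABLE tier_perc AS
  SELECT tg2.id,
         percent_rank() OVER (PARTITION BY tg2.tag_id ORDER BY tg2.score) AS perc
  FROM taggings tg2
  JOIN tags t ON tg2.tag_id = t.id
  WHERE tg2.score IS NOT NULL AND t.tier = 2;

-- each batch is its own short transaction, so a conflicting row lock is
-- held only briefly; repeat with the next id window until done
BEGIN;
UPDATE taggings tg
SET score_tier = COALESCE(p.perc, 0)
FROM tier_perc p
WHERE tg.id = p.id AND tg.id BETWEEN 1 AND 100000;
COMMIT;
SQL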
Re: [GENERAL] postgres 9.0.2 replicated database is crashing
Thank you. As you mentioned, I understood that I am starting the streaming from scratch, which is not what I wanted to do. Here is what I am planning to do. Our replication process has been down since March 5th.

1. Is it OK to get all WALs from March 5th till now into the standby's pg_xlog?
2. Take pg_basebackup.
3. Export the data dir.
4. Stop backup.
5. Restart the standby.

Based on my understanding it should work, because when the standby started, it was not able to find the files, as I had deleted them from the slave when I tried to fix the issue:

0 2013-03-09 04:56:08 GMT LOG: entering standby mode
cp: cannot access /backup/9.0.4/archive/0001010600E1
0 2013-03-09 04:56:08 GMT LOG: redo starts at 106/E120
cp: cannot access /backup/9.0.4/archive/0001010600E2
cp: cannot access /backup/9.0.4/archive/0001010600E3
cp: cannot access /backup/9.0.4/archive/0001010600E4
0 2013-03-09 04:56:10 GMT LOG: consistent recovery state reached at 106/E45AD4A8
0 2013-03-09 04:56:10 GMT LOG: database system is ready to accept read only connections
cp: cannot access /backup/9.0.4/archive/0001010600E5
cp: cannot access /backup/9.0.4/archive/0001010600E6
cp: cannot access /backup/9.0.4/archive/0001010600E7
cp: cannot access /backup/9.0.4/archive/0001010600E8
cp: cannot access /backup/9.0.4/archive/0001010600E9
cp: cannot access /backup/9.0.4/archive/0001010600EA
0 2013-03-09 04:56:12 GMT LOG: invalid record length at 106/EA10B8C0
cp: cannot access /backup/9.0.4/archive/0001010600EA
0 2013-03-09 04:56:12 GMT LOG: streaming replication successfully connected to primary
[unknown] 0 2013-03-09 04:57:00 GMT [unknown]LOG: connection received: host=[local]
[unknown] 0 2013-03-09 04:57:03 GMT [unknown]LOG: connection received: host=[local]
[unknown] 10.155.253.43(51257) 0 SELECT 2013-03-09 07:07:18 GMT prodLog: duration: 6316.649 ms
[unknown] 10.155.253.43(51257) 0 idle 2013-03-09 07:47:53 GMT prodLog: disconnection: session time: 0:41:06.529 user=postgres database=fprod host=10.155.253.43 port=51257
0 2013-03-09 07:55:48 GMT LOG: restartpoint starting: time
0 2013-03-09 08:25:47 GMT LOG: restartpoint complete: wrote 19419 buffers (7.4%); write=1799.792 s, sync=0.066 s, total=1799.867 s
0 2013-03-09 08:25:47 GMT LOG: recovery restart point at 107/FB01B238
0 2013-03-09 08:25:47 GMT DETAIL: last completed transaction was at log time 2013-03-09 08:25:41.85776+00
0 2013-03-09 08:55:48 GMT LOG: restartpoint starting: time
psql [local] 08P01 idle 2013-03-09 09:10:52 GMT prodLog: unexpected EOF on client connection
psql [local] 0 idle 2013-03-09 09:10:52 GMT prodLog: disconnection: session time: 2:15:06.351 user=postgres database=fprod host=[local]
XX000 2013-03-09 09:23:46 GMT FATAL: failed to add item to index page in 100924/100937/1225845
XX000 2013-03-09 09:23:46 GMT CONTEXT: xlog redo Insert item, node: 100924/100937/1225845 blkno: 72666 offset: 234 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:36483
0 2013-03-09 09:23:46 GMT LOG: startup process (PID 3880) exited with exit code 1
0 2013-03-09 09:23:46 GMT LOG: terminating any other active server processes

Thanks

On Sat, Mar 9, 2013 at 11:16 AM, Lonni J Friedman netll...@gmail.com wrote:
That process merely sets up a new server; it doesn't start streaming unless the server has been configured correctly. You state that the slave crashed after two hours. How did you make this determination? All you seem to be doing is setting up the slave from scratch repeatedly, and assuming that it will magically just work, rather than understanding and debugging why it's not working. Where is the log output from your servers that shows what is really transpiring?

On Sat, Mar 9, 2013 at 6:51 AM, akp geek akpg...@gmail.com wrote:
Hi all - I am in desperate need of your help. The replication/streaming stopped working on March 5th. I followed the following procedure to restart the streaming. After running it for a couple of hours, the database crashes on the slave. This is on our production server. Thanks for your help. Steps that I followed: 1. SELECT pg_start_backup('label', true); 2. created a tar of the data directory 3. untarred the data on the slave 4. deleted the pid file on the slave 5. created recovery.conf 6. SELECT pg_stop_backup(); on the primary 7. started postgres on the slave 8. after 2 hours, the slave server crashes. Please help. Regards
Re: [GENERAL] postgres 9.0.2 replicated database is crashing
On Sat, Mar 9, 2013 at 1:51 PM, akp geek akpg...@gmail.com wrote:
thank you. As you mentioned, I understood that I am starting the streaming from scratch, which is not what I wanted to do. Here is what I am planning to do. Our replication process was down since March 5th. 1. Is it OK to get all WALs from March 5th till now into the standby's pg_xlog 2. take pg_basebackup 3. export the data dir 4. stop backup 5. restart the standby. Based on my understanding it should work, because when the standby started, it was not able to find the files, as I deleted them from the slave when I tried to fix the issue.

Clearly it didn't work, because you've been having issues for the past 4 days and you've already tried this approach unsuccessfully. Based on the log snippets below, it looks like you have multiple problems. First, your slave is definitely missing WAL segments. You should increase the number of WAL segments that are archived on the master and ensure that your base backup includes the WAL segments (by generating it with the -x option). However, that's the least of your problems at the moment, because it looks like the master is what is crashing.

[snip -- quoted standby log, see the previous message]

The real problem is here:

XX000 2013-03-09 09:23:46 GMT FATAL: failed to add item to index page in 100924/100937/1225845
XX000 2013-03-09 09:23:46 GMT CONTEXT: xlog redo Insert item, node: 100924/100937/1225845 blkno: 72666 offset: 234 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:36483

That looks like something is badly broken (maybe data corruption)? Most of the google hits on that error are associated with GIST indexes. Are you using GIST indexes? Are you really (still) using 9.0.4? I hope you realize that there are known data corruption bugs in that version, and that version is super old at this point. You really need to update to 9.0.12.

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
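On 9.0, which predates pg_basebackup, the equivalent of including WAL is the low-level method with the archive kept intact; a sketch, with paths assumed:

# run on the master
psql -c "SELECT pg_start_backup('rebuild', true);"
tar -czf /tmp/base.tar.gz -C "$PGDATA" --exclude=postmaster.pid .
psql -c "SELECT pg_stop_backup();"
# ship /tmp/base.tar.gz to the standby, and make every WAL file archived
# between pg_start_backup and standby startup available to its restore_command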
Re: [GENERAL] postgres 9.0.2 replicated database is crashing
Appreciate your findings. Taking your points and doing things now. We can upgrade. Which version is more stable, 9.2 or 9.1? We use GIST indexes quite a bit, and we use GIS also. I recently compiled postgres 9.2. Regards

On Sat, Mar 9, 2013 at 5:09 PM, Lonni J Friedman netll...@gmail.com wrote:
[snip -- quoted message and standby log, see above]
Re: [GENERAL] postgres 9.0.2 replicated database is crashing
On Sat, Mar 9, 2013 at 4:05 PM, akp geek akpg...@gmail.com wrote:
Appreciate your findings. Taking your points and doing things now. We can upgrade. Which version is more stable, 9.2 or 9.1?

They're two entirely different branches. They should both be equally stable. However, if you're looking for the most straightforward path I'd recommend going to 9.0.12. Also be sure to read the release notes first.

We use GIST indexes quite a bit, and we use GIS also. I recently compiled postgres 9.2. Regards

[snip -- earlier quoted messages and standby log, see above]
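For context, a minor update within the 9.0 branch only replaces the binaries; a sketch, with the install step depending on how postgres was built:

psql -c "SELECT version();"        # confirm what is actually running
pg_ctl -D "$PGDATA" stop -m fast
# install the 9.0.12 binaries over the old ones (package manager, or make install)
pg_ctl -D "$PGDATA" start          # same data directory; no dump/reload needed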
Re: [GENERAL] selecting for type cast failures
Adrian Klaver-3 wrote:
My opinion: it would take more time to concoct regexes that cover all the corner cases than to write a script that walks through the data, finds the problem data, and flags it.

ISTM that using regular expressions is necessary regardless of whether you put them into a function/script or otherwise use them interactively via queries... David J.

-- View this message in context: http://postgresql.1045698.n5.nabble.com/selecting-for-type-cast-failures-tp5747875p5747890.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
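For the interactive route, the kind of query meant here might look like this sketch (table and column names are invented):

psql <<'SQL'
-- flag rows whose text value will not cast cleanly to integer
SELECT id, raw_value
FROM staging
WHERE raw_value !~ '^-?[0-9]+$';
SQL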
Re: [GENERAL] crosstab creating multiple rows for same id
You provide inadequate information to offer anything more than a guess... Are you sure your issue isn't simply a client display consideration -- basically word-wrapping? David J.

-- View this message in context: http://postgresql.1045698.n5.nabble.com/crosstab-creating-multiple-rows-for-same-id-tp5747865p5747883.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PostgreSQL connect with Visual C++
I worked through the link you sent. It ends up giving the error "This application can not start because SSLEAY32.dll is missing." I want to connect in any form, whether it is ODBC or direct. If you know, then please help me. I am stuck here. Thank you

-- View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-connect-with-Visual-C-tp5747463p5747642.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
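A likely cause, offered as an assumption rather than a diagnosis: libpq.dll depends on the OpenSSL DLLs, which must sit next to the executable or on the PATH. A sketch for the Windows command prompt, with the install path assumed:

rem copy the OpenSSL DLLs shipped with PostgreSQL next to your .exe
copy "C:\Program Files\PostgreSQL\9.2\bin\ssleay32.dll" .
copy "C:\Program Files\PostgreSQL\9.2\bin\libeay32.dll" .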
Re: [GENERAL] postgres 9.0.2 replicated database is crashing
Thanks a lot. One final question on this: if the streaming fails, as in our case, we missed 4 days' worth of data. We archive the log files on the primary. If we were to restart, what should be the starting point? Do we need to get all the archived logs into pg_xlog from that date (in our case 03/05)? This is where I got confused. Again, thanks a lot. Now we will definitely upgrade. Regards

On Sat, Mar 9, 2013 at 7:11 PM, Lonni J Friedman netll...@gmail.com wrote:
They're two entirely different branches. They should both be equally stable. However, if you're looking for the most straightforward path I'd recommend going to 9.0.12. Also be sure to read the release notes first.

[snip -- earlier quoted messages and standby log, see above]
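Rather than hand-copying archives into pg_xlog, the standby's recovery.conf can fetch them through restore_command; a sketch (hostname and user are assumptions; the archive path echoes the logs above):

standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator'
restore_command = 'cp /backup/9.0.4/archive/%f "%p"'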
Re: [GENERAL] table spaces
On Sat, Mar 9, 2013 at 10:51 AM, Gregg Jaskiewicz gryz...@gmail.com wrote:
Performance related question. With Linux (CentOS 6.3+), 64-bit, ext4 in mind, how would you guys go about distributing write load across disks? Let's say I have quite a few disks, and I can partition them the way I want, in mirror configuration (to get some hardware failure resilience). Should I separate tables from indexes onto separate RAIDs? I know WAL has to go on a separate disk, for added performance. I'm looking for your experiences, and most importantly how you go about deciding which way is best, i.e. which combinations make sense to try out first, short of all permutations :-)

First get a baseline for how things work with just pg_xlog on one small set (RAID-1 is often plenty) and RAID-10 on all the rest, with all the data (i.e. the base directory) there. With a fast HW RAID controller this is often just about as fast as any amount of breaking things out will be. But if you do break things out and they are faster, then you'll know by how much. If it's slower, then you know you've got a really busy set and some not-so-busy ones. And so on...

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
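A sketch of the pg_xlog split used for such a baseline; /mnt/wal_mirror stands in for the hypothetical RAID-1 pair, and the server must be stopped first:

pg_ctl -D "$PGDATA" stop -m fast
mv "$PGDATA/pg_xlog" /mnt/wal_mirror/pg_xlog
ln -s /mnt/wal_mirror/pg_xlog "$PGDATA/pg_xlog"
pg_ctl -D "$PGDATA" start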