Re: [GENERAL] window function ordering not working as expected
On Tue, Feb 17, 2015 at 4:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Lonni J Friedman netll...@gmail.com writes:
>> I'm interested in seeing:
>> * the date for the most recent result
>> * test name (identifier)
>> * most recent result (decimal value)
>> * the worst (lowest decimal value) test result from the past 21 days
>> * the date which corresponds with the worst test result from the past 21 days
>> * the 2nd worst (2nd lowest decimal value) test result
>> ...
>> The problem that I'm seeing is in the prv_score column. It should show a
>> value of 0.6, which corresponds with 2015-02-13; however, it is instead
>> returning 0.7. I thought that by ordering by metrics->'PT TWBR' I'd
>> always be sorting by the scores, and as a result lead(metrics->'PT TWBR', 1)
>> would give me the next greatest value of the score. Thus my confusion as
>> to why ORDER BY metrics->'PT TWBR' isn't working as expected.
>
> lead() and lag() retrieve values from rows that are N away from the
> current row in the specified ordering. That isn't what you want here,
> AFAICS. I think the worst test result would be obtained with
>
>     nth_value(metrics->'PT TWBR', 1)
>
> which is equivalent to what you used,
>
>     first_value(metrics->'PT TWBR')
>
> while the 2nd worst result would be obtained with
>
>     nth_value(metrics->'PT TWBR', 2)
>
> However, "worst" and "2nd worst" with this implementation would mean
> worst and 2nd worst within the partition, which isn't the stated goal
> either, at least not with the partition definition you're using. What
> you really want for the worst in the last 21 days is something like
>
>     min(metrics->'PT TWBR') OVER (
>         PARTITION BY ... that same mess you used ...
>         ORDER BY tstamp
>         RANGE BETWEEN '21 days'::interval PRECEDING AND CURRENT ROW)
>
> However, Postgres doesn't implement RANGE x PRECEDING yet. You could get
> worst in the last 21 observations easily:
>
>     min(metrics->'PT TWBR') OVER (
>         PARTITION BY ... that mess ...
>         ORDER BY tstamp
>         ROWS BETWEEN 20 PRECEDING AND CURRENT ROW)
>
> and maybe that's close enough. I do not know an easy way to get second
> worst :-(. You could build a user-defined aggregate to produce "second
> smallest value among the inputs" and then apply it in the same way as I
> used min() here.

Thanks Tom, much appreciate the fast reply. I'll chew this over and see if I have any other questions.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
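The user-defined "second smallest" aggregate that Tom suggests could look roughly like the sketch below. The function and type choices are illustrative, not from the thread: the state is a two-element array holding the smallest and second-smallest non-null values seen so far, and since any PostgreSQL aggregate can be used as a window function, the result can be computed over the same moving frame as min().

```sql
-- Transition function: fold the next value into {smallest, second smallest}.
CREATE FUNCTION second_min_trans(state numeric[], val numeric)
RETURNS numeric[] LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE
        WHEN val IS NULL THEN state
        WHEN state[1] IS NULL OR val < state[1] THEN ARRAY[val, state[1]]
        WHEN state[2] IS NULL OR val < state[2] THEN ARRAY[state[1], val]
        ELSE state
    END;
$$;

-- Final function: report only the second-smallest slot.
CREATE FUNCTION second_min_final(state numeric[])
RETURNS numeric LANGUAGE sql IMMUTABLE AS $$
    SELECT state[2];
$$;

CREATE AGGREGATE second_min(numeric) (
    SFUNC = second_min_trans,
    STYPE = numeric[],
    FINALFUNC = second_min_final,
    INITCOND = '{NULL,NULL}'
);
```

Assuming the score is extracted as text and cast (the `->>` cast is my assumption; the thread compares the json value directly), usage would mirror Tom's min() example: `second_min((metrics->>'PT TWBR')::numeric) OVER (PARTITION BY ... ORDER BY tstamp ROWS BETWEEN 20 PRECEDING AND CURRENT ROW)`.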
[GENERAL] window function ordering not working as expected
Greetings,
I have a postgresql-9.3.x database with a table with a variety of date stamped test results, some of which are stored in json format (natively in the database). I'm attempting to use some window functions to pull out specific data from the test results over a time window, but part of the results are not making sense. Some tests run every day, others less frequently. For each unique test's results, I'm interested in seeing:
* the date for the most recent result
* test name (identifier)
* most recent result (decimal value)
* the worst (lowest decimal value) test result from the past 21 days
* the date which corresponds with the worst test result from the past 21 days
* the 2nd worst (2nd lowest decimal value) test result

Here's a sample of the data and resulting score for one test (tname) from the past few weeks:

   tstamp   |      tname      | score
------------+-----------------+-------
 2015-02-17 | dfw001.ix-cr-02 |   0.7
 2015-02-15 | dfw001.ix-cr-02 |   0.6
 2015-02-14 | dfw001.ix-cr-02 |   0.6
 2015-02-14 | dfw001.ix-cr-02 |   0.7
 2015-02-13 | dfw001.ix-cr-02 |   0.6
 2015-02-12 | dfw001.ix-cr-02 |   0.7
 2015-02-11 | dfw001.ix-cr-02 |   0.7
 2015-02-10 | dfw001.ix-cr-02 |   0.7
 2015-02-09 | dfw001.ix-cr-02 |   0.7
 2015-02-08 | dfw001.ix-cr-02 |   0.7
 2015-02-08 | dfw001.ix-cr-02 |   0.5
 2015-02-07 | dfw001.ix-cr-02 |   0.7
 2015-02-07 | dfw001.ix-cr-02 |   0.5
 2015-02-06 | dfw001.ix-cr-02 |   0.7
 2015-02-05 | dfw001.ix-cr-02 |   0.7
 2015-02-04 | dfw001.ix-cr-02 |   0.7
 2015-01-30 | dfw001.ix-cr-02 |   0.7

Here's the SQL query that I'm running:

SELECT * FROM
  (SELECT tstamp,
          concat_ws('/', attrs->'Route Cluster ID',
            regexp_replace(replace(replace(attrs->'ASN HTML','</a>',''),'<a href= ''http://ncapp100.prod.com/Cluster3.php?asn=',''),'\d+(d=5d'' target=''_blank'')','')) AS tname,
          metrics->'PT TWBR' AS score,
          first_value(metrics->'PT TWBR') OVER (PARTITION BY concat_ws('/', attrs->'Route Cluster ID',
            regexp_replace(replace(replace(attrs->'ASN HTML','</a>',''),'<a href= ''http://ncapp100.prod.com/Cluster3.php?asn=',''),'\d+(d=5d'' target=''_blank'')',''))
            ORDER BY metrics->'PT TWBR') AS worst_score,
          first_value(tstamp) OVER (PARTITION BY concat_ws('/', attrs->'Route Cluster ID',
            regexp_replace(replace(replace(attrs->'ASN HTML','</a>',''),'<a href= ''http://ncapp100.prod.com/Cluster3.php?asn=',''),'\d+(d=5d'' target=''_blank'')',''))
            ORDER BY metrics->'PT TWBR') AS worst_date,
          lead(metrics->'PT TWBR', 1) OVER (PARTITION BY concat_ws('/', attrs->'Route Cluster ID',
            regexp_replace(replace(replace(attrs->'ASN HTML','</a>',''),'<a href= ''http://ncapp100.prod.com/Cluster3.php?asn=',''),'\d+(d=5d'' target=''_blank'')',''))
            ORDER BY metrics->'PT TWBR') AS prv_score
   FROM btworks
   WHERE age(now(),tstamp) < '21 days'
   ORDER BY tstamp DESC, rank
  ) AS stuff
WHERE tstamp = '2015-02-17';

Here's the data from the above query as it pertains to the data (tname='dfw001.ix-cr-02') set that I posted above:

   tstamp   |      tname      | score | worst_score | worst_date | prv_score
------------+-----------------+-------+-------------+------------+-----------
 2015-02-17 | dfw001.ix-cr-02 |   0.7 |         0.5 | 2015-02-08 |       0.7

The problem that I'm seeing is in the prv_score column. It should show a value of 0.6, which corresponds with 2015-02-13; however, it is instead returning 0.7. I thought that by ordering by metrics->'PT TWBR' I'd always be sorting by the scores, and as a result lead(metrics->'PT TWBR', 1) would give me the next greatest value of the score. Thus my confusion as to why ORDER BY metrics->'PT TWBR' isn't working as expected.

thanks in advance for any pointers.
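The prv_score surprise can be reproduced in isolation. This self-contained illustration (not from the thread; the VALUES rows mimic the sample data) shows that with duplicate ORDER BY keys, lead() simply returns the value from the next row of the sorted partition, which can be another copy of the current score rather than the next distinct one:

```sql
SELECT tstamp, score,
       lead(score, 1) OVER (ORDER BY score) AS next_in_score_order
FROM (VALUES
        (DATE '2015-02-08', 0.5),
        (DATE '2015-02-07', 0.5),
        (DATE '2015-02-13', 0.6),
        (DATE '2015-02-17', 0.7),
        (DATE '2015-02-12', 0.7)
     ) AS t(tstamp, score)
ORDER BY score;
-- For the first 0.5 row, next_in_score_order is the duplicate 0.5, not 0.6.
```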
Re: [GENERAL] partitioned table + postgres_FDW not working in 9.3
Hi Shigeru,
Thanks for your reply. This sounds like a relatively simple workaround, so I'll give it a try. Is the search_path of the remote session that postgres_fdw forces considered to be intentional, expected behavior, or is it a bug?

thanks!

On Wed, Sep 25, 2013 at 7:13 PM, Shigeru Hanada shigeru.han...@gmail.com wrote:
> Hi Lonni,
>
> 2013/9/25 Lonni J Friedman netll...@gmail.com:
>> The problem that I'm experiencing is if I attempt to perform an INSERT
>> on the foreign nppsmoke table on cluster a, it fails claiming that the
>> table partition which should hold the data in the INSERT does not exist:
>>
>> ERROR: relation nppsmoke_2013_09 does not exist
>> CONTEXT: Remote SQL command: INSERT INTO public.nppsmoke(id, date_created, last_update, build_type, current_status, info, cudacode, gpu, subtest, os, osversion, arch, cl, dispvers, branch, pass, fail, oldfail, newfail, failureslog, totdriver, ddcl, buildid, testdcmd, pclog, filtercount, filterlog, error) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28)
>> PL/pgSQL function public.nppsmoke_insert_trigger() line 30 at SQL statement
>
> I could reproduce the problem.
>
>> If I run the same exact SQL INSERT on cluster b (not using the foreign
>> table), then it works. So whatever is going wrong seems to be related to
>> the foreign table. Initially I thought that perhaps the problem was that
>> I needed to create all of the partitions as foreign tables on cluster a,
>> but that doesn't help. Am I hitting some kind of foreign data wrapper
>> limitation, or am I doing something wrong?
>
> The cause of the problem is the search_path setting of the remote
> session. For some reason, postgres_fdw forces the search_path on the
> remote side to be 'pg_catalog', so all objects used in the session
> established by postgres_fdw have to be schema-qualified. The trigger
> function is executed in such a context, so you need to qualify all
> objects in your trigger function with a schema name, like
> 'public.nppsmoke_2013_09'.
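Applied to this thread's trigger, the fix amounts to schema-qualifying the partition name inside the function body. A minimal sketch: only the table and function names come from the thread, and the assumption that date_created is the partition key is mine.

```sql
CREATE OR REPLACE FUNCTION public.nppsmoke_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- Schema-qualify the partition so the INSERT still resolves when
    -- postgres_fdw runs this with search_path forced to 'pg_catalog'.
    IF NEW.date_created >= DATE '2013-09-01' AND
       NEW.date_created <  DATE '2013-10-01' THEN
        INSERT INTO public.nppsmoke_2013_09 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'date_created out of range; add a new partition';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```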
Re: [GENERAL] partitioned table + postgres_FDW not working in 9.3
On Thu, Sep 26, 2013 at 8:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Lonni J Friedman netll...@gmail.com writes:
>> Thanks for your reply. This sounds like a relatively simple workaround,
>> so I'll give it a try. Is the search_path of the remote session that
>> postgres_fdw forces considered to be intentional, expected behavior, or
>> is it a bug?
>
> It's intentional. Possibly more to the point, don't you think your
> trigger function is rather fragile if it assumes the caller has provided
> a particular search path setting?

To be honest, I don't have much experience with functions, and was using the trigger function from the official documentation:
http://www.postgresql.org/docs/9.3/static/ddl-partitioning.html
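One way to address the fragility Tom points out, without schema-qualifying every statement in the body (my suggestion, not from the thread), is to pin a search_path on the trigger function itself, so it no longer depends on whatever the caller set:

```sql
-- PostgreSQL applies this setting for the duration of each call to the
-- function, regardless of the session's (or postgres_fdw's) search_path.
ALTER FUNCTION public.nppsmoke_insert_trigger() SET search_path = public;
```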
[GENERAL] pg_basebackup: ERROR: could not find any WAL files (9.3)
Greetings,
I've recently pushed a new postgres-9.3 (Linux-x86_64/RHEL6) cluster into production, with one master and two hot standby streaming replication slaves. Everything seems to be working ok, however roughly half of my pg_basebackup attempts are failing at the very end with the error:

pg_basebackup: could not get transaction log end position from server: ERROR: could not find any WAL files

I should note that I'm running pg_basebackup on one of the two slaves, and not the master. However, I've got an older, separate 9.3 cluster with the same setup, and pg_basebackup never fails there.

I thought that the WAL files in question were coming from the pg_xlog subdirectory, but I don't see any lack of files there on the server running pg_basebackup. They are being generated continuously (as expected) before, during, and after the pg_basebackup. I scanned the source ( http://doxygen.postgresql.org/basebackup_8c_source.html ), and it seems to back up my understanding of the expected behavior:

306     /*
307      * There must be at least one xlog file in the pg_xlog directory,
308      * since we are doing backup-including-xlog.
309      */
310     if (nWalFiles < 1)
311         ereport(ERROR,
312                 (errmsg("could not find any WAL files")));

However, what I see on the server conflicts with the error. pg_basebackup was invoked on Thu Sep 26 01:00:01 PDT 2013, and failed on Thu Sep 26 02:09:12 PDT 2013. In the pg_xlog subdirectory, I see lots of WAL files present before, during, and after pg_basebackup was run:

-rw------- 1 postgres postgres 16777216 Sep 26 00:38 0001208A00E3
-rw------- 1 postgres postgres 16777216 Sep 26 00:43 0001208A00E4
-rw------- 1 postgres postgres 16777216 Sep 26 00:48 0001208A00E5
-rw------- 1 postgres postgres 16777216 Sep 26 00:53 0001208A00E6
-rw------- 1 postgres postgres 16777216 Sep 26 00:58 0001208A00E7
-rw------- 1 postgres postgres 16777216 Sep 26 01:03 0001208A00E8
-rw------- 1 postgres postgres 16777216 Sep 26 01:08 0001208A00E9
-rw------- 1 postgres postgres 16777216 Sep 26 01:14 0001208A00EA
-rw------- 1 postgres postgres 16777216 Sep 26 01:19 0001208A00EB
-rw------- 1 postgres postgres 16777216 Sep 26 01:24 0001208A00EC
-rw------- 1 postgres postgres 16777216 Sep 26 01:29 0001208A00ED
-rw------- 1 postgres postgres 16777216 Sep 26 01:34 0001208A00EE
-rw------- 1 postgres postgres 16777216 Sep 26 01:38 0001208A00EF
-rw------- 1 postgres postgres 16777216 Sep 26 01:43 0001208A00F0
-rw------- 1 postgres postgres 16777216 Sep 26 01:48 0001208A00F1
-rw------- 1 postgres postgres 16777216 Sep 26 01:53 0001208A00F2
-rw------- 1 postgres postgres 16777216 Sep 26 01:58 0001208A00F3
-rw------- 1 postgres postgres 16777216 Sep 26 02:03 0001208A00F4
-rw------- 1 postgres postgres 16777216 Sep 26 02:08 0001208A00F5
-rw------- 1 postgres postgres 16777216 Sep 26 02:14 0001208A00F6

Thanks in advance for any pointers.
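One thing that may be worth trying (my suggestion, not something established in the thread): the error above comes from the server-side path that bundles WAL into the backup at the end (the default fetch mode). Streaming the WAL over a second connection sidesteps that code path. Host, user, and target directory below are illustrative:

```shell
# Stream WAL concurrently with the base backup instead of fetching it
# at the end (-X stream is available since 9.2).
pg_basebackup -h standby-host -U repluser -D /backups/base-$(date +%F) -X stream -P -v
```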
[GENERAL] postgres FDW doesn't support sequences?
I've got two 9.3 clusters, with a postgres foreign data wrapper (FDW) setup to point from one cluster to the other. One of the (foreign) tables associated with the foreign server has a bigint sequence for its primary key, defined as:

id | bigint | not null default nextval('nppsmoke_id_seq1'::regclass)

If I INSERT a new row into the local table (not the foreign table version) without specifying the 'id' column explicitly, it is automatically assigned the nextval in the sequence counter. However, if I attempt to run the same INSERT using the foreign table, it always fails, complaining that "null value in column id violates not-null constraint". It seems like the FDW is somehow ignoring the existence of the sequence default value, and rewriting the SQL query to explicitly attempt to insert a NULL value. Here's the full query and resulting error output:

nightly=# INSERT into nppsmoke (date_created,last_update,build_type,current_status,info,cudacode,gpu,subtest,os,arch,cl,dispvers,branch,totdriver,ddcl,testdcmd,osversion) VALUES ((date_trunc('second',now())),(date_trunc('second',now())),'release','Building','npp-release-gpu-buildCUDA-2013-09-24-1380041350.log','2013-09-24.cuda-linux64-test42.release.gpu','380','CUDA build','Linux','x86_64','16935289','CBS_cuda_a_2013-09-24_16935289','cuda_a','1','16935289','./npp-smoke.sh --testtype release --amodel f --vc g --drvpath /home/lfriedman/cuda-stuff/sw/dev/gpu_drv/cuda_a/drivers/gpgpu --cudaroot /home/lfriedman/cuda-stuff/sw/gpgpu --totdriver t --email lfriedman','2.6.32-358.el6.x86_64');
ERROR: null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, 2013-09-25 08:00:46, 2013-09-25 08:00:46, release, Building, npp-release-gpu-buildCUDA-2013-09-24-1380041350.log, 2013-09-24.cuda-linux64-test42.release.gpu, 380, CUDA build, Linux, 2.6.32-358.el6.x86_64, x86_64, 16935289, CBS_cuda_a_2013-09-24_16935289, cuda_a, null, null, null, null, null, t, 16935289, null, ./npp-smoke.sh --testtype release --amodel f --vc g --drvpath /h..., null, null, null, null, g).
CONTEXT: Remote SQL command: INSERT INTO public.nppsmoke(id, date_created, last_update, build_type, current_status, info, cudacode, gpu, subtest, os, osversion, arch, cl, dispvers, branch, pass, fail, oldfail, newfail, failureslog, totdriver, ddcl, buildid, testdcmd, pclog, filtercount, filterlog, error) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28)

I tried to recreate the foreign table definition with a primary key, and that failed:

ERROR: constraints are not supported on foreign tables

Are sequences supported with the postgres FDW? If not, is there any workaround for inserting into a foreign table that doesn't require me to explicitly specify a value for the primary key sequence column in my INSERT statements?

thanks!
Re: [GENERAL] postgres FDW doesn't support sequences?
On Wed, Sep 25, 2013 at 2:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Lonni J Friedman netll...@gmail.com writes:
>> If I INSERT a new row into the local table (not the foreign table
>> version), without specifying the 'id' column explicitly, it
>> automatically is assigned the nextval in the sequence counter. However,
>> if I attempt to run the same INSERT using the foreign table, it always
>> fails complaining that "null value in column id violates not-null
>> constraint". It seems like the FDW is somehow ignoring the existence of
>> the sequence default value, and rewriting the SQL query to explicitly
>> attempt to insert a NULL value.
>
> Yeah, there was quite a bit of discussion about that back in February or
> so. The short of it is that column default values that are defined on
> the foreign server are not respected by operations on a foreign table;
> rather, you have to attach a DEFAULT specification to the foreign table
> definition if you want inserts into the foreign table to use that
> default. The default expression is executed locally, too, which means
> that if you'd like it to read like nextval('some_seq') then some_seq has
> to be a local sequence, not one on the foreign server.

Is there some elegant mechanism for keeping the local and foreign sequences in sync?

> I realize that this isn't ideal for serial-like columns, but honoring
> default expressions that would execute on the foreign server turned out
> to be a huge can of worms. We might figure out how to fix that some day;
> but if we'd insisted on a solution now, there wouldn't be writable
> foreign tables at all in 9.3.

Understood. Other than reading the code, is there somewhere that these limitations are documented that I overlooked?
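Concretely, the workaround Tom describes could look like the sketch below (the local sequence name is illustrative). Note the caveat implied by the question about keeping sequences in sync: the local and remote sequences advance independently, so anything inserting directly on the foreign server can hand out colliding ids.

```sql
-- A local sequence, evaluated locally at INSERT time.
CREATE SEQUENCE nppsmoke_id_seq_local;

-- Attach it as the foreign table's locally evaluated default for id.
ALTER FOREIGN TABLE nppsmoke
    ALTER COLUMN id SET DEFAULT nextval('nppsmoke_id_seq_local');
```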
[GENERAL] partitioned table + postgres_FDW not working in 9.3
Greetings,
I've got two different 9.3 clusters setup, a and b (on Linux, if that matters). On cluster b, I have a table (nppsmoke) that is partitioned by date (month), which uses a function called by a trigger to manage INSERTs (exactly as documented in the official documentation for partitioning of tables). I've setup a postgres foreign data wrapper server on cluster a which points to cluster b, and then setup a foreign table (nppsmoke) on cluster a which points to the actual partitioned (nppsmoke) table on cluster b. The partitions on cluster b use the naming scheme nppsmoke_$YYYY_$MM (where YYYY=4 digit year, and MM=2 digit month). For example, the current month's partition is named nppsmoke_2013_09.

The problem that I'm experiencing is if I attempt to perform an INSERT on the foreign nppsmoke table on cluster a, it fails claiming that the table partition which should hold the data in the INSERT does not exist:

ERROR: relation nppsmoke_2013_09 does not exist
CONTEXT: Remote SQL command: INSERT INTO public.nppsmoke(id, date_created, last_update, build_type, current_status, info, cudacode, gpu, subtest, os, osversion, arch, cl, dispvers, branch, pass, fail, oldfail, newfail, failureslog, totdriver, ddcl, buildid, testdcmd, pclog, filtercount, filterlog, error) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28)
PL/pgSQL function public.nppsmoke_insert_trigger() line 30 at SQL statement

If I run the same exact SQL INSERT on cluster b (not using the foreign table), then it works. So whatever is going wrong seems to be related to the foreign table. Initially I thought that perhaps the problem was that I needed to create all of the partitions as foreign tables on cluster a, but that doesn't help. Am I hitting some kind of foreign data wrapper limitation, or am I doing something wrong?

thanks
Re: [GENERAL] upgrade from 9.2.x to 9.3 causes significant performance degradation
On Wed, Sep 18, 2013 at 2:02 AM, Kevin Grittner kgri...@ymail.com wrote:
> Lonni J Friedman netll...@gmail.com wrote:
>> top shows over 90% of the load is in sys space. vmstat output seems to
>> suggest that it's CPU bound (or bouncing back and forth):
>
> Can you run `perf top` during an episode and see what kernel functions
> are using all that CPU?

Oddly, the problem went away on its own yesterday just after 4PM, and performance has remained 'normal' since that time. I changed absolutely nothing. If/when it returns, I'll certainly capture that output.

> This looks similar to cases I've seen of THP defrag going wild. Did the
> OS version or configuration change? Did the PostgreSQL memory settings
> (like shared_buffers) change?

Nothing changed other than the version of postgres. I re-used the same postgresql.conf that was in place when running 9.2.x. Anyway, here are the current THP related settings on the server:

[root@cuda-db7 ~]# grep AnonHugePages /proc/meminfo
AnonHugePages:    548864 kB
[root@cuda-db7 ~]# egrep 'trans|thp' /proc/vmstat
nr_anon_transparent_hugepages 272
thp_fault_alloc 129173889
thp_fault_fallback 17462551
thp_collapse_alloc 148437
thp_collapse_alloc_failed 15143
thp_split 242
Re: [GENERAL] upgrade from 9.2.x to 9.3 causes significant performance degradation
On Wed, Sep 18, 2013 at 2:02 AM, Kevin Grittner kgri...@ymail.com wrote:
> Lonni J Friedman netll...@gmail.com wrote:
>> top shows over 90% of the load is in sys space. vmstat output seems to
>> suggest that it's CPU bound (or bouncing back and forth):
>
> Can you run `perf top` during an episode and see what kernel functions
> are using all that CPU?

I take back what I said earlier. While the master is currently back to normal performance, the two hot standby slaves are still churning something awful. If I run 'perf top' on either slave, after a few seconds these are consistently the top three in the list:

84.57% [kernel]  [k] _spin_lock_irqsave
 6.21% [unknown] [.] 0x00659f60
 4.69% [kernel]  [k] compaction_alloc

> This looks similar to cases I've seen of THP defrag going wild. Did the
> OS version or configuration change? Did the PostgreSQL memory settings
> (like shared_buffers) change?

I think you're onto something here with respect to THP defrag going wild. I set /sys/kernel/mm/transparent_hugepage/defrag to 'never', and immediately the load dropped on both slaves from over 5.00 to under 1.00. So this raises the question: is this a kernel bug, or is there some other solution to the problem? Also, it seems weird that the problem didn't happen until I switched from 9.2 to 9.3. Is it possible this is somehow related to the change from using SysV shared memory to using Posix shared memory and mmap for memory management?
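For reference, the change described in this message can be applied at runtime as follows (RHEL 6 sysfs path, as used in the thread; the setting does not survive a reboot, so persisting it in something like /etc/rc.local is a common follow-up):

```shell
# Stop the kernel from defragmenting/compacting memory on behalf of
# transparent huge pages.
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Verify: the active value is shown in square brackets.
cat /sys/kernel/mm/transparent_hugepage/defrag
```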
[GENERAL] upgrade from 9.2.x to 9.3 causes significant performance degradation
Greetings,
I'm running a PostgreSQL 9.3.0 cluster (1 master with two streaming replication hot standby slaves) on RHEL6-x86_64. Yesterday I upgraded from 9.2.4 to 9.3.0, and since the upgrade I'm seeing a significant performance degradation. PostgreSQL simply feels slower. Nothing other than the version of PostgreSQL changed yesterday. I used pg_upgrade to perform the upgrade, and ran the generated analyze_new_cluster.sh immediately afterwards, which completed successfully.

Prior to the upgrade, I'd generally expect a load average of less than 2.00 on the master, and less than 1.00 on each of the slaves. Since the upgrade, the load average on the master has been in double digits (hitting 100.00 for a few minutes), and the slaves are consistently above 5.00.

There are a few things that are jumping out at me as behaving differently since the upgrade. The vmstat "processes waiting for runtime" count has increased dramatically: prior to the upgrade it would be consistently less than 10, however since upgrading it hovers between 40 and 60 at all times. The "Local timer interrupts" rate in /proc/interrupts has increased dramatically as well; it used to hover around 6000 and is now over 20k much of the time. However, I'm starting to suspect that these are both symptoms of the problem rather than the cause.

At this point, I'm looking for guidance on how to debug this problem more effectively.

thanks
Re: [GENERAL] upgrade from 9.2.x to 9.3 causes significant performance degradation
On Tue, Sep 17, 2013 at 9:54 AM, Eduardo Morras emorr...@yahoo.es wrote:
> On Tue, 17 Sep 2013 09:19:29 -0700
> Lonni J Friedman netll...@gmail.com wrote:
>> Greetings, I'm running a PostgreSQL 9.3.0 cluster (1 master with two
>> streaming replication hot standby slaves) on RHEL6-x86_64. Yesterday I
>> upgraded from 9.2.4 to 9.3.0, and since the upgrade I'm seeing a
>> significant performance degradation. PostgreSQL simply feels slower.
>> Nothing other than the version of PostgreSQL changed yesterday. I used
>> pg_upgrade to perform the upgrade, and ran the generated
>> analyze_new_cluster.sh immediately afterwards, which completed
>> successfully. Prior to the upgrade, I'd generally expect a load average
>> of less than 2.00 on the master, and less than 1.00 on each of the
>> slaves. Since the upgrade, the load average on the master has been in
>> double digits (hitting 100.00 for a few minutes), and the slaves are
>> consistently above 5.00. There are a few things that are jumping out at
>> me as behaving differently since the upgrade. vmstat "processes waiting
>> for runtime" counts have increased dramatically. Prior to the upgrade
>> the process count would be consistently less than 10, however since
>> upgrading it hovers between 40 and 60 at all times. "Local timer
>> interrupts" in /proc/interrupts has increased dramatically as well. It
>> used to hover around 6000 and is now over 20k much of the time. However,
>> I'm starting to suspect that they are both symptoms of the problem
>> rather than the cause. At this point, I'm looking for guidance on how to
>> debug this problem more effectively.
>
> Don't know what happens but:
>
> a) Does analyze_new_cluster.sh include a reindex? If not, indexes are
> useless because analyze statistics says so.

No, it doesn't include a reindex. It merely invokes "vacuumdb --all --analyze-only" with different values for default_statistics_target, plus -c vacuum_cost_delay=0. According to the documentation for pg_upgrade, post-upgrade scripts to rebuild tables and indexes will be generated automatically. Nothing was generated for this purpose, at least not in any obvious place. The analyze_new_cluster.sh script is the only one that was automatically generated as far as I can tell.

> b) Did you configure postgresql.conf on 9.3.0 for your server/load?
> Perhaps it has default install values.

Yes, I'm using the same postgresql.conf as I was using when running 9.2.4. It's definitely not running with default install values.

> c) What does logs say?

The postgres server logs look perfectly normal, minus a non-trivial slower run time for most queries. There's nothing unusual in any of the OS level logs (/var/log/messages, etc) or dmesg.

Do you have any other suggestions?
Re: [GENERAL] upgrade from 9.2.x to 9.3 causes significant performance degradation
Thanks for your reply. Comments/answers inline below.

On Tue, Sep 17, 2013 at 11:28 AM, Jeff Janes jeff.ja...@gmail.com wrote:
> On Tue, Sep 17, 2013 at 11:22 AM, Lonni J Friedman netll...@gmail.com wrote:
>>> c) What does logs say?
>>
>> The postgres server logs look perfectly normal, minus a non-trivial
>> slower run time for most queries. There's nothing unusual in any of the
>> OS level logs (/var/log/messages, etc) or dmesg.
>
> Are you generally CPU limited or IO limited?

top shows over 90% of the load is in sys space. vmstat output seems to suggest that it's CPU bound (or bouncing back and forth):

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
  r  b  swpd   free   buff  cache   si   so   bi   bo   in   cs  us sy id wa st
  1  0 17308 852016 141104 12707419200101800 6 4 90 0 0
  0  0 17308 872316 141104 12707420000 0 988 940 564 1 0 99 0 0
  0  0 17308 884288 141104 12707420800 0 1921 1202 2132 1 0 99 0 0
  0  0 17308 898728 141104 12707420800 0 0 1064 577 1 0 99 0 0
  2  0 17308 914920 141104 12707422400 044 820 427 1 0 99 0 0
  0  0 17308 926524 141104 12707427200 048 1173 585 1 0 99 0 0
108  1 17308 753648 141104 12707422400 0 236 9825 3901 12 5 83 0 0
 50  0 17308 723156 141104 12707440000 0 144 43481 9105 20 79 1 0 0
 45  0 17308 722860 141104 12707441600 0 8 32969 1998 1 97 2 0 0
 47  0 17308 738996 141104 12707441600 0 0 34099 1739 1 99 0 0 0
101  0 17308 770220 141104 12707448000 032 38550 5998 7 93 0 0 0
101  0 17308 775732 141104 12707451200 0 156 33889 5809 4 96 0 0 0
 99  0 17308 791232 141104 12707454400 0 0 32385 4981 0 100 0 0 0
 96  0 17308 803156 141104 12707454400 024 32413 4824 0 100 0 0 0
 87  0 17308 811624 141104 12707454400 0 0 32438 4470 0 100 0 0 0
 83  0 17308 815500 141104 12707454400 0 0 32489 4159 0 100 0 0 0
 80  0 17308 826572 141104 12707455200 033 32582 3948 0 100 0 0 0
 73  0 17308 853264 141108 12707455200 052 32833 3840 0 100 0 0 0
 73  0 17308 882240 141108 12707456000 0 4 32820 3594 0 100 0 0 0
 72  0 17308 892256 141108 12707456000 0 0 32368 3516 0 100 0 0 0

iostat consistently shows %util under 1.00, which also suggests that disk IO is not the bottleneck:

# iostat -dx /dev/sdb 5
Linux 2.6.32-358.6.2.el6.x86_64 (cuda-db7) 09/17/2013 _x86_64_ (32 CPU)

Device: rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
sdb       0.02   0.21  6.91  31.33  651.60 1121.85    46.38     0.09  2.25  0.08  0.31

Device: rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
sdb       0.00   0.00  0.00   8.00    0.00   93.00    11.62     0.00  0.28  0.20  0.16

Device: rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
sdb       0.00   0.00  0.00  11.00    0.00  125.40    11.40     0.00  0.16  0.16  0.18

Device: rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
sdb       0.00   0.00  0.00 105.00    0.00 3380.40    32.19     0.29  2.76  0.03  0.34

Device: rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
sdb       0.00   0.00  0.00  14.80    0.00 2430.60   164.23     0.00  0.12  0.09  0.14

Device: rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
sdb       0.00   1.20  0.00  41.60    0.00 1819.40    43.74     0.02  0.45  0.05  0.20

Device: rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
sdb       0.00   0.00  0.00   2.80    0.00   32.00    11.43     0.00  0.00  0.00  0.00

mpstat also shows a virtually 0 iowait, with a ton of sys (CPU) time:

# mpstat 2 10
Linux 2.6.32-358.6.2.el6.x86_64 (cuda-db7) 09/17/2013 _x86_64_ (32 CPU)

12:53:19 PM  CPU  %usr  %nice   %sys  %iowait  %irq  %soft  %steal  %guest  %idle
12:53:21 PM  all  7.36   0.00  92.58     0.00  0.00   0.03    0.00    0.00   0.03
12:53:23 PM  all  6.35   0.00  90.43     0.00  0.00   0.03    0.00    0.00   3.19
12:53:25 PM  all  3.13   0.00  68.20     0.00  0.00   0.02    0.00    0.00  28.66
12:53:27 PM  all  6.07   0.00  68.46     0.00  0.00   0.03    0.00    0.00  25.44
12:53:29 PM  all  5.83   0.00  94.14     0.00  0.00   0.03    0.00    0.00   0.00
12:53:31 PM  all  5.75   0.00  94.14     0.00  0.00   0.11    0.00    0.00   0.00
12:53:33 PM  all  7.65   0.00  40.32
Re: [GENERAL] upgrade from 9.2.x to 9.3 causes significant performance degradation
On Tue, Sep 17, 2013 at 3:47 PM, Andres Freund and...@2ndquadrant.com wrote:
> Hi,
>
> On 2013-09-17 09:19:29 -0700, Lonni J Friedman wrote:
>> I'm running a PostgreSQL 9.3.0 cluster (1 master with two streaming
>> replication hot standby slaves) on RHEL6-x86_64. Yesterday I upgraded
>> from 9.2.4 to 9.3.0, and since the upgrade I'm seeing a significant
>> performance degradation. PostgreSQL simply feels slower. Nothing other
>> than the version of PostgreSQL changed yesterday. I used pg_upgrade to
>> perform the upgrade, and ran the generated analyze_new_cluster.sh
>> immediately afterwards, which completed successfully.
>
> Where did you get 9.3.0 from? Compiled it yourself? Any chance you
> compiled it with --enable-cassert or somesuch?

Directly from http://yum.postgresql.org. So unless the RPMs on there are built weird/wrong, I don't think that's the problem.
Re: [GENERAL] WAL Replication Working but Not Working
The first thing to do is look at your server logs around the time when it stopped working.

On Wed, Aug 21, 2013 at 7:08 AM, Joseph Marlin jmar...@saucontech.com wrote:
> We're having an issue with our warm standby server. About 9:30 last
> night, it stopped applying changes it received in WAL files that are
> shipped over to it as they are created. It is still reading WAL files as
> they are delivered, as the startup_log.txt shows, but the changes in the
> primary database aren't actually being made to the standby, and haven't
> been since last night.
>
> Is there any way we can figure out what is going on here? We'd like to
> recover somehow without having to restore from a base backup, and we'd
> like to figure out what is happening so we can prevent it in the future.
>
> Thanks!
Re: [GENERAL] Streaming Replication Randomly Locking Up
I've never seen this happen. Looks like you might be using 9.1? Are you up to date on all the 9.1.x releases? Do you have just 1 slave syncing from the master? Which OS are you using? Did you verify that there aren't any network problems between the slave and master? Or hardware problems (like the NIC dying, or dropping packets)? On Thu, Aug 15, 2013 at 11:07 AM, Andrew Berman rexx...@gmail.com wrote: Hello, I'm having an issue where streaming replication just randomly stops working. I haven't been able to find anything in the logs which point to an issue, but the Postgres process shows a waiting status on the slave: postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering 0001053D003F waiting postgres 5642 0.0 21.4 3428356 2613252 ? Ss Aug14 0:30 postgres: writer process postgres 5659 0.0 0.0 177524 788 ?Ss Aug14 0:03 postgres: stats collector process postgres 7159 1.2 0.1 3451360 18352 ? Ss Aug14 17:31 postgres: wal receiver process streaming 549/216B3730 The replication works great for days, but randomly seems to lock up and replication halts. I verified that the two databases were out of sync with a query on both of them. Has anyone experienced this issue before? Here are some relevant config settings: Master: wal_level = hot_standby checkpoint_segments = 32 checkpoint_completion_target = 0.9 archive_mode = on archive_command = 'rsync -a %p foo@foo:/var/lib/pgsql/9.1/wals/%f /dev/null' max_wal_senders = 2 wal_keep_segments = 32 Slave: wal_level = hot_standby checkpoint_segments = 32 #checkpoint_completion_target = 0.5 hot_standby = on max_standby_archive_delay = -1 max_standby_streaming_delay = -1 #wal_receiver_status_interval = 10s #hot_standby_feedback = off Thank you for any help you can provide! Andrew -- ~ L. 
Friedman netll...@gmail.com LlamaLand https://netllama.linux-sxs.org
Re: [GENERAL] Streaming Replication Randomly Locking Up
Are you certain that there are no relevant errors in the database logs (on both master and slave)? Also, are you sure that you didn't misconfigure logging such that errors wouldn't appear? On Thu, Aug 15, 2013 at 11:45 AM, Andrew Berman rexx...@gmail.com wrote: Hi Lonni, Yes, I am using PG 9.1.9. Yes, 1 slave syncing from the master. CentOS 6.4. I don't see any network or hardware issues (e.g. NIC) but will look more into this. They are communicating on a private network and switch. I forgot to mention that after I restart the slave, everything syncs right back up and all is working again, so if it is a network issue, the replication is just stopping after some hiccup instead of retrying and resuming when things are back up. Thanks!
Re: [GENERAL] Streaming Replication Randomly Locking Up
I'd suggest enhancing your logging to include time/datestamps for every entry, and also the client hostname. That will help to rule in/out those 'unexpected EOF' errors. On Thu, Aug 15, 2013 at 12:22 PM, Andrew Berman rexx...@gmail.com wrote: The only thing I see that is a possibility for the issue is in the slave log: LOG: unexpected EOF on client connection LOG: could not receive data from client: Connection reset by peer I don't know if that's related or not as it could just be somebody running a query. The log file does seem to be riddled with these but the replication failures don't happen constantly. As far as I know I'm not swallowing any errors. The logging is all set as the default: log_destination = 'stderr' logging_collector = on #client_min_messages = notice #log_min_messages = warning #log_min_error_statement = error #log_min_duration_statement = -1 #log_checkpoints = off #log_connections = off #log_disconnections = off #log_error_verbosity = default I'm going to have a look at the NICs to make sure there's no issue there. Thanks again for your help! On Thu, Aug 15, 2013 at 11:51 AM, Lonni J Friedman netll...@gmail.com wrote: Are you certain that there are no relevant errors in the database logs (on both master and slave)? Also, are you sure that you didn't misconfigure logging such that errors wouldn't appear? On Thu, Aug 15, 2013 at 11:45 AM, Andrew Berman rexx...@gmail.com wrote: Hi Lonni, Yes, I am using PG 9.1.9. Yes, 1 slave syncing from the master. CentOS 6.4. I don't see any network or hardware issues (e.g. NIC) but will look more into this. They are communicating on a private network and switch. I forgot to mention that after I restart the slave, everything syncs right back up and all is working again, so if it is a network issue, the replication is just stopping after some hiccup instead of retrying and resuming when things are back up. Thanks! On Thu, Aug 15, 2013 at 11:32 AM, Lonni J Friedman netll...@gmail.com wrote: I've never seen this happen. 
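When the lock-up happens, a snapshot of the master's view of the walsender (a generic 9.1-era check, not quoted from the thread) can show whether the slave is still connected and where it stalled:

```sql
-- Run on the master; one row per connected standby. Compare these
-- locations against pg_current_xlog_location() to measure the lag;
-- a missing row means the walsender connection itself dropped:
SELECT client_addr, state, sent_location, write_location,
       flush_location, replay_location
FROM pg_stat_replication;
```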
[GENERAL] postgres FDW cost estimation options unrecognized in 9.3-beta1
Greetings, I have a postgresql-9.3-beta1 cluster set up (from the yum.postgresql.org RPMs), where I'm experimenting with the postgres FDW extension. The documentation ( http://www.postgresql.org/docs/9.3/static/postgres-fdw.html ) references three Cost Estimation Options which can be set for a foreign table or a foreign server. However, when I attempt to set them, I always get an error that the option is not found:

###
nightly=# show SERVER_VERSION ;
 server_version
----------------
 9.3beta1

nightly=# \des+
                                      List of foreign servers
   Name    |   Owner   | Foreign-data wrapper | Access privileges | Type | Version |                    FDW Options                    | Description
-----------+-----------+----------------------+-------------------+------+---------+---------------------------------------------------+-------------
 cuda_db10 | lfriedman | postgres_fdw         |                   |      |         | (host 'cuda-db10', dbname 'nightly', port '5432') |
(1 row)

nightly=# ALTER SERVER cuda_db10 OPTIONS (SET use_remote_estimate 'true') ;
ERROR:  option "use_remote_estimate" not found
###

Am I doing something wrong, or is this a bug? thanks
Re: [GENERAL] postgres FDW cost estimation options unrecognized in 9.3-beta1
On Fri, Jul 26, 2013 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: nightly=# ALTER SERVER cuda_db10 OPTIONS (SET use_remote_estimate 'true') ; ERROR: option use_remote_estimate not found Am I doing something wrong, or is this a bug? [ experiments... ] You need to say ADD, not SET, to add a new option to the list. SET might more appropriately be spelled REPLACE, because it requires that the object already have a defined value for the option, which will be replaced. Our documentation appears not to disclose this fine point, but a look at the SQL-MED standard says it's operating per spec. The standard also says that ADD is an error if the option is already defined, which is a bit more defensible, but still not exactly what I'd call user-friendly. And the error we issue for that case is pretty misleading too: regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'true') ; ALTER SERVER regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'false') ; ERROR: option use_remote_estimate provided more than once I think we could do with both more documentation, and better error messages for these cases. In the SET-where-you-should-use-ADD case, perhaps ERROR: option use_remote_estimate has not been set HINT: Use ADD not SET to define an option that wasn't already set. In the ADD-where-you-should-use-SET case, perhaps ERROR: option use_remote_estimate is already set HINT: Use SET not ADD to change an option's value. The provided more than once wording would be appropriate if the same option is specified more than once in the command text, but I'm not sure that it's worth the trouble to detect that case. Thoughts, better wordings? Thanks Tom, I've confirmed that using ADD was the solution. I think your suggested updated ERROR HINT text is an excellent improvement. It definitely would have given me the clue I was missing earlier. 
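Tom's distinction can be summarized in one session (server and option names reused from the example above):

```sql
-- ADD defines an option that isn't set yet on the object
-- (a bare option name with no keyword also means ADD):
ALTER SERVER cuda_db10 OPTIONS (ADD use_remote_estimate 'true');

-- SET replaces the value of an option that is already defined;
-- it errors out if the option has never been added:
ALTER SERVER cuda_db10 OPTIONS (SET use_remote_estimate 'false');

-- DROP removes the option entirely, after which SET would fail again:
ALTER SERVER cuda_db10 OPTIONS (DROP use_remote_estimate);
```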
[GENERAL] upgrading from 9.3-beta1 to 9.3-beta2 requires dump & reload?
Greetings, I just got around to upgrading from 9.3-beta1 to 9.3-beta2, and was surprised to see that the server was refusing to start. In the log, I'm seeing: 2013-07-24 13:41:47 PDT [7083]: [1-1] db=,user= FATAL: database files are incompatible with server 2013-07-24 13:41:47 PDT [7083]: [2-1] db=,user= DETAIL: The database cluster was initialized with CATALOG_VERSION_NO 201305061, but the server was compiled with CATALOG_VERSION_NO 201306121. 2013-07-24 13:41:47 PDT [7083]: [3-1] db=,user= HINT: It looks like you need to initdb. I'm using the RPMs from yum.postgresql.org on RHEL6. Is this expected, intentional behavior? Do I really need to dump & reload to upgrade between beta releases of 9.3, or is there some more efficient way? thanks
Re: [GENERAL] upgrading from 9.3-beta1 to 9.3-beta2 requires dump & reload?
On Wed, Jul 24, 2013 at 2:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Lonni J Friedman escribió: I'm using the RPMs from yum.postgresql.org on RHEL6. Is this expected, intentional behavior? Do I really need to dump & reload to upgrade between beta releases of 9.3, or is there some more efficient way? We try to avoid forcing initdb between beta versions, but it's not guaranteed. You should be able to use pg_upgrade, also. Unfortunately, the RPMs probably won't be very helpful for using pg_upgrade, since there's no convenient way to get beta1 and beta2 postmaster executables installed at the same time (unless Devrim foresaw this case and packaged things differently than I did for Red Hat ;-)). Sounds like I'm out of luck. Thanks anyway.
Re: [GENERAL] Standby stopped working after PANIC: WAL contains references to invalid pages
Looks like some kind of data corruption. Question is whether it came from the master, or was created by the standby. If you re-seed the standby with a full (base) backup, does the problem go away? On Sat, Jun 22, 2013 at 12:43 PM, Dan Kogan d...@iqtell.com wrote: Hello, Today our standby instance stopped working with this error in the log: 2013-06-22 16:27:32 UTC [8367]: [247-1] [] WARNING: page 158130 of relation pg_tblspc/16447/PG_9.2_201204301/16448/39154429 is uninitialized 2013-06-22 16:27:32 UTC [8367]: [248-1] [] CONTEXT: xlog redo vacuum: rel 16447/16448/39154429; blk 158134, lastBlockVacuumed 158129 2013-06-22 16:27:32 UTC [8367]: [249-1] [] PANIC: WAL contains references to invalid pages 2013-06-22 16:27:32 UTC [8367]: [250-1] [] CONTEXT: xlog redo vacuum: rel 16447/16448/39154429; blk 158134, lastBlockVacuumed 158129 2013-06-22 16:27:32 UTC [8366]: [3-1] [] LOG: startup process (PID 8367) was terminated by signal 6: Aborted 2013-06-22 16:27:32 UTC [8366]: [4-1] [] LOG: terminating any other active server processes After re-start the same exact error occurred. We thought that maybe we hit this bug - http://postgresql.1045698.n5.nabble.com/Completely-broken-replica-after-PANIC-WAL-contains-references-to-invalid-pages-td5750072.html. However, there is nothing in our log about sub-transactions, so it didn't seem the same to us. Any advice on how to further debug this so we can avoid this in the future is appreciated. 
Environment: AWS, High I/O instance (hi1.4xlarge), 60GB RAM

Software and settings: PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 64-bit

archive_command               rsync -a %p slave:/var/lib/postgresql/replication_load/%f
archive_mode                  on
autovacuum_freeze_max_age     10
autovacuum_max_workers        6
checkpoint_completion_target  0.9
checkpoint_segments           128
checkpoint_timeout            30min
default_text_search_config    pg_catalog.english
hot_standby                   on
lc_messages                   en_US.UTF-8
lc_monetary                   en_US.UTF-8
lc_numeric                    en_US.UTF-8
lc_time                       en_US.UTF-8
listen_addresses              *
log_checkpoints               on
log_destination               stderr
log_line_prefix               %t [%p]: [%l-1] [%h]
log_min_duration_statement    -1
log_min_error_statement       error
log_min_messages              error
log_timezone                  UTC
maintenance_work_mem          1GB
max_connections               1200
max_standby_streaming_delay   90s
max_wal_senders               5
port                          5432
random_page_cost              2
seq_page_cost                 1
shared_buffers                4GB
ssl                           off
ssl_cert_file                 /etc/ssl/certs/ssl-cert-snakeoil.pem
ssl_key_file                  /etc/ssl/private/ssl-cert-snakeoil.key
synchronous_commit            off
TimeZone                      UTC
wal_keep_segments             128
wal_level                     hot_standby
work_mem                      8MB

root@ip-10-148-131-236:~# /usr/local/pgsql/bin/pg_controldata /usr/local/pgsql/data
pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           5838668587531239413
Database cluster state:               in archive recovery
pg_control last modified:             Sat 22 Jun 2013 06:13:07 PM UTC
Latest checkpoint location:           2250/18CA0790
Prior checkpoint location:            2250/18CA0790
Latest checkpoint's REDO location:    224F/E127B078
Latest checkpoint's TimeLineID:       2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          1/2018629527
Latest checkpoint's NextOID:          43086248
Latest checkpoint's NextMultiXactId:  7088726
Latest checkpoint's NextMultiOffset:  20617234
Latest checkpoint's oldestXID:        1690316999
Latest checkpoint's oldestXID's DB:   16448
Latest checkpoint's oldestActiveXID:  2018629527
Time of latest checkpoint:            Sat 22 Jun 2013 03:24:05 PM UTC
Minimum recovery ending location:     2251/5EA631F0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      1200
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
root@ip-10-148-131-236:~#

Thanks again. Dan
Re: [GENERAL] Standby stopped working after PANIC: WAL contains references to invalid pages
Assuming that you still have $PGDATA from the broken instance (such that you can reproduce the crash again), there might be a way to debug it further. I'd guess that something like bad RAM or storage could cause an index to get corrupted in this fashion, but the fact that you're using AWS makes that less likely. Someone far more knowledgeable than I will need to provide guidance on how to debug this though. On Sat, Jun 22, 2013 at 4:17 PM, Dan Kogan d...@iqtell.com wrote: Re-seeding the standby with a full base backup does seem to make the error go away. The standby started, caught up and has been working for about 2 hours. The file in the error message was an index. We rebuilt it just in case. Is there any way to debug the issue at this point? -Original Message- From: Lonni J Friedman [mailto:netll...@gmail.com] Sent: Saturday, June 22, 2013 4:11 PM To: Dan Kogan Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Standby stopped working after PANIC: WAL contains references to invalid pages Looks like some kind of data corruption. Question is whether it came from the master, or was created by the standby. If you re-seed the standby with a full (base) backup, does the problem go away? 
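For reference, re-seeding a 9.2 standby amounts to stopping it, setting the broken data directory aside (so it stays available for debugging, as suggested above), and streaming a fresh base backup from the master. Paths, the hostname, and the replication role below are placeholders:

```shell
# On the standby (assumed data directory; adjust to your layout):
pg_ctl -D /var/lib/postgresql/9.2/main stop -m fast
mv /var/lib/postgresql/9.2/main /var/lib/postgresql/9.2/main.broken

# Stream a fresh copy from the master, including the WAL needed to start:
pg_basebackup -h master.example.com -U replicator \
    -D /var/lib/postgresql/9.2/main -X stream -P

# recovery.conf is not part of the master's data directory, so put the
# standby's copy back before starting:
cp /var/lib/postgresql/recovery.conf.saved /var/lib/postgresql/9.2/main/recovery.conf
pg_ctl -D /var/lib/postgresql/9.2/main start
```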
[GENERAL] 9.3-beta postgres-fdw COPY error
Greetings, I'm trying to test out the new postgres-fdw support in postgresql-9.3 (beta) in preparation for an upgrade from 9.2 later this year. So far everything is working ok; however, one problem I'm encountering is with the COPY command. When I run it against a foreign table (which is also in a 9.3 instance), it fails: COPY my_foreigntbl (id,testname) TO '/tmp/testlist_aid' (DELIMITER ','); ERROR: cannot copy from foreign table "my_foreigntbl" Is this expected behavior or a bug? thanks!
Re: [GENERAL] 9.3-beta postgres-fdw COPY error
I was afraid someone would say that. Is this a limitation that might be removed in the future (like 9.4), or is there a technical reason why it's not possible to do a COPY against a foreign table? On Fri, Jun 21, 2013 at 10:52 AM, Adrian Klaver adrian.kla...@gmail.com wrote: On 06/21/2013 10:39 AM, Lonni J Friedman wrote: Greetings, I'm trying to test out the new postgres-fdw support in postgresql-9.3 (beta) in preparation for an upgrade from 9.2 later this year. So far everything is working ok; however, one problem I'm encountering is with the COPY command. When I run it against a foreign table (which is also in a 9.3 instance), it fails: COPY my_foreigntbl (id,testname) TO '/tmp/testlist_aid' (DELIMITER ','); ERROR: cannot copy from foreign table "my_foreigntbl" Is this expected behavior or a bug? Expected I believe: http://www.postgresql.org/docs/9.3/static/postgres-fdw.html Now you need only SELECT from a foreign table to access the data stored in its underlying remote table. You can also modify the remote table using INSERT, UPDATE, or DELETE. thanks! -- Adrian Klaver adrian.kla...@gmail.com -- ~ L. Friedman netll...@gmail.com LlamaLand https://netllama.linux-sxs.org
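One workaround worth noting (not raised in the thread itself): COPY accepts an arbitrary query, and a SELECT is exactly what the FDW supports, so wrapping the foreign table in a query sidesteps the restriction:

```sql
-- COPY <foreign table> TO ... is rejected in 9.3, but COPY (query) TO
-- runs the SELECT through the FDW and copies its result set:
COPY (SELECT id, testname FROM my_foreigntbl)
TO '/tmp/testlist_aid' (DELIMITER ',');
```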
[GENERAL] how to reference variables in pgbench custom scripts?
I'm attempting to write a custom pgbench script (called via the -f option), with a variable set at the top with: \setrandom aid 100 50875000 However, I can't quite figure out how to reference the new aid variable. The documentation simply states that a variable is referenced with a colon in front of its name (:aid). However, I can't figure out how to make this work if the variable is embedded between other characters. For example, I want to do something like this: CREATE INDEX nppsmoke_tmp_:aid_idx ON nppsmoke_tmp_:aid (testname); However, when I run pgbench, it fails: Client 0 aborted in state 4: ERROR: syntax error at or near : LINE 1: CREATE INDEX nppsmoke_tmp_:aid_idx ON nppsmoke_tmp_29954053 ... Is there a default delimiter, such as bash's curly brackets (${aid})? thanks
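There is no ${...}-style delimiter in pgbench; :aid_idx simply parses as a reference to a different variable named aid_idx. One possible workaround (a sketch, assuming pgbench's textual substitution of :aid reaches inside the statement body, which held for the 9.x-era pgbench) is to let the server splice the identifier itself via format() and EXECUTE:

```sql
\setrandom aid 100 50875000
DO $$
BEGIN
  -- :aid is replaced textually by pgbench before the statement is sent,
  -- so format() receives a plain integer to splice into both identifiers:
  EXECUTE format('CREATE INDEX nppsmoke_tmp_%s_idx ON nppsmoke_tmp_%s (testname)',
                 :aid, :aid);
END $$;
```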
Re: [GENERAL] pg_upgrade link mode
On Thu, May 16, 2013 at 11:03 AM, Igor Neyman iney...@perceptron.com wrote: From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of AI Rumman Sent: Thursday, May 16, 2013 1:56 PM To: Fabio Rueda Carrascosa Cc: pgsql-general Subject: Re: [GENERAL] pg_upgrade link mode I always think it's a bit risky to use link instead of copying. However, I'd suggest trying pg_upgrade's --check option first. -- Why? Do you have specific experience, when link mode caused any problems? Could you share? I assume what he's referring to is that if the upgrade gets partially completed and fails for any reason, then you have a broken mess, with no simple rollback path. Since your database is only about 1GB in size, it shouldn't take very long to run a base backup before doing the upgrade. You can send that backup over the network to a remote system, so that you have a fallback solution if the upgrade fails.
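A minimal sketch of that fallback plan (paths, hostnames, and the database name are assumptions, matching a typical RPM layout for a 9.2-to-9.3 link-mode upgrade):

```shell
# 1. Take a logical backup small enough to ship off-host (~1GB here)
#    and copy it to another machine as the rollback of last resort:
pg_dump -Fc -f /backups/mydb.pre_upgrade.dump mydb
scp /backups/mydb.pre_upgrade.dump backuphost:/backups/

# 2. Dry-run the upgrade; --check reports incompatibilities without
#    modifying either cluster:
pg_upgrade --check --link \
    -b /usr/pgsql-9.2/bin -B /usr/pgsql-9.3/bin \
    -d /var/lib/pgsql/9.2/data -D /var/lib/pgsql/9.3/data

# 3. Only run the real link-mode upgrade (same command without --check)
#    once the dry run passes.
```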
Re: [GENERAL] pg_basebackup, requested WAL has already been removed
It's definitely not a bug. You need to set/increase wal_keep_segments to a value that ensures that they aren't recycled faster than the time required to complete the base backup (plus some buffer). On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: Hi, I've recently started to use pg_basebackup --xlog-method=stream to backup my multi-Tb database. Before, I did the backup when there was not much activity in the DB and it went perfectly fine, but today I've started the backup and it failed twice, almost at the same time as the CREATE INDEX (and another time CLUSTER) commands were finished. Here: postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup --xlog-method=stream --progress --verbose --pg transaction log start point: 23AE/BD003E70 pg_basebackup: starting background WAL receiver pg_basebackup: unexpected termination of replication stream: FATAL: requested WAL segment 000123B100FE has already been removed 4819820/16816887078 kB (4%), 0/1 tablespace (/mnt/backup/wsdb_130510/base/1) And the logs from around that time contained: some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms statement: CREATE INDEX usno_clean_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec)); ::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too frequently (8 seconds apart) ::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the configuration parameter checkpoint_segments. ::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too frequently (8 seconds apart) ::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the configuration parameter checkpoint_segments. 
postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL segment 000123B100FE has already been removed some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms statement: CLUSTER usno_clean_q3c_idx ON usno_clean; On the previous occasion when it happened the CREATE INDEX() was being executed: some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION ALL SELECT 'all') ss WHERE substring(name,1,4)='rand' LIMIT 1000 ::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too frequently (18 seconds apart) ::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the configuration parameter checkpoint_segments. postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL segment 000123990040 has already been removed some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id); The configuration: PG 9.2.4, Debian 7.0, amd64 shared_buffers = 10GB work_mem = 1GB maintenance_work_mem = 1GB effective_io_concurrency = 5 synchronous_commit = off checkpoint_segments = 32 max_wal_senders = 2 effective_cache_size = 30GB autovacuum_max_workers = 3 wal_level=archive archive_mode = off Does it look like a bug or am I missing something? Thanks, Sergey -- ~ L. Friedman netll...@gmail.com LlamaLand https://netllama.linux-sxs.org
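The suggested fix is a postgresql.conf change on the master; the value below is an illustrative guess and should be sized from the backup duration and the WAL generation rate (each segment is 16MB in 9.2):

```
# postgresql.conf on the master being backed up:
wal_keep_segments = 512   # retain 512 * 16MB = 8GB of WAL for walsenders
max_wal_senders   = 3     # pg_basebackup -X stream uses two connections,
                          # so leave at least one spare slot
```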
Re: [GENERAL] pg_basebackup, requested WAL has already been removed
That's a good point. Then I dunno, perhaps it is a bug, but I'd be surprised if this wasn't working, as it's not really a corner case that could be missed in testing, as long as all the options were exercised. Hopefully someone else can weigh in. On Fri, May 10, 2013 at 10:00 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: On Fri, 10 May 2013, Lonni J Friedman wrote: It's definitely not a bug. You need to set/increase wal_keep_segments to a value that ensures that they aren't recycled faster than the time required to complete the base backup (plus some buffer). But I thought that wal_keep_segments is not needed for the streaming regime (--xlog-method=stream). And the documentation http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html only mentions wal_keep_segments when talking about --xlog-method=fetch. On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: Hi, I've recently started to use pg_basebackup --xlog-method=stream to backup my multi-Tb database. Before, I did the backup when there was not much activity in the DB and it went perfectly fine, but today I've started the backup and it failed twice, almost at the same time as the CREATE INDEX (and another time CLUSTER) commands were finished. 
Here: postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup --xlog-method=stream --progress --verbose --pg transaction log start point: 23AE/BD003E70 pg_basebackup: starting background WAL receiver pg_basebackup: unexpected termination of replication stream: FATAL: requested WAL segment 000123B100FE has already been removed 4819820/16816887078 kB (4%), 0/1 tablespace (/mnt/backup/wsdb_130510/base/1) And the logs from around that time contained: some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms statement: CREATE INDEX usno_cle an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec)); ::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too frequently (8 seconds apart) ::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the configuration parameter checkpoint_segmen ts. ::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too frequently (8 seconds apart) ::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the configuration parameter checkpoint_segmen ts. postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL segment 000123B100FE has already been removed some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms statement: CLUSTER usno_clean_q3c_idx ON usno_clean; One the previous occasion when it happened the CREATE INDEX() was being executed: some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION ALL SELECT 'all') ss WHERE substring(name,1,4)='rand' LIMIT 1000 ::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too frequently (18 seconds apart) ::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the configuration parameter checkpoint_segments. 
postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL segment 000123990040 has already been removed some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id); The configuration (PG 9.2.4, Debian 7.0, amd64): shared_buffers = 10GB work_mem = 1GB maintenance_work_mem = 1GB effective_io_concurrency = 5 synchronous_commit = off checkpoint_segments = 32 max_wal_senders = 2 effective_cache_size = 30GB autovacuum_max_workers = 3 wal_level = archive archive_mode = off Does it look like a bug or am I missing something? Thanks, Sergey -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- ~ L. Friedman netll...@gmail.com LlamaLand https://netllama.linux-sxs.org
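For reference, the fix suggested upthread is a one-line change on the master. A minimal postgresql.conf sketch; the value is illustrative and depends on how much WAL is generated while the base backup runs:

```
# postgresql.conf on the master -- value is illustrative, size it to your WAL volume
wal_keep_segments = 256   # retain ~4 GB of WAL (256 x 16 MB segments)
```

With streaming (-X stream / --xlog-method=stream) the WAL receiver normally keeps up, but heavy WAL churn from CREATE INDEX/CLUSTER can still outrun it, so a safety margin here is cheap insurance.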
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 10:20 AM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, May 10, 2013 at 12:03 PM, David Boreham david_l...@boreham.org wrote: On 5/10/2013 10:21 AM, Merlin Moncure wrote: As it turns out, the list of flash drives that are suitable for database use is surprisingly small. The s3700 I noted upthread seems to be specifically built with databases in mind and is likely the best choice for new deployments. The older Intel 320 is also a good choice. I think that's pretty much it until you get into expensive pci-e based gear. This may have been a typo: did you mean the Intel 710 series rather than the 320? While the 320 has the supercap, it isn't specified for high write endurance. Definitely usable for a database, and a better choice than most of the alternatives, but I'd have listed the 710 ahead of the 320. It wasn't a typo. The 320 is perfectly fine, although it will wear out faster -- so it fills a niche for low write intensity applications. I find the s3700 to be superior to the 710 in just about every way (although you're right -- it is suitable for database use). There's also the 520 series, which has better performance than the 320 series (which is EOL now).
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 11:23 AM, Steven Schlansker ste...@likeness.com wrote: On May 10, 2013, at 7:14 AM, Matt Brock m...@mattbrock.co.uk wrote: Hello. We're intending to deploy PostgreSQL on Linux with SSD drives which would be in a RAID 1 configuration with Hardware RAID. My first question is essentially: are there any issues we need to be aware of when running PostgreSQL 9 on CentOS 6 on a server with SSD drives in a Hardware RAID 1 configuration? Will there be any compatibility problems (seems unlikely)? Should we consider alternative configurations as being more effective for getting better use out of the hardware? The second question is: are there any SSD-specific issues to be aware of when tuning PostgreSQL to make the best use of this hardware and software? A couple of things I noticed with a similar-ish setup: * Some forms of RAID / LVM break the kernel's automatic disk tuning mechanism. In particular, there is a rotational tunable that often does not get set right. You might end up tweaking read ahead and friends as well. http://www.mjmwired.net/kernel/Documentation/block/queue-sysfs.txt#112 * The default Postgres configuration is awful for a SSD backed database. You really need to futz with checkpoints to get acceptable throughput. The PostgreSQL 9.0 High Performance book is fantastic and is what I used to great success. * The default Linux virtual memory configuration is awful for this configuration. Briefly, it will accept a ton of incoming data, and then go through an awful stall as soon as it calls fsync() to write all that data to disk. We had multi-second delays all the way through to the application because of this. We had to change the zone_reclaim_mode and the dirty buffer limits. http://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com I am not sure that these numbers will end up being anywhere near what works for you, but these are my notes from tuning a 4xMLC SSD RAID-10. 
I haven't proven that this is optimal, but it was way better than the defaults. We ended up with the following list of changes: * Change IO scheduler to noop * Mount DB volume with nobarrier, noatime * Turn blockdev readahead to 16MiB * Turn sdb's rotational tuneable to 0 PostgreSQL configuration changes: synchronous_commit = off effective_io_concurrency = 4 checkpoint_segments = 1024 checkpoint_timeout = 10min checkpoint_warning = 8min shared_buffers = 32gb temp_buffers = 128mb work_mem = 512mb maintenance_work_mem = 1gb Linux sysctls: vm.swappiness = 0 vm.zone_reclaim_mode = 0 vm.dirty_bytes = 134217728 vm.dirty_background_bytes = 1048576 Can you provide more details about your setup, including: * What kind of filesystem are you using? * Linux distro and/or kernel version * hardware (RAM, CPU cores etc) * database usage patterns (% writes, growth, etc) thanks -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
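The block-device changes listed above can be applied with a few commands at boot. A sketch, assuming the data volume is /dev/sdb as in the poster's notes (device name and values are from their setup, not a recommendation):

```shell
# Illustrative only -- sdb is the poster's device name; run as root.
# Use the noop I/O scheduler; SSDs don't benefit from elevator reordering.
echo noop > /sys/block/sdb/queue/scheduler
# Tell the kernel the device is non-rotational (the RAID layer often hides this).
echo 0 > /sys/block/sdb/queue/rotational
# 16 MiB readahead, expressed in 512-byte sectors: 16*1024*1024/512 = 32768
blockdev --setra 32768 /dev/sdb
```

These settings do not survive a reboot, so they typically go in rc.local or a udev rule.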
Re: [GENERAL] Replication terminated due to PANIC
If it's really index corruption, then you should be able to fix it by reindexing. However, that doesn't explain what caused the corruption. Perhaps your hardware is bad in some way? On Wed, Apr 24, 2013 at 10:46 PM, Adarsh Sharma eddy.ada...@gmail.com wrote: Thanks Sergey for such a quick response, but I don't think this is a patch problem, because we have other DB servers also running fine on the same version, and the message is also different: host= PANIC: _bt_restore_page: cannot add item to page And the whole day replication works fine, but at midnight when the log rotates it shows the below msg: 2013-04-24 00:00:00 UTC [26989]: [4945032-1] user= db= host= LOG: checkpoint starting: time 2013-04-24 00:00:00 UTC [26989]: [4945033-1] user= db= host= ERROR: could not open file global/14078: No such file or directory 2013-04-24 00:00:00 UTC [26989]: [4945034-1] user= db= host= CONTEXT: writing block 0 of relation global/14078 2013-04-24 00:00:00 UTC [26989]: [4945035-1] user= db= host= WARNING: could not write block 0 of global/14078 2013-04-24 00:00:00 UTC [26989]: [4945036-1] user= db= host= DETAIL: Multiple failures --- write error might be permanent. Looks like some index corruption. Thanks On Thu, Apr 25, 2013 at 8:14 AM, Sergey Konoplev gray...@gmail.com wrote: On Wed, Apr 24, 2013 at 5:05 PM, Adarsh Sharma eddy.ada...@gmail.com wrote: I have a Postgresql 9.2 instance running on a CentOS 6.3 box. Yesterday I set up a hot standby by using pg_basebackup. Today I got the below alert from the standby box: [1] (from line 412,723) 2013-04-24 23:07:18 UTC [13445]: [6-1] user= db= host= PANIC: _bt_restore_page: cannot add item to page When I checked, the replication had terminated due to slave DB shutdown. From the logs I can see the below messages: I am not sure that it is your situation but take a look at this thread: http://www.postgresql.org/message-id/CAL_0b1t=WuM6roO8dki=w8dhh8p8whhohbpjreymmqurocn...@mail.gmail.com There is a patch by Andres Freund at the end of the discussion.
Three weeks have passed after I installed the patched version and it looks like the patch fixed my issue. 2013-04-24 23:17:16 UTC [26989]: [5360083-1] user= db= host= ERROR: could not open file global/14078: No such file or directory 2013-04-24 23:17:16 UTC [26989]: [5360084-1] user= db= host= CONTEXT: writing block 0 of relation global/14078 2013-04-24 23:17:16 UTC [26989]: [5360085-1] user= db= host= WARNING: could not write block 0 of global/14078 2013-04-24 23:17:16 UTC [26989]: [5360086-1] user= db= host= DETAIL: Multiple failures --- write error might be permanent. I checked in global directory of master, the directory 14078 doesn't exist. Anyone has faced above issue ? Thanks -- Kind regards, Sergey Konoplev Database and Software Consultant Profile: http://www.linkedin.com/in/grayhemp Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979 Skype: gray-hemp Jabber: gray...@gmail.com -- ~ L. Friedmannetll...@gmail.com LlamaLand https://netllama.linux-sxs.org -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
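If the damage really is confined to an index, rebuilding it is usually sufficient. A sketch; the index and table names here are hypothetical placeholders for whatever object the errors point at:

```sql
-- Rebuild a single suspect index (takes an exclusive lock on its table)
REINDEX INDEX suspect_idx;

-- Or rebuild every index on the affected table in one go
REINDEX TABLE affected_table;
```

Note this only repairs the index structure; if the underlying heap or hardware is bad, the corruption will come back.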
Re: [GENERAL] corrupted item pointer in streaming based replication
You should figure out what base/16384/114846.39 corresponds to inside the database. If you're super lucky it's something unimportant and/or something that can be recreated easily (like an index). If it's something important, then your only option is to try to drop the object and restore it from the last known good backup. On Wed, Apr 3, 2013 at 1:02 PM, Jigar Shah js...@pandora.com wrote: Hi, Postgres version = 9.1.2 OS = debian(6.0.7) fsync = on full_page_writes = on Setup = Primary and streaming replication based secondary A few days ago we had a situation where our Primary started to throw the error messages below, indicating corruption in the database. It crashed sometimes and showed a panic message in the logs 2013-03-25 07:30:39.545 PDT PANIC: corrupted item pointer: offset = 0, size = 0 2013-03-25 07:30:39.704 PDT LOG: server process (PID 8715) was terminated by signal 6: Aborted 2013-03-25 07:30:39.704 PDT LOG: terminating any other active server processes Days before it started to crash, it showed the below error messages in the logs. [d: u:postgres p:2498 7] ERROR: could not access status of transaction 837550133 DETAIL: Could not open file pg_clog/031E: No such file or directory. [u:postgres p:2498 9] [d: u:radio p:31917 242] ERROR: could not open file base/16384/114846.39 (target block 360448000): No such file or directory [d: u:radio p:31917 243] On top of that, our secondaries have now crashed, would not start up, and showed the error messages below in the pg logs.
2013-03-27 11:00:47.281 PDT LOG: recovery restart point at 161A/17108AA8 2013-03-27 11:00:47.281 PDT DETAIL: last completed transaction was at log time 2013-03-27 11:00:47.241236-07 2013-03-27 11:00:47.520 PDT LOG: restartpoint starting: xlog 2013-03-27 11:07:51.348 PDT FATAL: corrupted item pointer: offset = 0, size = 0 2013-03-27 11:07:51.348 PDT CONTEXT: xlog redo split_l: rel 1663/16384/115085 left 4256959, right 5861610, next 5044459, level 0, firstright 192 2013-03-27 11:07:51.716 PDT LOG: startup process (PID 5959) exited with exit code 1 2013-03-27 11:07:51.716 PDT LOG: terminating any other active server processes At this point we have a running but corrupt primary and crashed secondary that wont startup. I am wondering what are our options at this point. Can we do something to fix this? How can we recover from corruption. Thanks for help in advance. Regards Jigar -- ~ L. Friedmannetll...@gmail.com LlamaLand https://netllama.linux-sxs.org -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
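The path in that error encodes tablespace/database/relfilenode, so the object behind base/16384/114846 can be looked up from inside the database whose OID is 16384. A sketch:

```sql
-- Run while connected to the database with OID 16384 (see pg_database.oid).
-- relkind 'i' would mean the damaged file is "only" an index and can be rebuilt.
SELECT n.nspname, c.relname, c.relkind
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relfilenode = 114846;
```

The ".39" suffix is just the 39th 1 GB segment of that relation's file, so this is a large table or index.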
Re: [GENERAL] Streaming replication slave crash
Looks like you've got some form of corruption: page 1441792 of relation base/63229/63370 does not exist The question is whether it was corrupted on the master and then replicated to the slave, or if it was corrupted on the slave. I'd guess that the pg_dump tried to read from that page and barfed. It would be interesting to re-run the pg_dump to see if this crash can be replicated. If so, does it also replicate if you run pg_dump against the master? If not, then the corruption is isolated to the slave, and you might have a hardware problem which is causing the data to get corrupted. On Fri, Mar 29, 2013 at 9:19 AM, Quentin Hartman qhart...@direwolfdigital.com wrote: Yesterday morning, one of my streaming replication slaves running 9.2.3 crashed with the following in the log file: 2013-03-28 12:49:30 GMT WARNING: page 1441792 of relation base/63229/63370 does not exist 2013-03-28 12:49:30 GMT CONTEXT: xlog redo delete: index 1663/63229/109956; iblk 303, heap 1663/63229/63370; 2013-03-28 12:49:30 GMT PANIC: WAL contains references to invalid pages 2013-03-28 12:49:30 GMT CONTEXT: xlog redo delete: index 1663/63229/109956; iblk 303, heap 1663/63229/63370; 2013-03-28 12:49:31 GMT LOG: startup process (PID 22941) was terminated by signal 6: Aborted 2013-03-28 12:49:31 GMT LOG: terminating any other active server processes 2013-03-28 12:49:31 GMT WARNING: terminating connection because of crash of another server process 2013-03-28 12:49:31 GMT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-03-28 12:49:31 GMT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2013-03-28 12:57:44 GMT LOG: database system was interrupted while in recovery at log time 2013-03-28 12:37:42 GMT 2013-03-28 12:57:44 GMT HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2013-03-28 12:57:44 GMT LOG: entering standby mode 2013-03-28 12:57:44 GMT LOG: redo starts at 19/2367CE30 2013-03-28 12:57:44 GMT LOG: incomplete startup packet 2013-03-28 12:57:44 GMT LOG: consistent recovery state reached at 19/241835B0 2013-03-28 12:57:44 GMT LOG: database system is ready to accept read only connections 2013-03-28 12:57:44 GMT LOG: invalid record length at 19/2419EE38 2013-03-28 12:57:44 GMT LOG: streaming replication successfully connected to primary As you can see I was able to restart it and it picked up and synchronized right away, but this crash still concerns me. The DB has about 75GB of data in it, and it is almost entirely write traffic. It's essentially a log aggregator. I believe it was doing a pg_dump backup at the time of the crash. It has hot_standby_feedback on to allow that process to complete. Any insights into this, or advice on figuring out the root of it would be appreciated. So far all the things I've found like this are bugs that should be fixed in this version, or the internet equivalent of a shrug. Thanks! QH -- ~ L. Friedmannetll...@gmail.com LlamaLand https://netllama.linux-sxs.org -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] replication behind high lag
On Mon, Mar 25, 2013 at 12:37 PM, AI Rumman rumman...@gmail.com wrote: Hi, I have two 9.2 databases running with hot_standby replication. Today when I was checking, I found that replication has not been working since Mar 1st. There was a large database restored in master on that day and I believe after that the lag went higher. SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS offset 431326108320 SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0') AS receive, pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0') AS replay receive|replay --+-- 245987541312 | 245987534032 (1 row) I checked the pg_xlog in both the servers. In the slave the last xlog file is -rw--- 1 postgres postgres 16777216 Mar 1 06:02 00010039007F In the master, the first xlog file is -rw--- 1 postgres postgres 16777216 Mar 1 04:45 00010039005E Is there any way I could sync the slave quickly? generate a new base backup, and seed the slave with it.
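Reseeding the standby amounts to stopping it, clearing its data directory, and taking a fresh copy from the master. A sketch using 9.2's pg_basebackup; the host, user, and path are hypothetical:

```shell
# On the standby, with postgres stopped and the old data directory cleared.
# -X stream pulls the WAL generated during the copy over a second connection,
# so the 16 MB (16777216-byte) segments can't be recycled out from under us.
pg_basebackup -h master-host -U replicator \
    -D /var/lib/postgresql/9.2/main -X stream -P
```

After the copy finishes, restore recovery.conf on the standby and start it; it should connect and stream from the current position.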
Re: [GENERAL] replication behind high lag
On Mon, Mar 25, 2013 at 12:43 PM, AI Rumman rumman...@gmail.com wrote: On Mon, Mar 25, 2013 at 3:40 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Mar 25, 2013 at 12:37 PM, AI Rumman rumman...@gmail.com wrote: Hi, I have two 9.2 databases running with hot_standby replication. Today when I was checking, I found that replication has not been working since Mar 1st. There was a large database restored in master on that day and I believe after that the lag went higher. SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS offset 431326108320 SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0') AS receive, pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0') AS replay receive|replay --+-- 245987541312 | 245987534032 (1 row) I checked the pg_xlog in both the server. In Slave the last xlog file -rw--- 1 postgres postgres 16777216 Mar 1 06:02 00010039007F In Master, the first xlog file is -rw--- 1 postgres postgres 16777216 Mar 1 04:45 00010039005E Is there any way I could sync the slave in quick process? generate a new base backup, and seed the slave with it. OK. I am getting these error in slave: LOG: invalid contrecord length 284 in log file 57, segment 127, offset 0 What is the actual reason? Corruption? What were you doing when you saw the error? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] replication behind high lag
On Mon, Mar 25, 2013 at 12:55 PM, AI Rumman rumman...@gmail.com wrote: On Mon, Mar 25, 2013 at 3:52 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Mar 25, 2013 at 12:43 PM, AI Rumman rumman...@gmail.com wrote: On Mon, Mar 25, 2013 at 3:40 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Mar 25, 2013 at 12:37 PM, AI Rumman rumman...@gmail.com wrote: Hi, I have two 9.2 databases running with hot_standby replication. Today when I was checking, I found that replication has not been working since Mar 1st. There was a large database restored in master on that day and I believe after that the lag went higher. SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS offset 431326108320 SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0') AS receive, pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0') AS replay receive|replay --+-- 245987541312 | 245987534032 (1 row) I checked the pg_xlog in both the server. In Slave the last xlog file -rw--- 1 postgres postgres 16777216 Mar 1 06:02 00010039007F In Master, the first xlog file is -rw--- 1 postgres postgres 16777216 Mar 1 04:45 00010039005E Is there any way I could sync the slave in quick process? generate a new base backup, and seed the slave with it. OK. I am getting these error in slave: LOG: invalid contrecord length 284 in log file 57, segment 127, offset 0 What is the actual reason? Corruption? What were you doing when you saw the error? I did not have enough idea about these stuffs. I got the database now and saw the error. Is there any way to recover from this state. The master database is a large database of 500 GB. generate a new base backup, and seed the slave with it. if the error persists, then i'd guess that your master is corrupted, and then you've got huge problems. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] replication behind high lag
On Mon, Mar 25, 2013 at 1:23 PM, AI Rumman rumman...@gmail.com wrote: On Mon, Mar 25, 2013 at 4:03 PM, AI Rumman rumman...@gmail.com wrote: On Mon, Mar 25, 2013 at 4:00 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Mar 25, 2013 at 12:55 PM, AI Rumman rumman...@gmail.com wrote: On Mon, Mar 25, 2013 at 3:52 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Mar 25, 2013 at 12:43 PM, AI Rumman rumman...@gmail.com wrote: On Mon, Mar 25, 2013 at 3:40 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Mar 25, 2013 at 12:37 PM, AI Rumman rumman...@gmail.com wrote: Hi, I have two 9.2 databases running with hot_standby replication. Today when I was checking, I found that replication has not been working since Mar 1st. There was a large database restored in master on that day and I believe after that the lag went higher. SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS offset 431326108320 SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0') AS receive, pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/0') AS replay receive|replay --+-- 245987541312 | 245987534032 (1 row) I checked the pg_xlog in both the server. In Slave the last xlog file -rw--- 1 postgres postgres 16777216 Mar 1 06:02 00010039007F In Master, the first xlog file is -rw--- 1 postgres postgres 16777216 Mar 1 04:45 00010039005E Is there any way I could sync the slave in quick process? generate a new base backup, and seed the slave with it. OK. I am getting these error in slave: LOG: invalid contrecord length 284 in log file 57, segment 127, offset 0 What is the actual reason? Corruption? What were you doing when you saw the error? I did not have enough idea about these stuffs. I got the database now and saw the error. Is there any way to recover from this state. The master database is a large database of 500 GB. generate a new base backup, and seed the slave with it. 
If the error persists, then I'd guess that your master is corrupted, and then you've got huge problems. Master is running fine right now, showing only a warning: WARNING: archive_mode enabled, yet archive_command is not set Do you think the master could be corrupted? Hi, I got the info that there was a master db restart on Feb 27th. Could this be the reason for this error? Restarting the database cleanly should never cause corruption. Again, you need to create a new base backup, and seed the slave with it. If the problem persists, then the master is likely corrupted.
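Rather than comparing pg_xlog file names by hand as above, the lag can be read directly on the master. A sketch for 9.2 (these column and function names changed to pg_wal_* in later releases):

```sql
-- On the master: per-standby lag in bytes. An empty result means no standby
-- is connected at all, which is itself worth alerting on.
SELECT client_addr, state,
       pg_xlog_location_diff(pg_current_xlog_location(), sent_location)   AS send_lag_bytes,
       pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
FROM pg_stat_replication;
```

Polling this periodically would have caught the Mar 1st breakage the day it happened instead of weeks later.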
Re: [GENERAL] UNLOGGED TEMPORARY tables?
I'm pretty sure that unlogged tables and temp tables are two separate, distinct features, with no overlap in functionality. It would be nice if it were possible to create an unlogged temp table. On Sun, Mar 24, 2013 at 1:32 PM, aasat satri...@veranet.pl wrote: I tested write speed to temporary and unlogged tables and noticed that unlogged tables were much faster Postgres 9.2.2 Write speed Temporary 14.5k/s UNLOGGED 50k/s Before the test I was convinced that temporary tables in postgres >= 9.1 are unlogged
Re: [GENERAL] UNLOGGED TEMPORARY tables?
On Mon, Mar 25, 2013 at 4:49 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Mar 26, 2013 at 8:26 AM, Lonni J Friedman netll...@gmail.com wrote: I'm pretty sure that unlogged tables and temp tables are two separate distinct features, with no overlap in functionality. It would be nice if it was possible to create an unlogged temp table. Temporary tables are a subtype of unlogged tables, as temporary tables are not WAL-logged. This article from Robert Haas will give a good summary of such differences: http://rhaas.blogspot.jp/2010/05/global-temporary-and-unlogged-tables.html Thanks, that's good to know. The official docs don't really make it clear that temp tables are unlogged.
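The distinction discussed above can be seen directly: neither table below is WAL-logged, but only the temporary one is session-private and dropped at session end. There is no combined "unlogged temporary" form because TEMP already implies no WAL. A sketch:

```sql
-- Visible to all sessions; survives restart but is truncated after a crash
CREATE UNLOGGED TABLE t_unlogged (id int, val numeric);

-- Private to this session; disappears entirely at disconnect
CREATE TEMPORARY TABLE t_temp (id int, val numeric);
```

The speed gap in the benchmark upthread likely comes from other temp-table overheads (per-session catalog entries, local buffers), not from WAL.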
Re: [GENERAL] Replication stopped on 9.0.2 after making change to conf file
It sounds like all you did was set up the slave from scratch with a fresh base backup, without understanding or debugging what caused everything to break. Clearly whatever was wrong on March 5 is still wrong, and nothing has been fixed. The first step in debugging this problem is to look at and/or post the log content (from both the master and slave) from the time when this stopped working (March 5). On Fri, Mar 8, 2013 at 11:06 PM, akp geek akpg...@gmail.com wrote: What I noticed is... our streaming stopped on March 5th. I restarted the process today. However, the new data is syncing correctly. But the data for the dates between the 5th and today is not syncing. Is there something wrong that I did? Here is what I did. 1. created base backup 2. Took the data directory on to the slave 3. stopped the base backup on master 4. started the db on slave Appreciate your help. Regards On Sat, Mar 9, 2013 at 12:52 AM, akp geek akpg...@gmail.com wrote: Thanks a lot. I started the replication. It became very slow. It is taking a long time to sync the master's data onto the slave. Is there a way to find what's causing the issue? Regards On Fri, Mar 8, 2013 at 12:06 PM, John Laing john.la...@gmail.com wrote: I'm not sure about the existence of any standard scripts, but we have a pair of checks running periodically on the backup server. This shouldn't return anything: tail -3 /var/log/postgresql/postgresql-9.1-main.log | grep FATAL And this should return something: ps -u postgres -o cmd | grep postgres: wal receiver process streaming These have worked very reliably for many months. -John On Fri, Mar 8, 2013 at 11:53 AM, akp geek akpg...@gmail.com wrote: I got it fixed. What I did was $ psql -c SELECT pg_start_backup('label', true) $ rsync -a ${PGDATA}/ standby:/srv/pgsql/standby/ --exclude postmaster.pid $ psql -c SELECT pg_stop_backup() It took a while to catch up the data. One question I have: are there any scripts to monitor the status of the replication?
so that I can be a little proactive Regards On Thu, Mar 7, 2013 at 9:25 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Thu, Mar 7, 2013 at 5:28 PM, akp geek akpg...@gmail.com wrote: Hi all - We recently made a change on our primary database: default_text_search_config = 'pg_catalog.simple'. After that, the replication stopped. Can you please help me figure out how to fix the issue? I am sure I made the change on the slave also. How can I start the replication and catch up the data? Thanks for your time. What are you seeing in your slony and/or postgresql logs, if anything?
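Beyond grepping logs and ps output as suggested above, the standby itself can be polled. A sketch for 9.0+ (these function names became pg_last_wal_* in later releases):

```sql
-- On the standby: confirm it is still in recovery, then see how far it has gotten.
-- If the two locations stop advancing while the master keeps writing, replication
-- is stalled and worth investigating.
SELECT pg_is_in_recovery();
SELECT pg_last_xlog_receive_location() AS received,
       pg_last_xlog_replay_location()  AS replayed;
```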
Re: [GENERAL] postgres 9.0.2 replicated database is crashing
That process merely sets up a new server; it doesn't start streaming unless the server has been configured correctly. You state that the slave crashed after two hours. How did you make this determination? All you seem to be doing is setting up the slave from scratch repeatedly, and assuming that it will magically just work, rather than understanding or debugging why it's not working. Where is the log output from your servers that shows what is really transpiring? On Sat, Mar 9, 2013 at 6:51 AM, akp geek akpg...@gmail.com wrote: Hi all - I am in desperate need of your help. The replication/streaming stopped working on March 5th. I followed the following procedure to restart the streaming. After running it for a couple of hours, the database crashes on the slave. This is on our production server. Thanks for your help. Steps that I followed 1. SELECT pg_start_backup('label', true); 2. I created a tar of the data. 3. Untarred the data on the slave. 4. deleted the pid file on the slave 5. created recovery.conf 6. SELECT pg_stop_backup(); on the primary 7. started postgres on the slave 8. After 2 hours, the slave server crashes Please help. Regards
Re: [GENERAL] postgres 9.0.2 replicated database is crashing
On Sat, Mar 9, 2013 at 1:51 PM, akp geek akpg...@gmail.com wrote: thank you. As you mentioned, I understood that I am starting the streaming from scratch, which is not what I wanted to do. Here is what I am planning to do. Our replication process has been down since March 5th. 1. Is it OK to get all WALs from March 5th till now to the standby pg_xlog 2. take pg_basebackup 3. export the data dir 4. stop backup 5. restart the standby. Based on my understanding it should work, because when the standby started, it was not able to find the files, as I had deleted them from the slave when I tried to fix the issue Clearly it didn't work, because you've been having issues for the past 4 days, and you've already tried this approach unsuccessfully. Based on the log snippets below, it looks like you have multiple problems. First, your slave is definitely missing WAL segments. You should increase the number of WAL segments that are archived on the master and ensure that your base backup includes the WAL segments (by generating it with the -x option). However, that's the least of your problems at the moment, because it looks like the master is what is crashing.
0 2013-03-09 04:56:08 GMT LOG: entering standby mode cp: cannot access /backup/9.0.4/archive/0001010600E1 0 2013-03-09 04:56:08 GMT LOG: redo starts at 106/E120 cp: cannot access /backup/9.0.4/archive/0001010600E2 cp: cannot access /backup/9.0.4/archive/0001010600E3 cp: cannot access /backup/9.0.4/archive/0001010600E4 0 2013-03-09 04:56:10 GMT LOG: consistent recovery state reached at 106/E45AD4A8 0 2013-03-09 04:56:10 GMT LOG: database system is ready to accept read only connections cp: cannot access /backup/9.0.4/archive/0001010600E5 cp: cannot access /backup/9.0.4/archive/0001010600E6 cp: cannot access /backup/9.0.4/archive/0001010600E7 cp: cannot access /backup/9.0.4/archive/0001010600E8 cp: cannot access /backup/9.0.4/archive/0001010600E9 cp: cannot access /backup/9.0.4/archive/0001010600EA 0 2013-03-09 04:56:12 GMT LOG: invalid record length at 106/EA10B8C0 cp: cannot access /backup/9.0.4/archive/0001010600EA 0 2013-03-09 04:56:12 GMT LOG: streaming replication successfully connected to primary [unknown] 0 2013-03-09 04:57:00 GMT [unknown]LOG: connection received: host=[local] [unknown] 0 2013-03-09 04:57:03 GMT [unknown]LOG: connection received: host=[local] [unknown] 10.155.253.43(51257) 0 SELECT 2013-03-09 07:07:18 GMT prodLog: duration: 6316.649 ms [unknown] 10.155.253.43(51257) 0 idle 2013-03-09 07:47:53 GMT prodLog: disconnection: session time: 0:41:06.529 user=postgres database=fprod host= 10.155.253.43 port=51257 0 2013-03-09 07:55:48 GMT LOG: restartpoint starting: time 0 2013-03-09 08:25:47 GMT LOG: restartpoint complete: wrote 19419 buffers (7.4%); write=1799.792 s, sync=0.066 s, total=1799.867 s 0 2013-03-09 08:25:47 GMT LOG: recovery restart point at 107/FB01B238 0 2013-03-09 08:25:47 GMT DETAIL: last completed transaction was at log time 2013-03-09 08:25:41.85776+00 0 2013-03-09 08:55:48 GMT LOG: restartpoint starting: time psql [local] 08P01 idle 2013-03-09 09:10:52 GMT prodLog: unexpected EOF on client connection psql [local] 0 idle 2013-03-09 
09:10:52 GMT prodLog: disconnection: session time: 2:15:06.351 user=postgres database=fprod host=[local] XX000 2013-03-09 09:23:46 GMT FATAL: failed to add item to index page in 100924/100937/1225845 XX000 2013-03-09 09:23:46 GMT CONTEXT: xlog redo Insert item, node: 100924/100937/1225845 blkno: 72666 offset: 234 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:36483 0 2013-03-09 09:23:46 GMT LOG: startup process (PID 3880) exited with exit code 1 0 2013-03-09 09:23:46 GMT LOG: terminating any other active server processes The real problem is here: XX000 2013-03-09 09:23:46 GMT FATAL: failed to add item to index page in 100924/100937/1225845 XX000 2013-03-09 09:23:46 GMT CONTEXT: xlog redo Insert item, node: 100924/100937/1225845 blkno: 72666 offset: 234 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:36483 That looks like something is badly broken (maybe data corruption)? Most of the google hits on that error are associated with GIST indexes. Are you using GIST indexes? Are you really (still) using 9.0.4 ? I hope you realize that there are known data corruption bugs in that version, and that version is super old at this point. You really need to update to 9.0.12. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
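For context on the `cp: cannot access /backup/9.0.4/archive/...` lines above: those come from the standby's restore_command probing the archive for segments that are no longer (or not yet) there, which is normal noise before it falls back to streaming. A minimal recovery.conf sketch matching that setup; the host and user are illustrative, the archive path is taken from the logs:

```
# recovery.conf on the standby (9.0.x-style)
standby_mode = 'on'
primary_conninfo = 'host=master-host user=replicator'
restore_command = 'cp /backup/9.0.4/archive/%f %p'
```

The FATAL during redo, by contrast, is not noise; it means replay itself failed, which points at the data being replicated rather than the replication setup.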
Re: [GENERAL] postgres 9.0.2 replicated database is crashing
On Sat, Mar 9, 2013 at 4:05 PM, akp geek akpg...@gmail.com wrote:

Appreciate your findings. Taking your points and doing things now. We can upgrade. Which version is more stable, 9.2 or 9.1?

They're two entirely different branches. They should both be equally stable. However, if you're looking for the most straightforward path, I'd recommend going to 9.0.12. Also be sure to read the release notes first.

We use GIST indexes quite a bit, and we use GIS also. I recently compiled postgres 9.2. Regards

On Sat, Mar 9, 2013 at 5:09 PM, Lonni J Friedman netll...@gmail.com wrote:
On Sat, Mar 9, 2013 at 1:51 PM, akp geek akpg...@gmail.com wrote:

Thank you. As you mentioned, I understood that I am starting the streaming from scratch, which is not what I wanted to do. Here is what I am planning to do. Our replication process has been down since March 5th.

1. Is it OK to get all WALs from March 5th till now onto the standby's pg_xlog?
2. Take pg_basebackup
3. Export the data dir
4. Stop backup
5. Restart the standby

Based on my understanding it should work, because when the standby started, it was not able to find the files, as I had deleted them from the slave when I tried to fix the issue.

Clearly it didn't work, because you've been having issues for the past 4 days and you've already tried this approach unsuccessfully. Based on the log snippets below, it looks like you have multiple problems. First, your slave is definitely missing WAL segments. You should increase the number of WAL segments that are archived on the master, and ensure that your base backup includes the WAL segments (by generating it with the -x option). However, that's the least of your problems at the moment, because it looks like the master is what is crashing.
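The advice above (retain more WAL on the master, and include WAL in the base backup with -x) can be sketched as follows; the host name, path, and segment count are hypothetical and would need to be sized to your environment:

```
# postgresql.conf on the master: keep extra WAL segments around so a
# lagging standby can still fetch them (each segment is 16 MB; the
# value below is an assumption, not a recommendation for this cluster)
wal_keep_segments = 256
```

```shell
# Generate the base backup with WAL included (-x), so the standby can
# reach a consistent state even if archive copies are missing
pg_basebackup -h master.example.com -U postgres -D /tmp/basebackup -x -Ft -v
```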
Re: [GENERAL] broke postgres, how to fix??
Did you shut down the 'old' postgres before copying these files? Did you (re)configure the 'new' postgres to set its $PGDATA directory to the location of the 'new' files?

On Fri, Feb 22, 2013 at 3:46 PM, JD Wong jdmsw...@gmail.com wrote:

I tried copying postgres over to a new directory. It was working until I deleted a bunch of databases from the old postgres. Lo and behold, this somehow broke the new copy too. Now when I start with pg_ctl:

2013-02-22 18:36:13 EST DETAIL: The database subdirectory base/1066060 is missing.
2013-02-22 18:36:25 EST FATAL: database wormmine does not exist
2013-02-22 18:36:25 EST DETAIL: The database subdirectory base/1027296 is missing.
2013-02-22 18:37:13 EST FATAL: database wormmine-dev does not exist

and it won't start... How can I fix this? Re-creating these databases with the old postgres didn't work... Also, why did this happen? I created a new data directory for the new postgres; this should be impossible. I have no ideas left, can anyone help? Thanks in advance,
Re: [GENERAL] broke postgres, how to fix??
On Tue, Feb 26, 2013 at 4:02 PM, JD Wong jdmsw...@gmail.com wrote:

Hi Adrian, yes I completely copied the config-file and data directories over. Lonnie, I don't remember. I might not have shut down the old postgres; yes, I set PGDATA accordingly.

That's guaranteed to break everything badly.
Re: [GENERAL] broke postgres, how to fix??
On Tue, Feb 26, 2013 at 4:10 PM, JD Wong jdmsw...@gmail.com wrote:

Hi Adrian,

That's guaranteed to break everything badly.

Even if I read-only style copied the files? Do you mind elaborating on why this happens? (or point me to relevant documentation)

What is read-only style, and how does postgres know about this?
http://www.postgresql.org/docs/9.2/static/backup-file.html

Thanks, -JD

On Tue, Feb 26, 2013 at 7:04 PM, Lonni J Friedman netll...@gmail.com wrote:
On Tue, Feb 26, 2013 at 4:02 PM, JD Wong jdmsw...@gmail.com wrote:
Hi Adrian, yes I completely copied the config-file and data directories over. Lonnie, I don't remember. I might not have shut down the old postgres; yes, I set PGDATA accordingly.
That's guaranteed to break everything badly.

--
~ L. Friedman netll...@gmail.com
LlamaLand https://netllama.linux-sxs.org
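For reference, the safe file-level copy procedure described on the page linked above comes down to stopping the server before anything is copied. A sketch with hypothetical paths:

```shell
# The cluster must be cleanly shut down before the data directory is
# copied; copying files while postgres is running yields an
# inconsistent, unusable copy.
pg_ctl -D /old/pgdata stop -m fast

# Copy everything, preserving permissions and symlinks
rsync -a /old/pgdata/ /new/pgdata/

# Start the new server against the copy
pg_ctl -D /new/pgdata start
```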
[GENERAL] special procedure required when running pg_basebackup from a standby?
Greetings,

I'm running postgres-9.2.2 in a Linux-x86_64 cluster with 1 master and several hot standby servers. Since upgrading to 9.2.2 from 9.1.x a few months ago, I switched from generating a base backup on the master to generating it on a dedicated slave/standby (to reduce the load on the master). The command that I've always used to generate the base backup is:

pg_basebackup -v -D /tmp/bb0 -x -Ft -U postgres

However, I've noticed that whenever I use the base backup generated from the standby to create a new standby server, many of the indexes are corrupted. This was never the case when I was generating the basebackup directly from the master. Now I see errors similar to the following when running queries against the tables that own the indexes:

INDEX debugger_2013_01_dacode_idx contains unexpected zero page at block 12
HINT: Please REINDEX it.
INDEX smoke32on64tests_2013_01_suiteid_idx contains unexpected zero page at block 111
HINT: Please REINDEX it.

I've confirmed that the errors/corruption don't exist on the server that is generating the base backup (I can successfully run the same SQL query which fails on the new standby). Also, reindexing the index does fix the problem. So it seems that I'm potentially misunderstanding some part of the process. My setup process is to simply untar the basebackup in the $PGDATA directory, and copy over all the WAL logs into $PGDATA/pg_xlog.

thanks for any pointers.
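As a stopgap on an affected standby, the corrupted indexes can be rebuilt as the HINT suggests. A sketch; the index name is taken from the errors above, and the table name is an assumption inferred from the index name:

```sql
-- Rebuild one index named in the error message
REINDEX INDEX debugger_2013_01_dacode_idx;

-- Or rebuild every index on the owning table at once
-- (hypothetical table name, inferred from the index name)
REINDEX TABLE debugger_2013_01;
```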
[GENERAL] data corruption when using base backups generated from hot standby
Greetings,

I'm running postgres-9.2.2 in a Linux-x86_64 cluster with 1 master and several hot standby servers. Since upgrading to 9.2.2 from 9.1.x a few months ago, I switched from generating a base backup on the master to generating it on a dedicated slave/standby (to reduce the load on the master). The command that I've always used to generate the base backup is:

pg_basebackup -v -D /tmp/bb0 -x -Ft -U postgres

However, I've noticed that whenever I use the base backup generated from the standby to create a new standby server, many of the indexes are corrupted. This was never the case when I was generating the basebackup directly from the master. Now I see errors similar to the following when running queries against the tables that own the indexes:

INDEX debugger_2013_01_dacode_idx contains unexpected zero page at block 12
HINT: Please REINDEX it.
INDEX smoke32on64tests_2013_01_suiteid_idx contains unexpected zero page at block 111
HINT: Please REINDEX it.

I've confirmed that the errors/corruption don't exist on the server that is generating the base backup (I can successfully run the same SQL query which fails on the new standby). Also, reindexing the index does fix the problem. So it seems that I'm potentially misunderstanding some part of the process. My setup process is to simply untar the basebackup in the $PGDATA directory, and copy over all the WAL logs into $PGDATA/pg_xlog.

thanks for any pointers.
Re: [GENERAL] pgpool2 load balancing not working
On Fri, Jan 4, 2013 at 3:42 PM, Greg Donald gdon...@gmail.com wrote:

Sorry if this is the wrong list, but I've been stuck for a couple days now. I tried pgpool-general but that list appears to not like me. I'm not getting any posts and my post hasn't shown up in the archives.

Specifically which address are you sending to? I'm on the official list, and it seems to work fine for me.

I have a Python/Django app that will require database load balancing at some point in the near future. In the meantime I'm trying to learn to implement pgpool on a local virtual machine setup. I have 4 Ubuntu 12.04 VMs:

192.168.1.80 - pool, pgpool2 installed and accessible
192.168.1.81 - db1 master
192.168.1.82 - db2 slave
192.168.1.83 - db3 slave

I have pgpool-II version 3.1.1 and my database servers are running PostgreSQL 9.1. I have my app's db connection pointed to 192.168.1.80: and it works fine. The problem is when I use Apache ab to throw some load at it, none of the SELECT queries appear to be balanced. All the load goes to my db1 master. Also very concerning is the load on the pool server itself; it is really high compared to db1, maybe an average of 8-10 times higher. Meanwhile my db2 and db3 servers have a load of nearly zero; they appear to only be replicating from db1, which isn't very load intensive for my tests with ab.

ab -n 300 -c 4 -C 'sessionid=80a5fd3b6bb59051515e734326735f80' http://192.168.1.17/contacts/

That drives the load on my pool server up to about 2.3. Load on db1 is about 0.4 and load on db2 and db3 is nearly zero. Can someone take a look at my pgpool.conf and see what I'm doing wrong? http://pastebin.com/raw.php?i=wzBc0aSp

Nothing is jumping out at me as blatantly wrong, although it seems kinda weird that each of your database servers is listening on a different port #.
I'm starting to think maybe it has something to do with Django wrapping every request in a transaction by default, but when the transaction only has SELECTs, shouldn't that be load balanced just fine? Makes my stomach hurt to think I may have to turn off auto-commit and manually commit transactions all throughout my code :( Still hoping it's a pgpool setup issue, since it's my first time setting it up and all.

I've never done anything with Django, but this seems like a good possibility: the transactions are causing pgpool to get confused and assume that every query requires write access. What might be more useful is for you to post your actual pgpool log somewhere, as that might contain a clue about what is going wrong.
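For comparison, the pgpool-II settings that control read/write splitting in a streaming-replication setup look roughly like this. This is a sketch from memory of the pgpool-II 3.1-era parameter names, not a diagnosis of the poster's pastebin config:

```
# pgpool.conf fragment (parameter names are an assumption)
load_balance_mode = on            # distribute SELECTs across backends
master_slave_mode = on            # route writes to the primary only
master_slave_sub_mode = 'stream'  # backends use streaming replication
backend_weight0 = 1               # relative share of read traffic per node
backend_weight1 = 1
backend_weight2 = 1
```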
Re: [GENERAL] Postgresql logfilename and times in GMT - not EST
On Tue, Dec 4, 2012 at 1:59 PM, Bryan Montgomery mo...@english.net wrote:

We have a test 9.2.0 db running on openSuse 12.2. When I SELECT now() I get the correct timezone and date back (-5 hours). When I do date at the OS prompt, I get the right timezone back. I changed postgresql.conf to have timezone = 'EST' and restarted postgres. However, the log file is still 5 hours ahead. What gives? Not the end of the world, but a bit annoying.

You need to set log_timezone. This is a new 'feature' in 9.2 that annoyed me as well. I assume that there was a good use case for this.
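A minimal sketch of the two settings in postgresql.conf; note that they are independent, which is why changing only timezone left the log timestamps unchanged:

```
# postgresql.conf
timezone     = 'EST'   # default display timezone for sessions (now(), etc.)
log_timezone = 'EST'   # timezone used for timestamps in the server log
```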
Re: [GENERAL] Postgresql logfilename and times in GMT - not EST
On Tue, Dec 4, 2012 at 2:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Lonni J Friedman netll...@gmail.com writes:
On Tue, Dec 4, 2012 at 1:59 PM, Bryan Montgomery mo...@english.net wrote:

I changed postgresql.conf to have timezone = 'EST' and restarted postgres. However, the log file is still 5 hours ahead. What gives? Not the end of the world, but a bit annoying.

You need to set log_timezone. This is a new 'feature' in 9.2 that annoyed me as well. I assume that there was a good use case for this.

New? log_timezone has been around since 8.3, and it seems like a good idea to me --- what if you have N sessions, each with its own active timezone setting? Timestamps in the log would be an unreadable mishmash if there weren't a separate log_timezone setting. What did change in 9.2 is that initdb sets values for timezone and log_timezone in postgresql.conf, so it's the initdb environment that will determine what you get in the absence of any manual action. Before that it was the postmaster's environment.

Sorry, I meant new in that its impact changed in 9.2, such that it needs to be set explicitly to avoid getting UTC by default, whereas in the past that wasn't required.
Re: [GENERAL] Table with million rows - and PostgreSQL 9.1 is not using the index
I'm no expert on this, but it will likely be more helpful to others if you include the table description with all the indices.

On Tue, Dec 4, 2012 at 8:44 PM, Edson Richter edsonrich...@hotmail.com wrote:

I've a table with 110 rows, with streets. I'm making a partial search using zip code, and PostgreSQL is ignoring my ZIP index. I'm sure I'm making some mistake, but I can't see where. The query is:

SELECT t2.ID, t2.CEP, t2.COMPLEMENTO, t2.NOME, t2.NOMESEMACENTOS, t2.TIPO, t2.BAIRRO_ID
FROM LOCALIDADE t0, LOGRADOURO t2, BAIRRO t1
WHERE t2.CEP LIKE '81630160%'
  AND ((t1.ID = t2.BAIRRO_ID) AND (t0.ID = t1.LOCALIDADE_ID))
ORDER BY t0.NOME;

(for reference, BAIRRO = town, LOCALIDADE = city, LOGRADOURO = street)

Here is the result of EXPLAIN ANALYZE:

Sort  (cost=11938.72..11938.74 rows=91 width=93)
  Sort Key: t0.nome
  ->  Nested Loop  (cost=0.00..11938.42 rows=91 width=93)
        ->  Nested Loop  (cost=0.00..11935.19 rows=91 width=85)
              ->  Seq Scan on logradouro t2  (cost=0.00..11634.42 rows=91 width=81)
                    Filter: ((cep)::text ~~ '81630160%'::text)
              ->  Index Scan using pkbairro on bairro t1  (cost=0.00..3.30 rows=1 width=8)
                    Index Cond: (id = t2.bairro_id)
        ->  Index Scan using pklocalidade on localidade t0  (cost=0.00..0.03 rows=1 width=16)
              Index Cond: ((id)::text = (t1.localidade_id)::text)

I've a few tweaks in postgresql.conf:

shared_buffers = 2GB
temp_buffers = 32MB
max_prepared_transactions = 50
work_mem = 32MB
maintenance_work_mem = 16MB
max_stack_depth = 4MB
max_files_per_process = 15000
random_page_cost = 2.0
cpu_tuple_cost = 0.001
cpu_index_tuple_cost = 0.0005
cpu_operator_cost = 0.00025
effective_cache_size = 512MB

Everything else is default configuration. This machine is an Intel Quad 3.1GHz with 8 threads, 8GB of RAM and 8GB of swap, running CentOS 6.3 64-bit. The machine is free almost all the time.
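One common reason for this plan (not confirmed anywhere in the thread) is that under a non-C locale a plain btree index cannot satisfy a LIKE 'prefix%' predicate, so the planner falls back to a sequential scan. A sketch using the table and column names from the query above; the index name is hypothetical:

```sql
-- A pattern-ops index makes prefix LIKE searches indexable even
-- under a non-C locale.
CREATE INDEX logradouro_cep_pattern_idx
    ON logradouro (cep varchar_pattern_ops);

-- Alternatively, the predicate can be rewritten as a range scan that
-- a plain btree index can serve:
--   WHERE cep >= '81630160' AND cep < '81630161'
```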
Re: [GENERAL] Quick estimate of num of rows table size
On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos tkalf...@gmail.com wrote:

Hi all, I read somewhere that the following query gives a quick estimate of the # of rows in a table regardless of the table's size (which would matter in a simple SELECT count(*)):

SELECT (CASE WHEN reltuples > 0
             THEN pg_relation_size('mytable')/(8192*relpages/reltuples)
             ELSE 0
        END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

If relpages and reltuples are recorded accurately each time VACUUM is run, wouldn't it be the same to just grab the value of reltuples directly, like:

SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

In the same manner, are pg_relation_size('mytable') and 8192*relpages the same? I ran both assumptions against a freshly VACUUMed table and they seem correct.

This doesn't seem to work for me. I get an estimated row_count of 0 on a table that I know has millions of rows.
Re: [GENERAL] Quick estimate of num of rows table size
On Mon, Nov 5, 2012 at 3:56 PM, Thalis Kalfigkopoulos tkalf...@gmail.com wrote:
On Mon, Nov 5, 2012 at 7:14 PM, Lonni J Friedman netll...@gmail.com wrote:
On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos tkalf...@gmail.com wrote:

Hi all, I read somewhere that the following query gives a quick estimate of the # of rows in a table regardless of the table's size (which would matter in a simple SELECT count(*)):

SELECT (CASE WHEN reltuples > 0
             THEN pg_relation_size('mytable')/(8192*relpages/reltuples)
             ELSE 0
        END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

If relpages and reltuples are recorded accurately each time VACUUM is run, wouldn't it be the same to just grab the value of reltuples directly, like:

SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

In the same manner, are pg_relation_size('mytable') and 8192*relpages the same? I ran both assumptions against a freshly VACUUMed table and they seem correct.

This doesn't seem to work for me. I get an estimated row_count of 0 on a table that I know has millions of rows.

Which one doesn't work exactly? The larger query? Are you on a 9.x?

doh, sorry. The first/larger one doesn't work. As it turns out, the 2nd actually does work well. I'm on 9.1.x.
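A guarded variant of the larger query avoids silently returning 0 when the statistics are stale. This is a sketch, with 'mytable' as a placeholder:

```sql
-- Scale reltuples by the current physical size so the estimate tracks
-- growth since the last VACUUM/ANALYZE. NULLIF turns relpages = 0 into
-- NULL, so an unanalyzed table yields NULL rather than a misleading 0.
SELECT (reltuples / NULLIF(relpages, 0)
        * (pg_relation_size('mytable') / 8192.0))::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'mytable'::regclass;
```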
Re: [GENERAL] Hot Standby Not So Hot Anymore
On Mon, Nov 5, 2012 at 7:40 PM, Ian Harding harding@gmail.com wrote:

I had a 9.0.8 hot standby setup, one master, two slaves, working great. Then I tried to re-initialize by making a base backup, the way I've done it many times before, but for some reason I can't get the standby to accept connections. I copied the postgresql.conf and recovery.conf out of the way, cleaned the data directory and extracted the backup files, then replaced the conf files. Everything works perfectly, but I keep getting:

FATAL: the database system is starting up

I know I'm an idiot and that I did something wrong, but I can't find it.

[root@db03 data]# grep standby postgresql.conf
wal_level = hot_standby            # minimal, archive, or hot_standby
hot_standby = on                   # on allows queries during recovery
max_standby_archive_delay = -1     # max delay before canceling queries
max_standby_streaming_delay = -1   # max delay before canceling queries
[root@db03 data]# grep standby recovery.conf
# Specifies whether to start the server as a standby. In streaming replication,
standby_mode = 'on'
# Specifies a connection string which is used for the standby server to connect
[root@db03 data]# ps aux | grep stream
postgres 62127 0.1 0.0 34554356 3724 ? Ss 16:22 0:16 postgres: wal receiver process streaming BA7/12B87818
[root@db03 data]# ps aux | grep startup
postgres 62122 0.5 0.9 34545900 1223708 ? Ss 16:22 1:03 postgres: startup process recovering 00010BA70012

Any suggestions appreciated!!!

What's your log have to say?
Re: [GENERAL] Hot Standby Not So Hot Anymore
On Mon, Nov 5, 2012 at 7:49 PM, Ian Harding harding@gmail.com wrote: On Mon, Nov 5, 2012 at 7:46 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Nov 5, 2012 at 7:40 PM, Ian Harding harding@gmail.com wrote: I had a 9.0.8 hot standby setup, one master, two slaves, working great. Then, I tried to re-initialize by making a base backup, the way I've done it many times before, but for some reason I can't get the standby to accept connections. I copied the postgresql.conf and recorvery.conf out of the way, cleaned the data directory and extracted the backup files, then replaced the conf files. Everything works perfectly, but I keep getting :FATAL: the database system is starting up I know I'm an idiot and that I did something wrong but I can't find it. [root@db03 data]# grep standby postgresql.conf wal_level = hot_standby# minimal, archive, or hot_standby hot_standby = on# on allows queries during recovery max_standby_archive_delay = -1# max delay before canceling queries max_standby_streaming_delay = -1# max delay before canceling queries [root@db03 data]# grep standby recovery.conf # Specifies whether to start the server as a standby. In streaming replication, standby_mode = 'on' # Specifies a connection string which is used for the standby server to connect [root@db03 data]# ps aux | grep stream postgres 62127 0.1 0.0 34554356 3724 ? Ss 16:22 0:16 postgres: wal receiver process streaming BA7/12B87818 [root@db03 data]# ps aux | grep startup postgres 62122 0.5 0.9 34545900 1223708 ?Ss 16:22 1:03 postgres: startup process recovering 00010BA70012 Any suggestions appreciated!!! What's your log have to say? It says everything is happy as normal... 
2012-11-05 16:22:38.744 PST - :LOG: database system was shut down in recovery at 2012-11-05 16:22:33 PST
2012-11-05 16:22:38.745 PST - :LOG: entering standby mode
2012-11-05 16:22:38.746 PST - :LOG: redo starts at BA5/F96F86A8
2012-11-05 16:22:38.762 PST - postgres :FATAL: the database system is starting up
2012-11-05 16:22:39.764 PST - postgres :FATAL: the database system is starting up
2012-11-05 16:22:40.766 PST - postgres :FATAL: the database system is starting up
2012-11-05 16:22:41.200 PST - :LOG: invalid record length at BA6/6DCBA48
2012-11-05 16:22:41.206 PST - :LOG: streaming replication successfully connected to primary

This is after I stopped and restarted... after that, just more failed connection attempts.

Are you sure that it's not still recovering? In other words, when you look at the 'ps' output, do the 'wal receiver process' and 'recovering' entries change?
Re: [GENERAL] Hot Standby Not So Hot Anymore
On Mon, Nov 5, 2012 at 8:13 PM, Ian Harding harding@gmail.com wrote: On Mon, Nov 5, 2012 at 7:57 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Nov 5, 2012 at 7:49 PM, Ian Harding harding@gmail.com wrote: On Mon, Nov 5, 2012 at 7:46 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Nov 5, 2012 at 7:40 PM, Ian Harding harding@gmail.com wrote: I had a 9.0.8 hot standby setup, one master, two slaves, working great. Then, I tried to re-initialize by making a base backup, the way I've done it many times before, but for some reason I can't get the standby to accept connections. I copied the postgresql.conf and recorvery.conf out of the way, cleaned the data directory and extracted the backup files, then replaced the conf files. Everything works perfectly, but I keep getting :FATAL: the database system is starting up I know I'm an idiot and that I did something wrong but I can't find it. [root@db03 data]# grep standby postgresql.conf wal_level = hot_standby# minimal, archive, or hot_standby hot_standby = on# on allows queries during recovery max_standby_archive_delay = -1# max delay before canceling queries max_standby_streaming_delay = -1# max delay before canceling queries [root@db03 data]# grep standby recovery.conf # Specifies whether to start the server as a standby. In streaming replication, standby_mode = 'on' # Specifies a connection string which is used for the standby server to connect [root@db03 data]# ps aux | grep stream postgres 62127 0.1 0.0 34554356 3724 ? Ss 16:22 0:16 postgres: wal receiver process streaming BA7/12B87818 [root@db03 data]# ps aux | grep startup postgres 62122 0.5 0.9 34545900 1223708 ?Ss 16:22 1:03 postgres: startup process recovering 00010BA70012 Any suggestions appreciated!!! What's your log have to say? It says everything is happy as normal... 
2012-11-05 16:22:38.744 PST - :LOG: database system was shut down in recovery at 2012-11-05 16:22:33 PST
2012-11-05 16:22:38.745 PST - :LOG: entering standby mode
2012-11-05 16:22:38.746 PST - :LOG: redo starts at BA5/F96F86A8
2012-11-05 16:22:38.762 PST - postgres :FATAL: the database system is starting up
2012-11-05 16:22:39.764 PST - postgres :FATAL: the database system is starting up
2012-11-05 16:22:40.766 PST - postgres :FATAL: the database system is starting up
2012-11-05 16:22:41.200 PST - :LOG: invalid record length at BA6/6DCBA48
2012-11-05 16:22:41.206 PST - :LOG: streaming replication successfully connected to primary

This is after I stopped and restarted... after that, just more failed connection attempts.

Are you sure that it's not still recovering? In other words, when you look at the 'ps' output, do the 'wal receiver process' and 'recovering' entries change?

Yeah, it's still recovering. In normal operation it keeps right on recovering; it's just that it used to accept read-only connections...

Is the 2nd slave also experiencing this problem? If not, is it further ahead than the bad one?
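The same check can be done from SQL on the standby rather than by watching ps; these monitoring functions should exist on a 9.0-era server (the exact version each was introduced in is stated from memory):

```sql
-- On the standby: confirm it is still in recovery, and compare the WAL
-- position received from the master with the position replayed so far.
-- A growing gap between the two means replay is falling behind.
SELECT pg_is_in_recovery(),
       pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location();
```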
Re: [GENERAL] Hot Standby Not So Hot Anymore
On Mon, Nov 5, 2012 at 8:31 PM, Ian Harding harding@gmail.com wrote: On Mon, Nov 5, 2012 at 8:15 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Nov 5, 2012 at 8:13 PM, Ian Harding harding@gmail.com wrote: On Mon, Nov 5, 2012 at 7:57 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Nov 5, 2012 at 7:49 PM, Ian Harding harding@gmail.com wrote: On Mon, Nov 5, 2012 at 7:46 PM, Lonni J Friedman netll...@gmail.com wrote: On Mon, Nov 5, 2012 at 7:40 PM, Ian Harding harding@gmail.com wrote: I had a 9.0.8 hot standby setup, one master, two slaves, working great. Then, I tried to re-initialize by making a base backup, the way I've done it many times before, but for some reason I can't get the standby to accept connections. I copied the postgresql.conf and recorvery.conf out of the way, cleaned the data directory and extracted the backup files, then replaced the conf files. Everything works perfectly, but I keep getting :FATAL: the database system is starting up I know I'm an idiot and that I did something wrong but I can't find it. [root@db03 data]# grep standby postgresql.conf wal_level = hot_standby# minimal, archive, or hot_standby hot_standby = on# on allows queries during recovery max_standby_archive_delay = -1# max delay before canceling queries max_standby_streaming_delay = -1# max delay before canceling queries [root@db03 data]# grep standby recovery.conf # Specifies whether to start the server as a standby. In streaming replication, standby_mode = 'on' # Specifies a connection string which is used for the standby server to connect [root@db03 data]# ps aux | grep stream postgres 62127 0.1 0.0 34554356 3724 ? Ss 16:22 0:16 postgres: wal receiver process streaming BA7/12B87818 [root@db03 data]# ps aux | grep startup postgres 62122 0.5 0.9 34545900 1223708 ?Ss 16:22 1:03 postgres: startup process recovering 00010BA70012 Any suggestions appreciated!!! What's your log have to say? It says everything is happy as normal... 
2012-11-05 16:22:38.744 PST - :LOG: database system was shut down in recovery at 2012-11-05 16:22:33 PST
2012-11-05 16:22:38.745 PST - :LOG: entering standby mode
2012-11-05 16:22:38.746 PST - :LOG: redo starts at BA5/F96F86A8
2012-11-05 16:22:38.762 PST - postgres :FATAL: the database system is starting up
2012-11-05 16:22:39.764 PST - postgres :FATAL: the database system is starting up
2012-11-05 16:22:40.766 PST - postgres :FATAL: the database system is starting up
2012-11-05 16:22:41.200 PST - :LOG: invalid record length at BA6/6DCBA48
2012-11-05 16:22:41.206 PST - :LOG: streaming replication successfully connected to primary

This is after I stopped and restarted... after that, just more failed connection attempts.

Are you sure that it's not still recovering? In other words, when you look at the 'ps' output, do the 'wal receiver process' and 'recovering' entries change?

Yeah, it's still recovering. In normal operation it keeps right on recovering; it's just that it used to accept read-only connections...

Is the 2nd slave also experiencing this problem? If not, is it further ahead than the bad one?

Everything looks fine (forgot to cc the list...)

[root@db0 ~]# ps aux | grep stream
postgres 45267 0.0 0.0 34546456 3036 ? Ss 16:22 0:14 postgres: wal sender process postgres 192.168.4.4(55925) streaming BA7/5FC1BFD8
postgres 54247 0.0 0.0 34546456 3036 ? Ss 18:15 0:07 postgres: wal sender process postgres 192.168.4.3(57482) streaming BA7/5FC1BFD8

Right, but what I meant was: are both slaves experiencing the same problem, or just one? Did you use the same base backup on both of them, or did you generate a different one for each slave?
Re: [GENERAL] dropdb breaks replication?
On Wed, Oct 31, 2012 at 10:32 AM, Edson Richter edsonrich...@hotmail.com wrote:

I've two PostgreSQL 9.1.6 servers running on Linux CentOS 5.8 64-bit. They are replicated asynchronously. Yesterday I dropped a database of 20GB, and then replication broke, requiring me to manually synchronize both servers again. Is it expected that dropdb (or, perhaps, createdb) breaks existing replication between servers?

How did you determine that replication was broken, and how did you manually synchronize the servers? Are you certain that replication was working prior to dropping the database?
Re: [GENERAL] dropdb breaks replication?
On Wed, Oct 31, 2012 at 11:01 AM, Edson Richter edsonrich...@hotmail.com wrote: On 31/10/2012 15:39, Lonni J Friedman wrote: On Wed, Oct 31, 2012 at 10:32 AM, Edson Richter edsonrich...@hotmail.com wrote: I've two PostgreSQL 9.1.6 running on Linux CentOS 5.8 64bit. They are replicated asynchronously. Yesterday, I've dropped a database of 20Gb, and then replication has broken, requiring me to manually synchronize both servers again. It is expected that dropdb (or, perhaps, createdb) break existing replication between servers? How did you determine that replication was broken, and how did you manually synchronize the servers? Are you certain that replication was working prior to dropping the database? I'm sure replication was running. I usually keep two windows open in both servers, running In master: watch -n 2 ps aux | egrep sender In slave: watch -n 2 ps aux | egrep receiver At the point the dropdb command has been executed, both disappeared from my radar. Also, in the log there is the following error: LOG: replicação em fluxo conectou-se com sucesso ao servidor principal FATAL: não pôde receber dados do fluxo do WAL: FATAL: segmento do WAL solicitado 0001000100BE já foi removido May the cause not having enough segments (currently 80) for dropdb command? Is dropdb logged in transaction log page-by-page excluded? I can't read Portuguese, but I think the gist of the error is that the WAL segment was already removed before the slave could consume it. I'm guessing that you aren't keeping enough of them, and dropping the database generated a huge volume which flushed out the old ones before they could get consumed by your slave.
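The usual ways to keep a burst of WAL (such as a big DROP DATABASE can generate) from being recycled before a slave consumes it are a larger wal_keep_segments, or a WAL archive the slave can fall back to. A sketch with illustrative values only (the path and numbers are examples, not recommendations):

```
# postgresql.conf on the master
wal_keep_segments = 1000                   # retain more finished segments in pg_xlog
archive_mode = on                          # and/or archive segments somewhere durable
archive_command = 'cp %p /wal_archive/%f'  # placeholder path; point at your own archive
```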
Re: [GENERAL] 9.1 to 9.2 requires a dump/reload?
pg_upgrade has worked fine for several releases. I believe that the only time when pg_upgrade isn't a viable option is for some types of GiST indices. On Mon, Oct 22, 2012 at 2:55 PM, Nikolas Everett nik9...@gmail.com wrote: I was just looking at http://www.postgresql.org/docs/devel/static/release-9-2.html and it mentioned that a dump/reload cycle was required to upgrade from a previous release. I just got done telling some of my coworkers that PG had been bitten by this enough times that they were done with it. Am I wrong? Is this normal? I see that pg_upgrade is an option. Having never used it, how long should I expect pg_upgrade to take? Obviously we'll measure it in our environment, but it'd be nice to have a ballpark figure. Nik -- ~ L. Friedman netll...@gmail.com LlamaLand https://netllama.linux-sxs.org
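For a ballpark: pg_upgrade in --link mode avoids copying the data files at all, so even large clusters typically convert in minutes rather than the hours a dump/reload takes. A hedged sketch (the binary and data paths below are invented; check pg_upgrade --help for your packaging's layout):

```shell
# run as the postgres user, with both old and new servers shut down
pg_upgrade \
  -b /usr/pgsql-9.1/bin  -B /usr/pgsql-9.2/bin \
  -d /var/lib/pgsql/9.1/data -D /var/lib/pgsql/9.2/data \
  --link    # hard-link data files into the new cluster instead of copying them
```

Note that with --link the old cluster can't be started again once the new one has been, so keep a backup if you may need to roll back.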
Re: [GENERAL] Strategies/Best Practises Handling Large Tables
On Fri, Oct 12, 2012 at 7:44 AM, Chitra Creta chitracr...@gmail.com wrote: Hi, I currently have a table that is growing very quickly - i.e. 7 million records in 5 days. This table acts as a placeholder for statistics, and hence the records are merely inserted and never updated or deleted. Many queries are run on this table to obtain trend analysis. However, these queries are now starting to take a very long time (hours) to execute due to the size of the table. I have put indexes on this table, to no significant benefit. Some of the other strategies I have thought of: 1. Purge old data 2. Reindex 3. Partition 4. Creation of daily, monthly, yearly summary tables that contain aggregated data specific to the statistics required Does anyone know what is the best practice to handle this situation? I would appreciate knowledge sharing on the pros and cons of the above, or if there are any other strategies that I could put in place. Partitioning is probably your best solution. 3 and 4 sound like variations on the same thing. Before you go that route, you should make sure that your bottleneck is really a result of the massive amount of data, and not some other problem. Are you sure that the indices you created are being used, and that you have all the indices that you need for your queries? Look at the query plan output from EXPLAIN, and/or post here if you're unsure. Reindexing shouldn't make a difference unless something is wrong with the indices that you already have in place. Purging old data is only a good solution if you do not need the data, and never will need the data.
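For reference, the 9.x-era recipe for option 3 (inheritance-based partitioning) looks roughly like this; the table and column names are made up for illustration:

```sql
CREATE TABLE stats (ts timestamptz NOT NULL, value bigint);

-- one child per month, with a CHECK constraint the planner can use
-- (with constraint_exclusion = partition) to skip irrelevant partitions
CREATE TABLE stats_2012_10 (
    CHECK (ts >= DATE '2012-10-01' AND ts < DATE '2012-11-01')
) INHERITS (stats);
CREATE INDEX stats_2012_10_ts_idx ON stats_2012_10 (ts);

-- route inserts on the parent into the right child
CREATE OR REPLACE FUNCTION stats_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.ts >= DATE '2012-10-01' AND NEW.ts < DATE '2012-11-01' THEN
        INSERT INTO stats_2012_10 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for timestamp %', NEW.ts;
    END IF;
    RETURN NULL;  -- the row was redirected; don't also insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER stats_insert BEFORE INSERT ON stats
    FOR EACH ROW EXECUTE PROCEDURE stats_insert_trigger();
```

A side benefit is that old months can then be removed with a cheap DROP TABLE instead of a bulk DELETE, which also covers option 1.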
Re: [GENERAL] strange hot_standby behaviour
On Mon, Oct 1, 2012 at 7:28 AM, pfote pf...@ypsilon.net wrote: Hi, I had a very strange effect on the weekend that smells like a bug, so I'd like to share it. Setup: machine A: 16 CPU Cores (modern), 128GB RAM, nice 6-drive SAS Raid-10 machines B, C: 8 Cores (substantially older than A), 48GB Ram, some scsi Raid, substantially slower than A The workload is about 80% - 90% SELECTs with heavy sorting and grouping, the remaining are INSERTs/UPDATEs/DELETEs. So in the original setup A is the master, B and C are hot standbys that process some of the SELECTs, but by far the most processing is done on the master (A). pg version is 9.0.6. CPU utilization is about 80% on the master and between 90-100% on the standbys, so it was decided to upgrade to the latest 9.2 to profit from the latest performance enhancements. So B gets upgraded to 9.2.1-1.pgdg60+1 (from pgapt.debian.org) and becomes master, then A becomes a hot_standby slave that takes all the SELECTs (and C becomes another hot_standby). In the beginning everything works as expected, CPU utilization drops from 80% to about 50-60%, selects run faster, everything looks smoother (some queries drop from 5s to 1s due to 9.2's index-only-scan feature). It's Friday, everyone is happy. About 16 hours later, Saturday morning around 6:00, A suddenly goes wild and has a CPU utilization of 100% without a change in the workload, out of the blue. Queries that used to take 1s suddenly take 5-10s, explain analyze plans of these queries haven't changed a bit though. Switching the workload off causes the server to become idle. (while I'm writing this I realize we haven't tried to restart A). Instead, $boss decides to switch back to the original setup, so B gets dropped, A becomes master and gets 100% of the workload (all SELECTs/INSERTs/UPDATEs/DELETEs), and everything becomes just like Friday, CPU usage drops to 50-60%, everything runs smoothly. I'm not sure yet if this is replication related or a 9.2.1 problem. Any ideas?
This could be just about anything. Which OS are you running? Did you check any logs when everything went crazy?
Re: [GENERAL] initial sync of multiple streaming slaves simultaneously
On Wed, Sep 19, 2012 at 8:59 AM, Mike Roest mike.ro...@replicon.com wrote: Hey Everyone, We currently have a 9.1.5 postgres cluster running using streaming replication. We have 3 nodes right now 2 - local that are setup with pacemaker for a HA master/slave set failover cluster 1 - remote as a DR. Currently we're syncing with the pretty standard routine clear local datadir pg_start_backup sync datadir with fast-archiver (https://github.com/replicon/fast-archiver) pg_stop_backup start slave We use the streaming replication with wal_keep_segments set to 1000 to get the required WAL files to the slaves. With this procedure we can currently only sync one of the slaves at a time if we fail over, as when the second machine goes to start the sync, it errors out because trying to run pg_start_backup fails. Specifically what is the error?
Re: [GENERAL] initial sync of multiple streaming slaves simultaneously
Just curious, is there a reason why you can't use pg_basebackup? On Wed, Sep 19, 2012 at 12:27 PM, Mike Roest mike.ro...@replicon.com wrote: Is there any hidden issue with this that we haven't seen. Or does anyone have suggestions as to an alternate procedure that will allow 2 slaves to sync concurrently. With some more testing I've done today I seem to have found an issue with this procedure. When the slave starts up after the sync, it reaches what it thinks is a consistent recovery point very fast based on the pg_stop_backup, e.g. (from the recover script):
2012-09-19 12:15:02: pgsql_start start
2012-09-19 12:15:31: pg_start_backup
2012-09-19 12:15:31: -
2012-09-19 12:15:31: 61/3020
2012-09-19 12:15:31: (1 row)
2012-09-19 12:15:31:
2012-09-19 12:15:32: NOTICE: pg_stop_backup complete, all required WAL segments have been archived
2012-09-19 12:15:32: pg_stop_backup
2012-09-19 12:15:32:
2012-09-19 12:15:32: 61/30D8
2012-09-19 12:15:32: (1 row)
2012-09-19 12:15:32:
While the sync was running (but after the pg_stop_backup) I pushed a bunch of traffic against the master server.
Which got me to a current xlog location of postgres=# select pg_current_xlog_location(); pg_current_xlog_location -- 61/6834C450 (1 row) The startup of the slave after the sync completed:
2012-09-19 12:42:49.976 MDT [18791]: [1-1] LOG: database system was interrupted; last known up at 2012-09-19 12:15:31 MDT
2012-09-19 12:42:49.976 MDT [18791]: [2-1] LOG: creating missing WAL directory pg_xlog/archive_status
2012-09-19 12:42:50.143 MDT [18791]: [3-1] LOG: entering standby mode
2012-09-19 12:42:50.173 MDT [18792]: [1-1] LOG: streaming replication successfully connected to primary
2012-09-19 12:42:50.487 MDT [18791]: [4-1] LOG: redo starts at 61/3020
2012-09-19 12:42:50.495 MDT [18791]: [5-1] LOG: consistent recovery state reached at 61/3100
2012-09-19 12:42:50.495 MDT [18767]: [2-1] LOG: database system is ready to accept read only connections
It shows the DB reached a consistent state as of 61/3100 which is well behind the current location of the master (and the data files that were synced over to the slave). And monitoring the server showed the expected slave delay that disappeared as the slave pulled and recovered from the WAL files that got generated after the pg_stop_backup. But based on this it looks like this procedure would end up with an indeterminate amount of time (based on how much traffic the master processed while the slave was syncing) that the slave couldn't be trusted for fail over or querying, as the server is up and running but is not actually in a consistent state.
Thinking it through, the more complicated script version of the 2 server recovery (where first past the post runs start_backup or stop_backup) would also have this issue (although our failover slave would always be the one running stop backup as it syncs faster, so at least it would be always consistent, but the DR would still have the problem)
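To make the earlier pg_basebackup suggestion concrete: on 9.1 it drives the start/stop-backup dance itself and, unlike a manual pg_start_backup(), concurrent runs against the same master are fine. A sketch (hostname, role name, and path are invented):

```shell
# -P: progress reporting; -x: include the WAL needed so the backup is
# consistent on its own (the 9.1-era flag; later releases use -X)
pg_basebackup -h master.example.com -U repluser -D /var/lib/pgsql/data -P -x
```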
Re: [GENERAL] File system level backup
On Thu, Jul 26, 2012 at 3:39 AM, Manoj Agarwal m...@ockham.be wrote: Hi, I have two virtual machines with two different versions of Postgresql. One machine contains Postgres 7.4.19 and another has Postgres 8.4.3. I also have other instances of these two virtual machines. I need to transfer the database from one machine to other. Both machines contain a database with the same name, for example: testdb, but with different data/values, but the structure is exactly the same. I want to do the following:
· Take file system level backup from first machine for Postgres database ‘testdb’ and restore it in another instance that has SAME version of Postgres. i.e. backup the database ‘testdb’ from Postgres 7.4.19 and restore it on another virtual machine with same Postgres version 7.4.19.
· Take file system level backup from first machine for Postgres database ‘testdb’ and restore it in another instance that has DIFFERENT version of Postgres. i.e. backup the database testdb from Postgres 7.4.19 and restore it to another virtual machine with different Postgres version 8.4.3.
I can achieve it with pg_dump and pg_restore, but not with file level backups. The data directory is /var/lib/pgsql/data that contains base directory amongst others, that contains directories for each database referred to by their OIDs. If I replace the complete data directory of one machine from the instance of another machine (with same or different Postgres version), it is failing to identify OIDs for that database. It should be possible in Postgres to swap two data directories in two different virtual machines without requiring pg_dump and pg_restore. With me, it doesn’t work in both the cases mentioned above. In first case, it gives an error of missing OID for the database. In second case, it is giving version incompatibility issue. Is there a way in Postgres to do file system level backup?
The objective is to push /var/lib/pgsql/data directory across different virtual machines, without the need to backup and restore Postgres database with ‘pg_dump’ and ‘pg_restore’ commands. Any help will be highly appreciated. You definitely cannot take a filesystem level backup from one version and throw it into a different version (perhaps minor versions, but not 7.x vs. 8.x). This is basically what pg_upgrade was created to solve (however with a different purpose). But pg_upgrade definitely doesn't support 7.x, and I'm not even sure that it supports 8.x. In fact, I don't even know that 7.x is a supported version of postgresql in any context any longer. As for the issue of moving a filesystem level backup between identical versions, I believe that should work (although I have no clue if there were bugs preventing this in a version as old as 7.x). Can you provide exact details and commands of what you're trying to do, and the exact errors you're seeing?
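Between identical versions, the supported form of a file-level copy is the whole cluster directory, taken with the server cleanly shut down; single-database directories under base/ cannot be transplanted by themselves, because their OIDs only mean something to their own cluster's catalogs. A sketch (the paths and init script are examples):

```shell
/etc/init.d/postgresql stop        # the source server must be stopped cleanly
# copy the ENTIRE data directory, not one database's subdirectory
rsync -a /var/lib/pgsql/data/ target:/var/lib/pgsql/data/
ssh target /etc/init.d/postgresql start
```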
Re: [GENERAL] insert binary data into a table column with psql
On Tue, Jul 24, 2012 at 2:22 PM, John R Pierce pie...@hogranch.com wrote: On 07/24/12 1:28 PM, jkells wrote: from psql I have tried several ways including creating a function to read a file without any success but basically I want to do something like the following from a bash shell psql connection information -c insert into x (ID, load_date, image) values ($PID,clock_timestamp()::timestamp(0), copy from '/tmp/$FN' with binary); Any help would be greatly appreciated use a proper programming language that can read files and insert BYTEA data. shell + psql just won't cut it. I'd suggest perl or python or java or Actually, that's not true. It's definitely possible to INSERT data into bytea using just psql. The trick is to sub-SELECT the data with the bytea_import function in the INSERT. So something like this: insert into x (ID, load_date, image) values ($PID, clock_timestamp()::timestamp(0), (SELECT bytea_import('/tmp/$FN')));
Re: [GENERAL] insert binary data into a table column with psql
On Tue, Jul 24, 2012 at 7:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: jtkells jtke...@verizon.net writes: Thanks much for your reply, that does the trick quite nicely. But, I just came to the realization that this only works if you are running the client and the file both reside on the database server. I thought that I would be able to do this from a remote server where the client was running, picking up a local file and sending it into a remote database table. Unless I am missing something, I can't. I can create a temp table on this server, upload the file to a similar table then replicate it to the targeted server. Lots of work that could be easily done with a programming language (as someone else posted) but that's what I have to work with for now and I have no direct access (i.e. sftp) to the database server Maybe use psql's \lo_import command to suck the data into a large object on the server, and then use loread() to insert it into the target table? (And don't forget to drop the large object after.) Pretty grotty but I think it might be the only solution with the currently available tools. btw, does bytea_import actually exist? It's not in the core server for sure. Doh. No, it's not standard, I found it here: http://dba.stackexchange.com/questions/1742/how-to-insert-file-data-into-a-postgresql-bytea-column/2962#2962
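For completeness, the non-core bytea_import from that stackexchange answer is roughly the following plpgsql (reproduced from memory, so treat it as a sketch). It uses the large-object machinery server-side, so the file still has to live on the database server, which is exactly the limitation discussed above:

```sql
CREATE OR REPLACE FUNCTION bytea_import(p_path text, p_result OUT bytea) AS $$
DECLARE
    l_oid oid;
    r record;
BEGIN
    p_result := '';
    SELECT lo_import(p_path) INTO l_oid;        -- server-side read into a large object
    FOR r IN (SELECT data FROM pg_largeobject
              WHERE loid = l_oid ORDER BY pageno) LOOP
        p_result := p_result || r.data;         -- concatenate the LO pages into one bytea
    END LOOP;
    PERFORM lo_unlink(l_oid);                   -- clean up the temporary large object
END;
$$ LANGUAGE plpgsql;
```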
Re: [GENERAL] big database resulting in small dump
On Fri, Jul 20, 2012 at 11:05 AM, Ilya Ivanov f...@ngs.ru wrote: I have an 8.4 database (installed on ubuntu 10.04 x86_64). It holds Zabbix database. The database on disk takes 10Gb. SQL dump takes only 2Gb. I've gone through http://archives.postgresql.org/pgsql-general/2008-08/msg00316.php and got some hints. Naturally, the biggest table is history (the second biggest is history_uint. Together they make about 95% of total size). I've tried to perform CLUSTER on it, but it seemed to be taking forever (3 hours and still not completed). So I cancelled it and went with database drop and restore. It resulted in database taking up 6.4Gb instead of 10Gb. This is a good improvement, but still isn't quite what I expect. I would appreciate some clarification. It's not entirely clear what behavior you expect here. Assuming that you're referring to running pg_dump, then you should just about never expect the size of the resulting dump to be equal to the amount of disk space the database server files consume on disk. For example, when I pg_dump a database that consumes about 290GB of disk, the resulting dump is about 1.3GB. This is normal expected behavior.
Re: [GENERAL] big database resulting in small dump
On Fri, Jul 20, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: On Fri, Jul 20, 2012 at 11:05 AM, Ilya Ivanov f...@ngs.ru wrote: I have an 8.4 database (installed on ubuntu 10.04 x86_64). It holds Zabbix database. The database on disk takes 10Gb. SQL dump takes only 2Gb. It's not entirely clear what behavior you expect here. Assuming that you're referring to running pg_dump, then you should just about never expect the size of the resulting dump to be equal to the amount of disk space the database server files consume on disk. For example, when I pg_dump a database that consumes about 290GB of disk, the resulting dump is about 1.3GB. This is normal expected behavior. The fine manual says someplace that databases are commonly about 5X the size of a plain-text dump, which is right in line with Ilya's results. Lonni's DB sounds a bit bloated :-(, though maybe he's got an atypically large set of indexes. I do have a lot of indices. Also, I'm using a lot of partitions, so there are a relatively large number of tables.
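To see where the on-disk space actually goes (indexes, TOAST, and internal free space never appear in a plain dump), queries along these lines help on 8.4-era servers:

```sql
SELECT pg_size_pretty(pg_database_size(current_database()));

-- per-table breakdown: heap vs. everything else (indexes + TOAST)
SELECT relname,
       pg_size_pretty(pg_relation_size(oid))                                AS heap,
       pg_size_pretty(pg_total_relation_size(oid) - pg_relation_size(oid))  AS indexes_and_toast
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```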
Re: [GENERAL] efficiency of wildcards at both ends
On Wed, Jun 20, 2012 at 10:10 AM, Sam Z J sammyjiang...@gmail.com wrote: Hi all I'm curious how wildcards at both ends are implemented, e.g. LIKE '%str%' How efficient is it if that's the only search criteria against a large table? how much does indexing the column help and roughly how much more space is needed for the index? if the answers are too long, please point me to the relevant text =D My limited understanding is that any time you need to resort to using wildcards, indices are never used, and you're falling back to using the inefficient table scan.
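That matches what EXPLAIN shows for a plain btree index: a leading wildcard defeats it, and the planner falls back to a sequential scan. A hypothetical illustration (the table and index names are invented):

```sql
CREATE TABLE docs (body text);
CREATE INDEX docs_body_idx ON docs (body);
EXPLAIN SELECT * FROM docs WHERE body LIKE '%str%';
-- expect a Seq Scan with Filter: (body ~~ '%str%'::text)
```

(For completeness: the contrib pg_trgm module's trigram GIN/GiST operator classes can index such double-wildcard patterns on 9.1 and later, at a noticeable index-size cost.)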
Re: [GENERAL] pg_basebackup blocking all queries
I'm still plagued by this. Immediately before starting a basebackup the load on my server is 1.00 or less. Within a few minutes of starting the basebackup, the load climbs steadily to 30+ and anything trying to write to the database just sits for minutes at a time, with overall performance on any query (read or write) being horrible (seconds to minutes). As soon as the basebackup completes, perf returns to normal (and the load drops back down to 1.00 or less). How can I debug what's wrong? On Tue, May 22, 2012 at 3:20 PM, Lonni J Friedman netll...@gmail.com wrote: Thanks for your reply. Unfortunately, those queries don't shed any light on the problem. The first two return 0 rows, and the third just returns 12 rows all associated with the query itself, rather than anything else. Any other suggestions? On Tue, May 22, 2012 at 2:56 PM, Scott Marlowe scott.marl...@gmail.com wrote: Do the queries here help? http://wiki.postgresql.org/wiki/Lock_Monitoring On Tue, May 22, 2012 at 12:42 PM, Lonni J Friedman netll...@gmail.com wrote: Greetings, I have a 4 server postgresql-9.1.3 cluster (one master doing streaming replication to 3 hot standby servers). All of them are running Fedora-16-x86_64. Last Friday I upgraded the entire cluster from Fedora-15 with postgresql-9.0.6 to Fedora-16 with postgresql-9.1.3. I'm finding that I cannot run pg_basebackup at all, or it blocks all SQL queries from running until pg_basebackup has completed (and the load on the box just takes off to over 75.00). By blocks I mean that any query that is submitted just hangs and does not return at all until pg_basebackup has stopped. I'm assuming that this isn't expected behavior, so I'm rather confused on what is going on. The command that I'm issuing is: pg_basebackup -v -D /mnt/backups/backups/tmp0 -x -Ft -U postgres Can someone provide some guidance on how to debug this? thanks!
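When everything stalls like this, a snapshot of pg_stat_activity taken during the basebackup is usually the quickest clue (these are the 9.1 column names; procpid/current_query became pid/query in 9.2):

```sql
SELECT procpid, waiting, query_start, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY query_start;
-- many rows with waiting = true points at locking;
-- none waiting suggests I/O saturation from the backup itself instead
```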
Re: [GENERAL] autovacuum running for a long time on a new table with 1 row
On Fri, Jun 1, 2012 at 10:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: Running 9.1.3 on Linux-x86_64. I'm seeing autovacuum running for the past 6 hours on a newly created table that only has 1 row of data in it. This table did exist previously, but was dropped and recreated. I'm not sure if that might explain this behavior. When I strace the autovacuum process, I see the following scrolling by non-stop (with no changes to the file referenced):
select(0, NULL, NULL, NULL, {0, 21000}) = 0 (Timeout)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
This seems to have been noticed and fixed in HEAD: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b4e0741727685443657b55932da0c06f028fbc00 I wonder whether that should've been back-patched. Thanks for your reply. I won't even pretend to understand what that fix does. Is this behavior something that is blatantly broken, or harmless, or somewhere in between? Should I expect autovacuum to eventually complete successfully when it stumbles into this scenario? In the meantime, though, it sure looks like you've got a lot more than one row in there. Perhaps you did umpteen zillion updates on that one row? Before dropping and recreating the table, yes it had millions of rows, and millions of updates. But since then, all I did was insert a single row, and watched autovacuum wedge itself in that seemingly infinite loop. I ended up doing a 'kill -2' on the autovacuum PID that was misbehaving, disabled autovacuuming the table, and went about what I needed to get done as an interim solution.
Re: [GENERAL] autovacuum running for a long time on a new table with 1 row
On Fri, Jun 1, 2012 at 10:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: On Fri, Jun 1, 2012 at 10:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: This seems to have been noticed and fixed in HEAD: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b4e0741727685443657b55932da0c06f028fbc00 I wonder whether that should've been back-patched. Thanks for your reply. I won't even pretend to understand what that fix does. Is this behavior something that is blatantly broken, or harmless, or somewhere in between? Should I expect autovacuum to eventually complete successfully when it stumbles into this scenario? Well, the problem with the original code was that it would recheck the visibility map's file size anytime somebody tried to check a bit beyond the end of the map. If the map isn't there (which is not an error case) this would result in a useless open() attempt for each table page scanned by vacuum. So ordinarily I would say that yes you could expect autovac to complete eventually. However ... Before dropping and recreating the table, yes it had millions of rows, and millions of updates. But since then, all I did was insert a single row, and watched autovacuum wedge itself in that seemingly infinite loop. I ended up doing a 'kill -2' on the autovacuum PID that was misbehaving, disabled autovacuuming the table, and went about what I needed to get done as an interim solution. ... if you really did drop and recreate the table, then at this point it should only have a single page, I would think. It might be worth checking the actual file size. pg_relation_size('tablename') is probably the quickest way. Unfortunately, I've since inserted several million rows into this table, so I'm guessing it's too late now to check the size?
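For reference, the size check Tom suggests is just the following (the table name is a placeholder):

```sql
SELECT pg_relation_size('mytable') AS bytes,
       pg_size_pretty(pg_relation_size('mytable'));
-- a freshly recreated one-row table should be around a single 8192-byte page
```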
[GENERAL] autovacuum running for a long time on a new table with 1 row
Running 9.1.3 on Linux-x86_64. I'm seeing autovacuum running for the past 6 hours on a newly created table that only has 1 row of data in it. This table did exist previously, but was dropped and recreated. I'm not sure if that might explain this behavior. When I strace the autovacuum process, I see the following scrolling by non-stop (with no changes to the file referenced):
select(0, NULL, NULL, NULL, {0, 21000}) = 0 (Timeout)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
select(0, NULL, NULL, NULL, {0, 21000}) = 0 (Timeout)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
open("base/16412/214803_vm", O_RDWR) = -1 ENOENT (No such file or directory)
Is this normal/expected?
thanks
Re: [GENERAL] Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
On Thu, May 24, 2012 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: On Thu, May 24, 2012 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Can you correlate the performance hit with any specific part of autovacuum? In particular, I'm wondering if it matters whether vacuum is cleaning tables or indexes --- it alternates between the two, and the access patterns are a bit different. You could probably watch what the autovac process is doing with strace to see what it's accessing. Is there something specific I should be looking for in the strace output, or is this just a matter of correlating PID and FD to pg_class.relfilenode ? Nah, just match up the files it touches with pg_class.relfilenode. Seems to be slower across the board regardless of what is being vacuumed (tables or indices). For example, I have a relatively small table (currently 395 rows, rarely has any inserts or deletes). Vacuuming just the table (not indices) takes on average 4s with the khugepaged defragmenter on, and less than 1s with it off. Vacuuming just the indices takes on average 2s with the khugepaged defragmenter on, and less than 1s with it off. The much larger tables (thousands to millions of rows), I see similar performance (although they take even longer to complete, with or without the khugepaged defragmenter enabled).
Re: [GENERAL] Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
On Wed, May 23, 2012 at 2:45 PM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: On 24/05/12 08:18, Lonni J Friedman wrote: On Wed, May 23, 2012 at 12:36 PM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: On 24/05/12 05:09, Lonni J Friedman wrote: On Wed, May 23, 2012 at 9:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: After banging my head on the wall for a long time, I happened to notice that khugepaged was consuming 100% CPU every time autovacuum was running. I did: echo madvise > /sys/kernel/mm/transparent_hugepage/defrag and immediately the entire problem went away. Fascinating. In hindsight, sure. Before that, it was 2 days of horror. So this looks like a nasty Fedora16 kernel bug to me, or maybe postgresql and Fedora16's default kernel settings are just not compatible? I agree, kernel bug. What kernel version are you using exactly? I'm using the stock 3.3.5-2.fc16.x86_64 kernel that is in Fedora updates. Is anyone else using Fedora16 and PostgreSQL-9.1? I use an F16 box daily, but can't claim to have done major performance testing with it. Can you put together a summary of your nondefault Postgres settings? I wonder whether it only kicks in for a certain size of shared memory for instance. Oh yea, I'm quite certain that this is somehow related to my setup, and not a generic problem with all F16/pgsql systems. For starters, this problem isn't happening on any of the 3 standby systems, which are all otherwise identical to the master in every respect. Also when we had done some testing (prior to the upgrades), we never ran into any of these problems. However our test environment was on smaller scale hardware, with a much smaller number of clients (and overall load).
Here are the non default settings in postgresql.conf:
wal_level = hot_standby
archive_mode = on
archive_timeout = 61
max_wal_senders = 10
wal_keep_segments = 5000
hot_standby = on
log_autovacuum_min_duration = 2500
autovacuum_max_workers = 4
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
effective_cache_size = 88GB
work_mem = 576MB
wal_buffers = 16MB
checkpoint_segments = 64
shared_buffers = 8GB
max_connections = 350
Let me know if you have any other questions. I'd be happy to provide as much information as possible if it can aid in fixing this bug. I think they will need details of things like: RAM, number/type of processors, number/type of disks, disk controllers, and any other hardware specs that might be relevant, etc. At the very least: total RAM and number of spindles. 16 core Xeon X5550 2.67GHz, 128GB RAM. $PGDATA sits on a RAID5 array comprised of 3 SATA disks. It's Linux's md software RAID. How does this compare to your other machines running the same, or similar, databases? However, you do say that the other machines are identical - but are the other machines different in any aspect that might prove significant? Also anything else running on the box. Nothing else; it's dedicated exclusively to postgresql. Plus transaction load pattern - over time and read/write ratios. I'm not sure how I'd obtain this data; however, the patterns didn't change since the upgrade. If someone can point me in the right direction, I can at least obtain this data as it's generated currently. Type/nature of queries? I need some clarification on specifically what you're asking for here. The complexity, structure, and features of the queries. Do you have lots of sub queries, and ORDER BYs? Also the number of tables accessed in a query. This is heading into the territory where others will be better placed to advise you as to what might be relevant! No, not lots of subqueries or ORDERing, and most queries only touch a single table.
However, I'm honestly not sure that I'm following where you're going with this. The problem isn't triggered by explicit queries. I can disable all external access, and simply wait for autovacuum to kick off, and the box starts to die.
Re: [GENERAL] Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
On Thu, May 24, 2012 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: No, not lots of subqueries or ORDERing, and most queries only touch a single table. However, I'm honestly not sure that I'm following where you're going with this. The problem isn't triggered by explicit queries. I can disable all external access, and simply wait for autovacuum to kick off, and the box starts to die. Can you correlate the performance hit with any specific part of autovacuum? In particular, I'm wondering if it matters whether vacuum is cleaning tables or indexes --- it alternates between the two, and the access patterns are a bit different. You could probably watch what the autovac process is doing with strace to see what it's accessing. Is there something specific I should be looking for in the strace output, or is this just a matter of correlating PID and FD to pg_class.relfilenode?
Re: [GENERAL] Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
Thanks for your reply. On Tue, May 22, 2012 at 7:19 PM, Andy Colson a...@squeakycode.net wrote: On Mon, May 21, 2012 at 2:05 PM, Lonni J Friedman netll...@gmail.com wrote: Greetings, When I got in this morning, I found an autovacuum process that had been running since just before the load spiked. Autovacuum might need to set the freeze bit the very first time it runs. I recall hearing advice about running a 'vacuum freeze' after you insert a huge amount of data. And I recall pg_upgrade doesn't write stats, so did you analyze your database? Yes, I ran a 'vacuum analyze' for all of the databases' tables immediately following completion of pg_upgrade. Or, maybe it's not vacuum... maybe some of your sql statements are planning differently and running really badly. Can you check some? Can you log slow queries? Have you checked the status of your raid? Maybe you lost a drive and it's in recovery and you have very slow IO? I checked that initially, but the array is fine. After banging my head on the wall for a long time, I happened to notice that khugepaged was consuming 100% CPU every time autovacuum was running. I did: echo madvise > /sys/kernel/mm/transparent_hugepage/defrag and immediately the entire problem went away. Load dropped within minutes from 35.00 to 1.00, and has remained under 4.00 for the past 18 hours. Prior to disabling defrag, I never saw the load below 10.00 for more than a few seconds at a time. So this looks like a nasty Fedora16 kernel bug to me, or maybe postgresql and Fedora16's default kernel settings are just not compatible? Is anyone else using Fedora16 and PostgreSQL-9.1?
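For anyone wanting to reproduce the check above, a small sketch of inspecting and changing the transparent hugepage defrag policy. The sysfs path is the one from the thread; writing it requires root, so the script guards for that rather than assuming privileges:

```shell
# Show the active THP defrag policy (the current choice appears in brackets),
# then restrict defragmentation to madvise'd regions, as described above.
THP=/sys/kernel/mm/transparent_hugepage/defrag
if [ -w "$THP" ]; then
    cat "$THP"
    echo madvise > "$THP"
    cat "$THP"
else
    echo "cannot write $THP here (needs root and a THP-enabled kernel)"
fi
```

Note that this setting does not survive a reboot; on the Fedora releases discussed here it would need to be reapplied from an init/boot script to be permanent.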
Re: [GENERAL] Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
On Wed, May 23, 2012 at 9:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: After banging my head on the wall for a long time, I happened to notice that khugepaged was consuming 100% CPU every time autovacuum was running. I did: echo madvise > /sys/kernel/mm/transparent_hugepage/defrag and immediately the entire problem went away. Fascinating. In hindsight, sure. Before that, it was 2 days of horror. So this looks like a nasty Fedora16 kernel bug to me, or maybe postgresql and Fedora16's default kernel settings are just not compatible? I agree, kernel bug. What kernel version are you using exactly? I'm using the stock 3.3.5-2.fc16.x86_64 kernel that is in Fedora updates. Is anyone else using Fedora16 and PostgreSQL-9.1? I use an F16 box daily, but can't claim to have done major performance testing with it. Can you put together a summary of your nondefault Postgres settings? I wonder whether it only kicks in for a certain size of shared memory, for instance. Oh yea, I'm quite certain that this is somehow related to my setup, and not a generic problem with all F16/pgsql systems. For starters, this problem isn't happening on any of the 3 standby systems, which are all otherwise identical to the master in every respect. Also, when we had done some testing (prior to the upgrades), we never ran into any of these problems. However, our test environment was on smaller scale hardware, with a much smaller number of clients (and overall load). Here are the non default settings in postgresql.conf:
wal_level = hot_standby
archive_mode = on
archive_timeout = 61
max_wal_senders = 10
wal_keep_segments = 5000
hot_standby = on
log_autovacuum_min_duration = 2500
autovacuum_max_workers = 4
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
effective_cache_size = 88GB
work_mem = 576MB
wal_buffers = 16MB
checkpoint_segments = 64
shared_buffers = 8GB
max_connections = 350
Let me know if you have any other questions.
I'd be happy to provide as much information as possible if it can aid in fixing this bug.
Re: [GENERAL] Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
On Wed, May 23, 2012 at 12:36 PM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: On 24/05/12 05:09, Lonni J Friedman wrote: On Wed, May 23, 2012 at 9:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Lonni J Friedman netll...@gmail.com writes: After banging my head on the wall for a long time, I happened to notice that khugepaged was consuming 100% CPU every time autovacuum was running. I did: echo madvise > /sys/kernel/mm/transparent_hugepage/defrag and immediately the entire problem went away. Fascinating. In hindsight, sure. Before that, it was 2 days of horror. So this looks like a nasty Fedora16 kernel bug to me, or maybe postgresql and Fedora16's default kernel settings are just not compatible? I agree, kernel bug. What kernel version are you using exactly? I'm using the stock 3.3.5-2.fc16.x86_64 kernel that is in Fedora updates. Is anyone else using Fedora16 and PostgreSQL-9.1? I use an F16 box daily, but can't claim to have done major performance testing with it. Can you put together a summary of your nondefault Postgres settings? I wonder whether it only kicks in for a certain size of shared memory, for instance. Oh yea, I'm quite certain that this is somehow related to my setup, and not a generic problem with all F16/pgsql systems. For starters, this problem isn't happening on any of the 3 standby systems, which are all otherwise identical to the master in every respect. Also, when we had done some testing (prior to the upgrades), we never ran into any of these problems. However, our test environment was on smaller scale hardware, with a much smaller number of clients (and overall load).
Here are the non default settings in postgresql.conf:
wal_level = hot_standby
archive_mode = on
archive_timeout = 61
max_wal_senders = 10
wal_keep_segments = 5000
hot_standby = on
log_autovacuum_min_duration = 2500
autovacuum_max_workers = 4
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
effective_cache_size = 88GB
work_mem = 576MB
wal_buffers = 16MB
checkpoint_segments = 64
shared_buffers = 8GB
max_connections = 350
Let me know if you have any other questions. I'd be happy to provide as much information as possible if it can aid in fixing this bug. I think they will need details of things like: RAM, number/type of processors, number/type of disks, disk controllers, and any other hardware specs that might be relevant, etc. At the very least: total RAM and number of spindles. 16 core Xeon X5550 2.67GHz, 128GB RAM. $PGDATA sits on a RAID5 array comprised of 3 SATA disks. It's Linux's md software RAID. Also anything else running on the box. Nothing else; it's dedicated exclusively to postgresql. Plus transaction load pattern - over time and read/write ratios. I'm not sure how I'd obtain this data; however, the patterns didn't change since the upgrade. If someone can point me in the right direction, I can at least obtain this data as it's generated currently. Type/nature of queries? I need some clarification on specifically what you're asking for here. Size of heavily accessed tables and their indexes? There are several rather large tables (90 million+ rows), but most others are under 1M rows. However, most tables are accessed/written to with equal frequency.
Re: [GENERAL] Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
On Wed, May 23, 2012 at 3:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Gavin Flower gavinflo...@archidevsys.co.nz writes: 16 core Xeon X5550 2.67GHz, 128GB RAM. $PGDATA sits on a RAID5 array comprised of 3 SATA disks. It's Linux's md software RAID. How does this compare to your other machines running the same, or similar, databases? However, you do say that the other machines are identical - but are the other machines different in any aspect that might prove significant? I think Lonni said that the other machines are just running standby clusters, which would mean they aren't running autovacuum as such, merely applying any WAL it produces. So that could be plenty enough to explain a difference in kernel-visible behavior. Yes, that is correct.
[GENERAL] Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
No one has any ideas or suggestions, or even questions? If someone needs more information, I'd be happy to provide it. This problem is absolutely killing me. On Mon, May 21, 2012 at 2:05 PM, Lonni J Friedman netll...@gmail.com wrote: Greetings, I have a 4 server postgresql-9.1.3 cluster (one master doing streaming replication to 3 hot standby servers). All of them are running Fedora-16-x86_64. Last Friday I upgraded the entire cluster from Fedora-15 with postgresql-9.0.6 to Fedora-16 with postgresql-9.1.3. I made no changes to postgresql.conf following the upgrade. I used pg_upgrade on the master to upgrade it, followed by blowing away $PGDATA on all the standbys and rsyncing them fresh from the master. All of the servers have 128GB RAM, and at least 16 CPU cores. Everything appeared to be working fine until last night when the load on the master suddenly took off, and hovered at around 30.00 ever since. Prior to the load spike, the load was hovering around 2.00 (which is actually lower than it was averaging prior to the upgrade when it was often around 4.00). When I got in this morning, I found an autovacuum process that had been running since just before the load spiked, and the pg_dump cronjob that started shortly after the load spike (and normally completes in about 20 minutes for all the databases) was still running, and hadn't finished the first of the 6 databases. I ended up killing the pg_dump process altogether in the hope that it might unblock whatever was causing the high load. Unfortunately that didn't help, and the load continued to run high. I proceeded to check dmesg, /var/log/messages and the postgresql server log (all on the master), but I didn't spot anything out of the ordinary, definitely nothing that pointed to a potential explanation for all of the high load. I inspected what the autovacuum process was doing, and determined that it was chewing away on the largest table (nearly 98 million rows) in the largest database. 
It was making very slow progress; at least I believe that was the case, as when I attached strace to the process, the seek addresses were changing in a random fashion. Here are the current autovacuum settings:
autovacuum                      | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold    | 50
autovacuum_freeze_max_age       | 2
autovacuum_max_workers          | 4
autovacuum_naptime              | 1min
autovacuum_vacuum_cost_delay    | 20ms
autovacuum_vacuum_cost_limit    | -1
autovacuum_vacuum_scale_factor  | 0.2
autovacuum_vacuum_threshold     | 50
Did something significant change in 9.1 that would impact autovacuum behavior? I'm at a complete loss on how to debug this, since I'm using the exact same settings now as prior to the upgrade. thanks
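As a side note on the settings above: autovacuum decides to vacuum a table roughly when its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (the formula from the PostgreSQL documentation). A quick sketch with the poster's values shows why the ~98 million row table accumulates a lot of dead rows before autovacuum touches it:

```python
# Approximate dead-tuple count needed before autovacuum vacuums a table,
# using the docs' formula and the settings quoted in the thread above.
def autovacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
    return threshold + scale_factor * reltuples

# The ~98 million row table from the thread needs nearly 20M dead tuples:
print(int(autovacuum_trigger(98_000_000)))  # -> 19600050
```

So a single autovacuum pass over that table legitimately has an enormous amount of work to do, which is why the khugepaged interaction was so visible during it.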
[GENERAL] pg_basebackup blocking all queries
Greetings, I have a 4 server postgresql-9.1.3 cluster (one master doing streaming replication to 3 hot standby servers). All of them are running Fedora-16-x86_64. Last Friday I upgraded the entire cluster from Fedora-15 with postgresql-9.0.6 to Fedora-16 with postgresql-9.1.3. I'm finding that I cannot run pg_basebackup at all without it blocking all SQL queries until pg_basebackup has completed (and the load on the box just takes off to over 75.00). By blocks I mean that any query that is submitted just hangs, and does not return at all until pg_basebackup has stopped. I'm assuming that this isn't expected behavior, so I'm rather confused about what is going on. The command that I'm issuing is: pg_basebackup -v -D /mnt/backups/backups/tmp0 -x -Ft -U postgres Can someone provide some guidance on how to debug this? thanks!
Re: [GENERAL] pg_basebackup blocking all queries
Thanks for your reply. Unfortunately, those queries don't shed any light on the problem. The first two return 0 rows, and the third just returns 12 rows, all associated with the query itself rather than anything else. Any other suggestions? On Tue, May 22, 2012 at 2:56 PM, Scott Marlowe scott.marl...@gmail.com wrote: Do the queries here help? http://wiki.postgresql.org/wiki/Lock_Monitoring On Tue, May 22, 2012 at 12:42 PM, Lonni J Friedman netll...@gmail.com wrote: Greetings, I have a 4 server postgresql-9.1.3 cluster (one master doing streaming replication to 3 hot standby servers). All of them are running Fedora-16-x86_64. Last Friday I upgraded the entire cluster from Fedora-15 with postgresql-9.0.6 to Fedora-16 with postgresql-9.1.3. I'm finding that I cannot run pg_basebackup at all without it blocking all SQL queries until pg_basebackup has completed (and the load on the box just takes off to over 75.00). By blocks I mean that any query that is submitted just hangs, and does not return at all until pg_basebackup has stopped. I'm assuming that this isn't expected behavior, so I'm rather confused about what is going on. The command that I'm issuing is: pg_basebackup -v -D /mnt/backups/backups/tmp0 -x -Ft -U postgres Can someone provide some guidance on how to debug this? thanks!
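The wiki page's query essentially joins pg_locks against itself to pair ungranted lock requests with the sessions holding conflicting locks. A simplified 9.1-era version (pg_stat_activity still had procpid and current_query back then) looks roughly like this:

```sql
-- Who is waiting on whom: ungranted locks matched against granted ones
-- on the same object. Simplified from the wiki's lock-monitoring query.
SELECT waiting.pid       AS waiting_pid,
       holder.pid        AS holding_pid,
       act.current_query AS waiting_query
FROM pg_locks waiting
JOIN pg_locks holder
  ON  holder.locktype = waiting.locktype
  AND holder.database IS NOT DISTINCT FROM waiting.database
  AND holder.relation IS NOT DISTINCT FROM waiting.relation
  AND holder.pid <> waiting.pid
  AND holder.granted
JOIN pg_stat_activity act ON act.procpid = waiting.pid
WHERE NOT waiting.granted;
```

Its returning no rows here is consistent with the hangs not being heavyweight-lock waits at all, which is what makes the behavior so puzzling.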
[GENERAL] significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
Greetings, I have a 4 server postgresql-9.1.3 cluster (one master doing streaming replication to 3 hot standby servers). All of them are running Fedora-16-x86_64. Last Friday I upgraded the entire cluster from Fedora-15 with postgresql-9.0.6 to Fedora-16 with postgresql-9.1.3. I made no changes to postgresql.conf following the upgrade. I used pg_upgrade on the master to upgrade it, followed by blowing away $PGDATA on all the standbys and rsyncing them fresh from the master. All of the servers have 128GB RAM, and at least 16 CPU cores. Everything appeared to be working fine until last night when the load on the master suddenly took off, and hovered at around 30.00 ever since. Prior to the load spike, the load was hovering around 2.00 (which is actually lower than it was averaging prior to the upgrade when it was often around 4.00). When I got in this morning, I found an autovacuum process that had been running since just before the load spiked, and the pg_dump cronjob that started shortly after the load spike (and normally completes in about 20 minutes for all the databases) was still running, and hadn't finished the first of the 6 databases. I ended up killing the pg_dump process altogether in the hope that it might unblock whatever was causing the high load. Unfortunately that didn't help, and the load continued to run high. I proceeded to check dmesg, /var/log/messages and the postgresql server log (all on the master), but I didn't spot anything out of the ordinary, definitely nothing that pointed to a potential explanation for all of the high load. I inspected what the autovacuum process was doing, and determined that it was chewing away on the largest table (nearly 98 million rows) in the largest database. It was making very slow progress, at least I believe that was the case, as when I attached strace to the process, the seek addresses were changing in a random fashion. 
Here are the current autovacuum settings:
autovacuum                      | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold    | 50
autovacuum_freeze_max_age       | 2
autovacuum_max_workers          | 4
autovacuum_naptime              | 1min
autovacuum_vacuum_cost_delay    | 20ms
autovacuum_vacuum_cost_limit    | -1
autovacuum_vacuum_scale_factor  | 0.2
autovacuum_vacuum_threshold     | 50
Did something significant change in 9.1 that would impact autovacuum behavior? I'm at a complete loss on how to debug this, since I'm using the exact same settings now as prior to the upgrade. thanks
[GENERAL] problems after restoring from a pg_basebackup
Greetings, I'm running postgresql-9.1.3 on a Linux-x86_64 (Fedora16, if it matters) system. I noticed the existence of pg_basebackup starting in 9.1, and figured I'd try it out and see if it would simplify our backup management processes.
$ pg_basebackup -P -v -D /tmp/backup -x -Ft -z -U postgres
xlog start point: C6/6420
135733616/135733616 kB (100%), 1/1 tablespace
xlog end point: C6/64A0
pg_basebackup: base backup completed
So after running through this, I tried to use (restore) the backup that was generated. While everything appears to be working ok from a functional perspective, in the server log I saw the following:
##
LOG: creating missing WAL directory pg_xlog/archive_status
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at C6/6678
LOG: could not open file pg_xlog/000100C60067 (log file 198, segment 103): No such file or directory
LOG: redo done at C6/66A0
FATAL: the database system is starting up
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
#
Just to be clear, here's what I did after pg_basebackup had completed successfully:
0) shutdown postgresql gracefully, and verified that it was fully shutdown
1) moved $PGDATA to $PGDATA.old
2) created $PGDATA as postgres user
3) extracted the basebackup tarball as postgres user:
   cd $PGDATA
   tar xzvpf /tmp/backup/base.tar.gz
4) started postgresql up
I would have expected that I wouldn't have gotten the 'not properly shutdown' warning, or the 'could not open file' warning, by following this process. Am I doing something wrong? thanks
Re: [GENERAL] pg_basebackup issues
On Fri, Apr 20, 2012 at 12:31 PM, Magnus Hagander mag...@hagander.net wrote: On Fri, Apr 20, 2012 at 19:51, Lonni J Friedman netll...@gmail.com wrote: Anyway, lesson learned, I need to either invoke pg_basebackup as the same user that runs the database (or is specified with the -U parameter?), or write the backup somewhere outside of the directory structure that is being backed up. I eventually also found the following entries in the postgresql server log: FATAL: could not open directory ./backups: Permission denied FATAL: archive member backups/base.tar.gz too large for tar format What concerns me is the 2nd fatal error. The tarball that pg_basebackup created before erroring out is about 12GB: 12393094165 base.tar.gz Are you actually storing your backup files *inside* the data directory? You really shouldn't do that; you're creating a cyclic dependency where each new backup will include the old one inside it... You should store the resulting backup file somewhere outside the data directory. yea, in hindsight that was silly. i just saw a convenient 'backups' subdirectory and figured that'd be a nice logical place. I wasn't aware of any 12GB file size limit for tar, so this is a bit of a mystery to me. Regardless, I'd be happy to try some other archiving strategy, but the man page for pg_basebackup suggests that there are only two formats: tar, and basically just copying the filesystem. If I copied the filesystem, I'd still have to find some way to archive the files for easy management (copying elsewhere, etc). Has anyone come up with a good strategy on how to deal with it? The max file size of a single file inside a standard tar file is 8GB, see e.g. http://en.wikipedia.org/wiki/Tar_(file_format). I think there are extensions that let you store bigger files, but since PostgreSQL will never create files that big it's not implemented in the basebackup system.
Because again, the root of your problem seems to be that you are trying to store the resulting backup inside the data directory. You're right, that was indeed the issue. I've redone the process using a location external to $PGDATA, and it completed successfully:
$ pg_basebackup -P -v -D /tmp/backup -x -Ft -z -U postgres
xlog start point: C6/6420
135733616/135733616 kB (100%), 1/1 tablespace
xlog end point: C6/64A0
pg_basebackup: base backup completed
So after running through this, I tried to use (restore) the backup that was generated. While everything appears to be working ok from a functional perspective, in the server log I saw the following:
##
LOG: creating missing WAL directory pg_xlog/archive_status
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at C6/6678
LOG: could not open file pg_xlog/000100C60067 (log file 198, segment 103): No such file or directory
LOG: redo done at C6/66A0
FATAL: the database system is starting up
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
#
Just to be clear, here's what I did after pg_basebackup had completed successfully:
0) shutdown postgresql gracefully, and verified that it was fully shutdown
1) moved $PGDATA to $PGDATA.old
2) created $PGDATA as postgres user
3) extracted the basebackup tarball as postgres user:
   cd $PGDATA
   tar xzvpf /tmp/backup/base.tar.gz
4) started postgresql up
I would have expected that I wouldn't have gotten the 'not properly shutdown' warning, or the 'could not open file' warning, by following this process. Am I doing something wrong? thanks
[GENERAL] pg_basebackup issues
Greetings, I'm running postgresql-9.1.3 on a Linux-x86_64 (Fedora16, if it matters) system. I noticed the existence of pg_basebackup starting in 9.1, and figured I'd try it out and see if it would simplify our backup management processes. I setup a test system (same OS and postgresql version as production) with a fairly recent snapshot of our production database, invoked it, and saw the following output:
##
# pg_basebackup -P -v -D backups -Ft -z -U postgres
135717206/135717230 kB (100%), 1/1 tablespace
pg_basebackup: could not get WAL end position from server
##
I wasn't sure what that error meant, so after googling a bit, it turns out that it really means that there were one or more files not owned by the postgres user (see http://serverfault.com/questions/312205/pg-basebackup-could-not-get-wal-end-position-from-server ). Sure enough, the file that wasn't owned by the postgres user was the backup tarball that pg_basebackup was creating, since I had been running it as root. That error is rather cryptic, and it would be helpful if it was improved to suggest the real cause of the failure. Anyway, lesson learned, I need to either invoke pg_basebackup as the same user that runs the database (or is specified with the -U parameter?), or write the backup somewhere outside of the directory structure that is being backed up. I eventually also found the following entries in the postgresql server log:
FATAL: could not open directory ./backups: Permission denied
FATAL: archive member backups/base.tar.gz too large for tar format
What concerns me is the 2nd fatal error. The tarball that pg_basebackup created before erroring out is about 12GB: 12393094165 base.tar.gz I wasn't aware of any 12GB file size limit for tar, so this is a bit of a mystery to me. Regardless, I'd be happy to try some other archiving strategy, but the man page for pg_basebackup suggests that there are only two formats: tar, and basically just copying the filesystem.
If I copied the filesystem, I'd still have to find some way to archive the files for easy management (copying elsewhere, etc). Has anyone come up with a good strategy on how to deal with it? thanks
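Given Magnus's explanation that the 8GB limit is a property of the ustar archive member format, one workaround sketch (my suggestion, not something proposed in the thread) is to take a plain-format base backup and build the archive yourself with GNU tar, whose default format permits large members. The script is guarded so it only runs the backup when explicitly asked, since it obviously requires a live server:

```shell
# Sketch: avoid the ustar per-member size limit by taking a plain-format
# base backup and archiving it manually. Paths and -U follow the thread.
RUN_BACKUP=${RUN_BACKUP:-no}   # set RUN_BACKUP=yes on the actual server
if [ "$RUN_BACKUP" = "yes" ]; then
    pg_basebackup -P -v -D /mnt/backups/base -Fp -x -U postgres
    tar --format=gnu -czf /mnt/backups/base.tar.gz -C /mnt/backups base
else
    echo "dry run: set RUN_BACKUP=yes to take the backup"
fi
```

The key point is that the backup destination lives outside $PGDATA, so the cyclic-dependency problem from the thread cannot recur.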
Re: [GENERAL] pg_upgrade + streaming replication ?
On Tue, Mar 20, 2012 at 11:46 AM, Bruce Momjian br...@momjian.us wrote: On Mon, Mar 19, 2012 at 03:07:02PM -0700, Jeff Davis wrote: On Mon, 2012-03-19 at 15:30 -0400, Bruce Momjian wrote: On Thu, Mar 01, 2012 at 02:01:31PM -0800, Lonni J Friedman wrote: I've got a 3 node cluster (1 master/2 slaves) running 9.0.x with streaming replication. I'm in the planning stages of upgrading to 9.1.x, and am looking into the most efficient way to do the upgrade with the goal of minimizing downtime risk. After googling, the only discussion that I've found of using pg_upgrade with a streaming replication setup seems to be this (nearly) year old thread: http://web.archiveorange.com/archive/v/9FNVlDWGQtpyWVL54jlK In summary, there is no way to use both pg_upgrade and streaming replication simultaneously. I'd have to either use pg_upgrade and then effectively rebuild/redeploy the slaves, or not use pg_upgrade, and reimport all of the data. Is that still the latest status, or are there other options? You can shut down all three servers, run pg_upgrade on all of them, then restart them as 9.1 servers. After running pg_upgrade on each server individually, they will have different system IDs, and potentially different on-disk representation of the catalogs, right? So how can you resume streaming without rebuilding the slaves? Oh, wow, I never thought of the fact that the system tables will be different? I guess you could assume the pg_dump restore is going to create things exactly the same on all the systems, but I never tested that. Do the system id's have to match? That would be a problem because you are initdb'ing on each server. OK, crazy idea, but I wonder if you could initdb on the master, then copy that to the slaves, then run pg_upgrade on each of them. Obviously this needs some testing. Wouldn't it be easier to just pg_upgrade the master, then setup the slaves from scratch (with rsync, etc)? 
It certainly wouldn't be any more work to do it that way (although still a lot more work than simply running pg_upgrade on all servers).
Re: [GENERAL] pg_upgrade + streaming replication ?
On Mon, Mar 19, 2012 at 12:30 PM, Bruce Momjian br...@momjian.us wrote: On Thu, Mar 01, 2012 at 02:01:31PM -0800, Lonni J Friedman wrote: I've got a 3 node cluster (1 master/2 slaves) running 9.0.x with streaming replication. I'm in the planning stages of upgrading to 9.1.x, and am looking into the most efficient way to do the upgrade with the goal of minimizing downtime risk. After googling, the only discussion that I've found of using pg_upgrade with a streaming replication setup seems to be this (nearly) year old thread: http://web.archiveorange.com/archive/v/9FNVlDWGQtpyWVL54jlK In summary, there is no way to use both pg_upgrade and streaming replication simultaneously. I'd have to either use pg_upgrade and then effectively rebuild/redeploy the slaves, or not use pg_upgrade, and reimport all of the data. Is that still the latest status, or are there other options? You can shut down all three servers, run pg_upgrade on all of them, then restart them as 9.1 servers. Thanks for your reply. This is very good news.
Re: [GENERAL] how to measure wal_buffer usage
On Fri, Mar 16, 2012 at 2:45 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: Lonni J Friedman wrote: After reading this interesting article on shared_buffers and wal_buffers: http://rhaas.blogspot.com/2012/03/tuning-sharedbuffers-and-walbuffers.html it got me wondering if my settings were ideal. Is there some way to measure wal_buffer usage in real time, so that I could simply monitor it for some period of time, and then come up with a way of determining if the current setting is sufficient? I tried googling, but every reference that I've found simply defaults to the trial and error approach to performance tuning. You can use the contrib module pg_buffercache to inspect the shared buffers. If almost all your shared buffers have a high use count (4 or 5), shared_buffers may be too small. If not, consider reducing shared_buffers. pg_buffercache only reports on the buffer_cache; it does not report any data on the wal_cache. It's probably better to start with a moderate value and tune upwards. You can also look at pg_statio_all_tables and pg_statio_all_indexes and calculate the buffer hit ratio. If that is low, that's also an indication that shared_buffers is too small. Isn't this also specific to the buffer_cache rather than the wal_cache? You should distinguish between tables and indexes: it is usually more important that indexes are cached. Try to observe these things over time, for example by taking snapshots every n minutes and storing the results in a table. Yours, Laurenz Albe -- ~ L. Friedman netll...@gmail.com LlamaLand https://netllama.linux-sxs.org
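To make Laurenz's pg_buffercache suggestion concrete (and, as Lonni notes, this covers shared_buffers only, not wal_buffers), the usage-count distribution can be pulled with something like:

```sql
-- Distribution of buffer usage counts across shared_buffers; a large
-- fraction at usagecount 4-5 suggests shared_buffers may be too small.
-- Requires the pg_buffercache contrib module to be installed.
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;
```

Taking snapshots of this output every few minutes, as Laurenz suggests, turns a single noisy reading into a usable trend.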
[GENERAL] how to measure wal_buffer usage
After reading this interesting article on shared_buffers and wal_buffers: http://rhaas.blogspot.com/2012/03/tuning-sharedbuffers-and-walbuffers.html it got me wondering if my settings were ideal. Is there some way to measure wal_buffer usage in real time, so that I could simply monitor it for some period of time, and then come up with a way of determining if the current setting is sufficient? I tried googling, but every reference that I've found simply defaults to the trial-and-error approach to performance tuning.
[GENERAL] pg_upgrade + streaming replication ?
I've got a 3 node cluster (1 master/2 slaves) running 9.0.x with streaming replication. I'm in the planning stages of upgrading to 9.1.x, and am looking into the most efficient way to do the upgrade with the goal of minimizing downtime and risk. After googling, the only discussion that I've found of using pg_upgrade with a streaming replication setup seems to be this (nearly) year-old thread: http://web.archiveorange.com/archive/v/9FNVlDWGQtpyWVL54jlK

In summary, there is no way to use both pg_upgrade and streaming replication simultaneously. I'd have to either use pg_upgrade and then effectively rebuild/redeploy the slaves, or not use pg_upgrade and reimport all of the data. Is that still the latest status, or are there other options?

thanks
[GENERAL] returning rows from an implicit JOIN where results either exist in both tables OR only one table
Greetings, I've got a PostgreSQL-9.0.x database that manages an automated testing environment. There are a bunch of tables that contain assorted static data (OS versions, test names, etc.) named 'buildlist' and 'osversmap'. However, there are also two tables whose data changes often.

The first is a 'pending' table, which is effectively a test queue: pending tests are self-selected by the test systems, and then deleted when the test run has completed. The second is a 'results' table, which contains the test results as they are produced (in progress and completed). The records in the pending table have a one-to-many relationship with the records in the results table (each row in pending can have 0 or more rows in results). For example, if no test systems have self-assigned a pending row, then there will be zero associated rows in results; once a pending row is assigned, the number of rows in results will increase for that pending row. An added catch is that I always want only the newest results-table row associated with each pending-table row.

What I need to do is query the 'pending' table for pending tests, and then also get a 'logurl' from the results table that corresponds to each pending-table row. All of this is rather similar to this problem, except that I have the added burden of the two additional tables with the static data (buildlist and osversmap): http://stackoverflow.com/questions/3343857/php-sql-using-only-one-query-select-rows-from-two-tables-if-data-is-in-both-ta

I'm stumbling over how to integrate those two tables with static data into the query.
The following query works fine as long as there's at least one row in the 'results' table that corresponds to each row in the pending table (however, it doesn't return anything for rows that exist in 'pending' but not yet in 'results'):

SELECT pending.cl, pending.id, pending.buildid, pending.build_type,
       pending.active, pending.submittracker, pending.os, pending.arch,
       pending.osversion, pending.branch, pending.comment,
       osversmap.osname, buildlist.buildname, results.logurl
FROM pending, osversmap, buildlist, results
WHERE pending.buildid = buildlist.id
  AND pending.os = osversmap.os
  AND pending.osversion = osversmap.osversion
  AND pending.owner = '$owner'
  AND pending.completed = 'f'
  AND results.hostname = pending.active
  AND results.submittracker = pending.submittracker
  AND pending.cl = results.cl
  AND results.current_status != 'PASSED'
  AND results.current_status NOT LIKE '%FAILED'
ORDER BY pending.submittracker, pending.branch, pending.os, pending.arch

thanks in advance!
Re: [GENERAL] returning rows from an implicit JOIN where results either exist in both tables OR only one table
On Thu, Dec 1, 2011 at 1:57 PM, David Johnston pol...@yahoo.com wrote:

 -----Original Message-----
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Lonni J Friedman
 Sent: Thursday, December 01, 2011 4:13 PM
 To: pgsql-general
 Subject: [GENERAL] returning rows from an implicit JOIN where results either exist in both tables OR only one table

 I'm stumbling over how to integrate those two tables with static data into the query. The following query works fine as long as there's at least one row in the 'results' table that corresponds to each row in the pending table (however, it doesn't return anything for rows that exist in 'pending' but not yet in 'results'):

 Implicit JOINs are ALWAYS INNER JOINs. Since you want to use an OUTER JOIN, you must be explicit. I'm not going to try to figure out specifically what you need, but from your quick description (all pending and results where available) you need to do something like:

 pending LEFT OUTER JOIN results ON (pending.active = results.hostname AND pending.submittracker = results.submittracker AND pending.cl = results.cl)

 Then, for conditions dependent upon the results (NULL-able) relation, you need to make sure you explicitly allow for the missing rows:

 ( results.current_status IS NULL OR ( your existing results conditions ) )

 http://www.postgresql.org/docs/9.0/interactive/queries-table-expressions.html#QUERIES-FROM

Thanks for your reply and input.
I ended up putting together the following query, which does what I need:

SELECT pending.cl, pending.id, pending.buildid, pending.build_type,
       pending.active, pending.submittracker, pending.os, pending.arch,
       pending.osversion, pending.branch, pending.comment,
       osversmap.osname, buildlist.buildname, results.logurl
FROM pending
JOIN osversmap ON ( pending.os = osversmap.os
                AND pending.osversion = osversmap.osversion )
JOIN buildlist ON ( pending.buildid = buildlist.id )
LEFT OUTER JOIN results ON ( pending.active = results.hostname
                         AND pending.submittracker = results.submittracker
                         AND pending.cl = results.cl
                         AND results.current_status != 'PASSED'
                         AND results.current_status NOT LIKE '%FAILED' )
WHERE pending.owner = '$owner'
  AND pending.completed = 'f'
ORDER BY pending.submittracker, pending.branch, pending.os, pending.arch
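One loose end from the original post: the requirement to keep only the newest results row per pending row isn't addressed in this thread. A sketch using PostgreSQL's DISTINCT ON, assuming results has a timestamp column (hypothetically named 'tstamp' here, since the real column name isn't shown in the thread):

```sql
-- For each (hostname, submittracker, cl) key, keep only the newest results row.
-- 'tstamp' is a hypothetical timestamp column on results.
SELECT DISTINCT ON (hostname, submittracker, cl)
       hostname, submittracker, cl, current_status, logurl
FROM results
ORDER BY hostname, submittracker, cl, tstamp DESC;
```

Wrapped in a subquery, this can replace the bare results table on the right side of the LEFT OUTER JOIN, so each pending row picks up at most one (the newest) results row.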
[GENERAL] autovacuum stuck on a table for 18+ hours, consuming lots of CPU time
Greetings, I'm running PostgreSQL-9.0.4 on a Linux-x86_64 cluster with 1 master and two streaming replication slaves. Since late yesterday, the load on the server has been noticeably higher (5.00+) than normal (generally under 1.00). I investigated, and found that for the past ~18 hours there's one autovacuum process that has been running without making any obvious progress:

select procpid,query_start,current_query from pg_stat_activity where current_query LIKE 'autovacuum%' ;

 procpid |          query_start          |              current_query
---------+-------------------------------+------------------------------------------
   30188 | 2011-11-21 22:42:26.426315-08 | autovacuum: VACUUM public.nppsmoketests
(1 row)

select c.oid,c.relname,l.pid,l.mode,l.granted from pg_class c join pg_locks l on c.oid=l.relation where l.pid='30188' order by l.pid;

  oid  |              relname               |  pid  |           mode           | granted
-------+------------------------------------+-------+--------------------------+---------
 72112 | nppsmoketests                      | 30188 | ShareUpdateExclusiveLock | t
 72617 | nppsmoketests_pkey                 | 30188 | RowExclusiveLock         | t
 72619 | nppsmoketests_bug_idx              | 30188 | RowExclusiveLock         | t
 72620 | nppsmoketests_active_idx           | 30188 | RowExclusiveLock         | t
 72621 | nppsmoketests_arch_idx             | 30188 | RowExclusiveLock         | t
 72622 | nppsmoketests_branch_idx           | 30188 | RowExclusiveLock         | t
 72623 | nppsmoketests_current_status_idx   | 30188 | RowExclusiveLock         | t
 72624 | nppsmoketests_build_type_idx       | 30188 | RowExclusiveLock         | t
 72625 | nppsmoketests_gpu_idx              | 30188 | RowExclusiveLock         | t
 72626 | nppsmoketests_os_idx               | 30188 | RowExclusiveLock         | t
 72627 | nppsmoketests_owner_idx            | 30188 | RowExclusiveLock         | t
 72628 | nppsmoketests_regressioncl_idx     | 30188 | RowExclusiveLock         | t
 72629 | nppsmoketests_subtest_idx          | 30188 | RowExclusiveLock         | t
 72630 | nppsmoketests_suiteid_idx          | 30188 | RowExclusiveLock         | t
 72631 | nppsmoketests_suiteid_testname_idx | 30188 | RowExclusiveLock         | t
 72632 | nppsmoketests_testcl_idx           | 30188 | RowExclusiveLock         | t
 72633 | nppsmoketests_testname_idx         | 30188 | RowExclusiveLock         | t
 80749 | nppsmoketests_osversion_idx        | 30188 | RowExclusiveLock         | t
(18 rows)

When I strace PID
30188, I see tons of this scrolling past quickly, but I'm not really sure what it means beyond a 'Timeout' not looking good:

select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
lseek(95, 753901568, SEEK_SET) = 753901568
read(95, "\202\1\0\0\260\315\250\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
lseek(95, 753917952, SEEK_SET) = 753917952
read(95, "\202\1\0\0 N\253\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
lseek(95, 768606208, SEEK_SET) = 768606208
read(95, "\204\1\0\0h!~\233\1\0\0\0\230\0\360\20\360\37\4 \0\0\0\0x\237\360\0\0\237\360\0"..., 8192) = 8192
lseek(95, 753934336, SEEK_SET) = 753934336
read(95, "\202\1\0\0 \275\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
read(95, "\202\1\0\0\10\33\276\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
lseek(95, 753958912, SEEK_SET) = 753958912
read(95, "\202\1\0\0x\317\307\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
lseek(95, 768614400, SEEK_SET) = 768614400

An old thread suggests that this is a stuck spinlock: http://archives.postgresql.org/pgsql-performance/2009-05/msg00455.php

I'm using the defaults for all the *vacuum* options in postgresql.conf, except for:

log_autovacuum_min_duration = 2500

At this point, I'm not sure what I can safely do to either terminate this autovacuum process or kick it into making progress again?

thanks
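One quick sanity check in this situation (a general technique, not something prescribed in the thread) is to dump all of the cost-based vacuum delay and autovacuum settings in one go, rather than SHOWing them individually, since a large cost delay makes vacuum deliberately slow:

```sql
-- Show all cost-based vacuum delay/limit settings plus autovacuum knobs.
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE '%vacuum_cost%'
   OR name LIKE 'autovacuum%'
ORDER BY name;
```

Note that autovacuum_vacuum_cost_delay defaults to a nonzero value even when vacuum_cost_delay itself is 0.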
Re: [GENERAL] autovacuum stuck on a table for 18+ hours, consuming lots of CPU time
On Tue, Nov 22, 2011 at 6:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Lonni J Friedman netll...@gmail.com writes:
 When I strace PID 30188, I see tons of this scrolling past quickly, but I'm not really sure what it means beyond a 'Timeout' not looking good:

 select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
 lseek(95, 753901568, SEEK_SET) = 753901568
 read(95, "\202\1\0\0\260\315\250\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
 lseek(95, 753917952, SEEK_SET) = 753917952
 read(95, "\202\1\0\0 N\253\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
 select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)

 I'm betting the selects are implementing vacuum_cost_delay, and that the reason this is taking forever is that you have that cranked up to an unreasonably high value. There's no evidence of looping in what you showed us, because the seek addresses are changing.

Thanks for your prompt reply. I was pretty sure that I was using the default, but just to confirm, I ran 'SHOW vacuum_cost_delay;' and it returned 0. Any other suggestions? Is there no way to safely kill off this autovacuum process that doesn't involve shutting down and restarting the database?

thanks
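For the record (this is general PostgreSQL functionality, not a suggestion made in the thread): a running autovacuum worker can be interrupted without restarting the server via pg_cancel_backend(), which sends SIGINT to that backend. The table remains valid afterward, and autovacuum will simply pick it up again later; in 9.0 this requires superuser privileges:

```sql
-- Cancel the query currently running in backend 30188 (the autovacuum worker).
SELECT pg_cancel_backend(30188);
```

This cancels only the current operation; it does not disable autovacuum for the table, so the worker will eventually retry.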