Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-29 Thread Kevin Grittner
 Heikki Linnakangas  wrote:
 On 28.12.2011 01:39, Simon Riggs wrote:
 On Tue, Dec 27, 2011 at 8:05 PM, Heikki Linnakangas
  wrote:
 On 25.12.2011 15:01, Kevin Grittner wrote:

 I don't believe that. Double-writing is a technique to avoid
 torn pages, but it requires a checksum to work. This chicken-
 and-egg problem requires the checksum to be implemented first.

 I don't think double-writes require checksums on the data pages
 themselves, just on the copies in the double-write buffers. In
 the double-write buffer, you'll need some extra information per-
 page anyway, like a relfilenode and block number that indicates
 which page it is in the buffer.
 
You are clearly right -- if there is no checksum in the page itself,
you can put one in the double-write metadata.  I've never seen that
discussed before, but I'm embarrassed that it never occurred to me.
 
 How would you know when to look in the double write buffer?

 You scan the double-write buffer, and every page in the double
 write buffer that has a valid checksum, you copy to the main
 storage. There's no need to check validity of pages in the main
 storage.
 
Right.  I'll recap my understanding of double-write (from memory --
if there's a material error or omission, I hope someone will correct
me).
 
The write-ups I've seen on double-write techniques have all the
writes to the double-write buffer (a single, sequential file that
stays around).  This is done as sequential writing to a file which is
overwritten pretty frequently, making the writes to a controller very
fast, and a BBU write-back cache unlikely to actually write to disk
very often.  On good server-quality hardware, it should be blasting
RAM-to_RAM very efficiently.  The file is fsync'd (like I said,
hopefully to BBU cache), then each page in the double-write buffer is
written to the normal page location, and that is fsync'd.  Once that
is done, the database writes have no risk of being torn, and the
double-write buffer is marked as empty.  This all happens at the
point when you would be writing the page to the database, after the
WAL-logging.

On crash recovery you read through the double-write buffer from the
start and write the pages which look good (including a good checksum)
to the database before replaying WAL.  If you find a checksum error
in processing the double-write buffer, you assume that you never got
as far as the fsync of the double-write buffer, which means you never
started writing the buffer contents to the database, which means
there can't be any torn pages there.  If you get to the end and
fsync, you can be sure any torn pages from a previous attempt to
write to the database itself have been overwritten with the good copy
in the double-write buffer.  Either way, you move on to WAL
processing.
 
You wind up with a database free of torn pages before you apply WAL.
full_page_writes to the WAL are not needed as long as double-write is
used for any pages which would have been written to the WAL.  If
checksums were written to the double-buffer metadata instead of
adding them to the page itself, this could be implemented alone.  It
would probably allow a modest speed improvement over using
full_page_writes and would eliminate those full-page images from the
WAL files, making them smaller.
 
If we do add a checksum to the page header, that could be used for
testing for torn pages in the double-write buffer without needing a
redundant calculation for double-write.  With no torn pages in the
actual database, checksum failures there would never be false
positives.  To get this right for a checksum in the page header,
double-write would need to be used for all cases where
full_page_writes now are used (i.e., the first write of a page after
a checkpoint), and for all unlogged writes (e.g., hint-bit-only
writes).  There would be no correctness problem for always using
double-write, but it would be unnecessary overhead for other page
writes, which I think we can avoid.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] failed regress test

2011-12-29 Thread Pavel Stehule
Hello

I checked yesterday HEAD and there is problem with create_index test

Regards

Pavel


regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-29 Thread Simon Riggs
On Wed, Dec 28, 2011 at 5:45 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 28.12.2011 11:22, Simon Riggs wrote:

 On Wed, Dec 28, 2011 at 7:42 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:

 How would you know when to look in the double write buffer?



 You scan the double-write buffer, and every page in the double write
 buffer
 that has a valid checksum, you copy to the main storage. There's no need
 to
 check validity of pages in the main storage.


 OK, then we are talking at cross purposes. Double write buffers, in
 the way you explain them allow us to remove full page writes. They
 clearly don't do anything to check page validity on read. Torn pages
 are not the only fault we wish to correct against... and the double
 writes idea is orthogonal to the idea of checksums.


 The reason we're talking about double write buffers in this thread is that
 double write buffers can be used to solve the problem with hint bits and
 checksums.

Torn pages are not the only problem we need to detect.

You said You scan the double write buffer When exactly would you do that?

Please explain how a double write buffer detects problems that do not
occur as the result of a crash.

We don't have much time, so please be clear and lucid.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-29 Thread Kevin Grittner
 Heikki Linnakangas  wrote:
 Simon Riggs wrote:
 
 OK, then we are talking at cross purposes. Double write buffers,
 in the way you explain them allow us to remove full page writes.
 They clearly don't do anything to check page validity on read.
 Torn pages are not the only fault we wish to correct against...
 and the double writes idea is orthogonal to the idea of checksums.
 
 The reason we're talking about double write buffers in this thread
 is that double write buffers can be used to solve the problem with
 hint bits and checksums.
 
Exactly.  Every time the issue of page checksums is raised, there are
objections because OS or hardware crashes could cause torn pages for
hint-bit-only writes which would be treated as serious errors
(potentially indicating hardware failure) when they are in fact
expected and benign.  Some time before the thread dies, someone
generally points out that double-write technology would be a graceful
way to handle that, with the side benefit of smaller WAL files.  All
available evidence suggests it would also allow a small performance
improvement, although I hesitate to emphasize that aspect of it; the
other benefits fully justify the effort without that.
 
I do feel there is value in a page checksum patch even without torn
page protection.  The discussion on the list has convinced me that a
failed checksum should be treated as seriously as other page format
errors, rather than as a warning, even though (in the absence of torn
page protection) torn hint-bit-only page writes would be benign.

As an example of how this might be useful, consider our central
databases which contain all the detail replicated from the circuit
court databases in all the counties.  These are mission-critical, so
we have redundant servers in separate buildings.  At one point, one
of them experienced hardware problems and we started seeing invalid
pages.  Since we can shift the load between these servers without
down time, we moved all applications to other servers, and
investigated.  Now, it's possible that for some time before we got
errors on the bad pages, there could have been subtle corruption
which didn't generate errors but presented bad data on our web site.
A page checksum would help prevent that sort of problem, and a
post-crash false positive might waste a little time in investigation,
but that cost would be far outweighed by the benefit of better
accuracy guarantees.

Of course, it will be a big plus if we can roll this out in 9.2 in
conjunction with a double-write feature.  Not only will double-write
probably be a bit faster than full_page_writes in the WAL log, but it
will allow protection against torn pages on hint-bit-only writes
without adding those writes to the WAL or doing any major
rearrangement of where they sit that would break pg_upgrade.  It
would be nice not to have to put all sorts of caveats and
explanations into the docs about how a checksum error might be benign
due to hint bit writes.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] grants vs. inherited tables

2011-12-29 Thread Marko Kreen
On Thu, Dec 29, 2011 at 03:12:50PM -0300, Alvaro Herrera wrote:
 Excerpts from Marko Kreen's message of jue dic 29 15:04:49 -0300 2011:
  3) Have a way to format aclitem into something
 that can used to create GRANT statement easily.  Eg:
  
   pg_get_privilege_info(
IN priv aclitem,
OUT rolename text,
OUT privlist text[],
OUT privlist_with_grant_option text[]);
  
 This allows doing complex introspection in pl/pgsql
 and also helps tools that want to re-create table structure
 in other databases.
 
 aclexplode?

I guess that decides the name.  :)

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] grants vs. inherited tables

2011-12-29 Thread Alvaro Herrera

Excerpts from Marko Kreen's message of jue dic 29 15:04:49 -0300 2011:

 3) Have a way to format aclitem into something
that can used to create GRANT statement easily.  Eg:
 
  pg_get_privilege_info(
   IN priv aclitem,
   OUT rolename text,
   OUT privlist text[],
   OUT privlist_with_grant_option text[]);
 
This allows doing complex introspection in pl/pgsql
and also helps tools that want to re-create table structure
in other databases.

aclexplode?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/README

2011-12-29 Thread Alvaro Herrera

Excerpts from Dimitri Fontaine's message of mié dic 28 15:12:48 -0300 2011:
 Tom Lane t...@sss.pgh.pa.us writes:
  I wonder whether it's time to drop that file altogether ... it served a
  purpose back before we integrated contrib into the SGML docs, but now
  I'm not quite sure why we should bother with it.
 
 I wonder if we shouldn't keep the file and have it just point to the
 relevant documentation chapters (extend, contribs).

+1

Only start-scripts needs to be listed separately.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-29 Thread Noah Misch
On Thu, Dec 29, 2011 at 11:08:43AM -0600, Kevin Grittner wrote:
  Heikki Linnakangas  wrote:
  Simon Riggs wrote:
  
  OK, then we are talking at cross purposes. Double write buffers,
  in the way you explain them allow us to remove full page writes.
  They clearly don't do anything to check page validity on read.
  Torn pages are not the only fault we wish to correct against...
  and the double writes idea is orthogonal to the idea of checksums.
  
  The reason we're talking about double write buffers in this thread
  is that double write buffers can be used to solve the problem with
  hint bits and checksums.

 Of course, it will be a big plus if we can roll this out in 9.2 in
 conjunction with a double-write feature.  Not only will double-write
 probably be a bit faster than full_page_writes in the WAL log, but it
 will allow protection against torn pages on hint-bit-only writes
 without adding those writes to the WAL or doing any major
 rearrangement of where they sit that would break pg_upgrade.

[Thanks for your recent thread summaries.]

A double-write buffer, like a WAL-logged full-page image, is a technique for
performing atomic writes wider than those automatically provided by components
further down the storage stack.  The two strategies have different performance
characteristics, and we're told that a double-write buffer would better serve us
overall.  However, its benefits would not be *greater* for hint-only writes than
for any other write.  For that reason, I think we should consider these changes
independently.  With page checksums enabled, remove the hazard of torn hint-only
writes by ensuring that a WAL FPI has flushed since the last checkpoint.  When
necessary, emit an FPI-only record.  Separately, optimize first-since-checkpoint
writes by replacing FPIs with double-write buffers.  The double-write patch will
reduce the added WAL of the checksum/safe-hint-updates patch to zero.  If the
double-writes patch founders, we'll just have more-costly, yet equally reliable,
page checksums.

nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [RFC] grants vs. inherited tables

2011-12-29 Thread Marko Kreen
Hello,

I tried to generalize a function that creates partitions
for a table and found out it's impossible to do it for grants.

Basically, what I want is a child table that takes it's grants
from parent table.  IMHO quite reasonable request.  But I don't
see a way to do it in pl/pgsql.  (Writing parser in plpgsql
for aclitemout() output does not count.)

The form for the create statement is:

  CREATE TABLE part (
[pre-9.0]LIKE parent INCLUDING INDEXES INCLUDING CONSTRAINTS
[9.0+]   LIKE parent INCLUDING ALL-- skips grants
  ) INHERITS (parent);

Unless I'm missing something obvious, there is no way to take grants
from parent table.

My suggestions:

1) Have 'GRANTS' option for LIKE.  Seems obvious.

2) Include 'GRANTS' option in 'ALL'.  Also obvious.

3) Have a way to format aclitem into something
   that can used to create GRANT statement easily.  Eg:

 pg_get_privilege_info(
  IN priv aclitem,
  OUT rolename text,
  OUT privlist text[],
  OUT privlist_with_grant_option text[]);

   This allows doing complex introspection in pl/pgsql
   and also helps tools that want to re-create table structure
   in other databases.

Although 1)+2) and 3) seem like alternatives, I suggest doing all of them,
thus improving GRANT usage across the board.

Comments?

-- 
marko

NB: this mail is about designing and accepting TODO-items.
I might do them myself sometime, but I don't mind if anyone
implements them before me..


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failed regress test

2011-12-29 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 I checked yesterday HEAD and there is problem with create_index test

What locale are you using?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failed regress test

2011-12-29 Thread Pavel Stehule
Hello

LANG=cs_CZ.utf8

Regards

Pavel

2011/12/29 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 I checked yesterday HEAD and there is problem with create_index test

 What locale are you using?

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on HP-UX

2011-12-29 Thread Heikki Linnakangas

On 29.12.2011 04:36, Manabu Ori wrote:

I believe lwarx hint would be no harm for recent PowerPC processors.
What I tested are:

   (1) Built postgres on POWER6 + RHEL5, which got lwarx hint
   included. Then copy these src tree to POWER5 + RHEL4 and
   run make test, finished successfully.

   (2) Lwarx test in configure failed on POWER5 + RHEL4.

Note that POWER6 understands lwarx hint and POWER5 doesn't.
RHEL5 binutils supports lwarx hint and RHEL4 binutils doesn't.

The only concern is for very old PowerPC.
Referring to Power Instruction Set Architecture manual(*1), on
some processors that precede PowerISA v2.00, executing lwarx with
hint will cause the illegal instruction error.

Lwarx test in configure should fail on these kind of processors,
guessing from my test(2).


The Linux kernel does this (arch/powerpc/include/asm/ppc-opcode.h):


127 /*
128  * Only use the larx hint bit on 64bit CPUs. e500v1/v2 based CPUs will 
treat a
129  * larx with EH set as an illegal instruction.
130  */
131 #ifdef CONFIG_PPC64
132 #define __PPC_EH(eh)(((eh)  0x1)  0)
133 #else
134 #define __PPC_EH(eh)0
135 #endif


We can't copy-paste code from Linux directly, and I'm not sure I like 
that particular phrasing of the macro, but perhaps we should steal the 
idea and only use the hint on 64-bit PowerPC processors? I presume all 
the processors that support the hint are 64-bit, so the question is, is 
there any 64-bit PowerPC processors that would get upset about it? It's 
quite arbitrary to tie it to the word length, but if it works as a 
dividing line in practice, I'm fine with it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ALTER DOMAIN DROP CONSTRAINT doesn't catch errors

2011-12-29 Thread Peter Eisentraut
Is there a secret reason why

ALTER DOMAIN foo DROP CONSTRAINT nonexistent;

doesn't report any error?

If not, I think we should add one and also add the usual IF EXISTS
option.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] get_fn_expr_argtype() vs. internal calls

2011-12-29 Thread Noah Misch
We document that a polymorphic C-language function may identify the concrete
data type of each argument using calls to get_fn_expr_argtype().  That relies
on FmgrInfo.fn_expr, which only the executor sets.  Calls of internal origin,
by way of {Direct,,Oid}FunctionCall*(), don't cons up an fn_expr, so
get_fn_expr_argtype() just returns InvalidOid every time.  (Indeed, we
couldn't easily do better in many cases.)  To what extent is it safe to rely
on this situation remaining as it is?

I ask on account of some second thoughts I had about CheckIndexCompatible().
When writing it, I did not explicitly consider operator classes having
polymorphic opcintype.  If get_fn_expr_argtype() were to work in a function
called from the btree search code, CheckIndexCompatible() should impose
stricter checks on indexes having opclasses of polymorphic opcintype.  If
that's not too likely to happen, I might just add a comment instead.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failed regress test

2011-12-29 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 2011/12/29 Tom Lane t...@sss.pgh.pa.us:
 What locale are you using?

 LANG=cs_CZ.utf8

Fixed, thanks.  (It'd be nice if there were a buildfarm member checking
that locale, since it causes problems on a regular basis.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on HP-UX

2011-12-29 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 The Linux kernel does this (arch/powerpc/include/asm/ppc-opcode.h):

Yeah, I was looking at that too.

 We can't copy-paste code from Linux directly, and I'm not sure I like 
 that particular phrasing of the macro, but perhaps we should steal the 
 idea and only use the hint on 64-bit PowerPC processors?

The info that I've found says that the hint exists beginning in POWER6,
and there were certainly 64-bit Power machines before that.  However,
it might be that the only machines that actually spit up on the hint bit
(rather than ignore it) were 32-bit, in which case this would be a
usable heuristic.  Not sure how we can research that ... do we want to
just assume the kernel guys know what they're doing?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SEGFAULT on SELECT * FROM view

2011-12-29 Thread chris r.
Hi all,

When SELECTing from one particular view, I get reproducible SEGFAULTs in
my pg 9.1.2 production database. To test for hardware errors, I ran
memtest, which succeeded. I then asked for help in #postgres and got
advice to create stacktraces, but I couldn't find the problem yet.

Scenario: VIEW 'vwa' on TABLE 'tba' and some subselects and aggregates.
I've no idea what's wrong with this view, but a

   SELECT * FROM vwa WHERE myid = 110 LIMIT 100 OFFSET 0;

and even EXPLAINing this statement SEGFAULTs the server.

gdb trace: http://pgsql.privatepaste.com/eacd1b6c5d
gdb ec output: http://pgsql.privatepaste.com/a61db5b564
smaps: http://pgsql.privatepaste.com/3c4f494015

Then I used the definition of vwa, and created vwb, and the very same
SELECT (just with vwb) works perfectly fine. I've a pastebin link
comparing the two view definitions, which I'm willing to share privately
if that helps. The views actually differ, although the look identical
with \d+ in the psql console, in that the newer view names more columns
that were added to the referenced tables lately.

So, you tell me, what's wrong with the old view?

Thanks a lot,
Chris

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SEGFAULT on SELECT * FROM view

2011-12-29 Thread Tom Lane
chris r. chri...@gmx.net writes:
 So, you tell me, what's wrong with the old view?

Nothing; that looks like a planner bug to me.  Please submit a
self-contained test case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SEGFAULT on SELECT * FROM view

2011-12-29 Thread chris r.
 So, you tell me, what's wrong with the old view?
 
 Nothing; that looks like a planner bug to me.  Please submit a
 self-contained test case.
I ported the entire schema to my test DB server and could not reproduce
the error there. Note that probably recreating the view solves this
issue. Given this, how should I proceed to create a test case? Any
tutorial on this? (I'm not too familiar with all this yet.)

Chris

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failed regress test

2011-12-29 Thread Pavel Stehule
2011/12/29 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 2011/12/29 Tom Lane t...@sss.pgh.pa.us:
 What locale are you using?

 LANG=cs_CZ.utf8

 Fixed, thanks.  (It'd be nice if there were a buildfarm member checking
 that locale, since it causes problems on a regular basis.)


Thank you

I hope so we will have a server for buildfarm that we can use for
check with some specific locales. We have to sign some papers only.

Regards

Pavel

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-29 Thread Ants Aasma
On Thu, Dec 29, 2011 at 6:44 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 positives.  To get this right for a checksum in the page header,
 double-write would need to be used for all cases where
 full_page_writes now are used (i.e., the first write of a page after
 a checkpoint), and for all unlogged writes (e.g., hint-bit-only
 writes).  There would be no correctness problem for always using
 double-write, but it would be unnecessary overhead for other page
 writes, which I think we can avoid.

Unless I'm missing something, double-writes are needed for all writes,
not only the first page after a checkpoint. Consider this sequence of
events:

1. Checkpoint
2. Double-write of page A (DW buffer write, sync, heap write)
3. Sync of heap, releasing DW buffer for new writes.
 ... some time goes by
4. Regular write of page A
5. OS writes one part of page A
6. Crash!

Now recovery comes along, page A is broken in the heap with no
double-write buffer backup nor anything to recover it by in the WAL.

--
Ants Aasma

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SEGFAULT on SELECT * FROM view

2011-12-29 Thread Tom Lane
chris r. chri...@gmx.net writes:
 Nothing; that looks like a planner bug to me.  Please submit a
 self-contained test case.

 I ported the entire schema to my test DB server and could not reproduce
 the error there. Note that probably recreating the view solves this
 issue. Given this, how should I proceed to create a test case? Any
 tutorial on this? (I'm not too familiar with all this yet.)

It's possibly statistics-dependent; make sure you have the same stats
targets on both DBs, and try re-analyzing a few times.  Check other
planner parameters are the same, too.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-29 Thread Nicolas Barbier
2011/12/30 Ants Aasma ants.aa...@eesti.ee:

 On Thu, Dec 29, 2011 at 6:44 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:

 positives.  To get this right for a checksum in the page header,
 double-write would need to be used for all cases where
 full_page_writes now are used (i.e., the first write of a page after
 a checkpoint), and for all unlogged writes (e.g., hint-bit-only
 writes).  There would be no correctness problem for always using
 double-write, but it would be unnecessary overhead for other page
 writes, which I think we can avoid.

 Unless I'm missing something, double-writes are needed for all writes,
 not only the first page after a checkpoint. Consider this sequence of
 events:

 1. Checkpoint
 2. Double-write of page A (DW buffer write, sync, heap write)
 3. Sync of heap, releasing DW buffer for new writes.
  ... some time goes by
 4. Regular write of page A
 5. OS writes one part of page A
 6. Crash!

 Now recovery comes along, page A is broken in the heap with no
 double-write buffer backup nor anything to recover it by in the WAL.

I guess the assumption is that the write in (4) is either backed by
the WAL, or made safe by double writing. ISTM that such reasoning is
only correct if the change that is expressed by the WAL record can be
applied in the context of inconsistent (i.e., partially written)
pages, which I assume is not the case (excuse my ignorance regarding
such basic facts).

So I think you are right.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failed regress test

2011-12-29 Thread Andrew Dunstan



On 12/29/2011 05:06 PM, Tom Lane wrote:

Pavel Stehulepavel.steh...@gmail.com  writes:

2011/12/29 Tom Lanet...@sss.pgh.pa.us:

What locale are you using?

LANG=cs_CZ.utf8

Fixed, thanks.  (It'd be nice if there were a buildfarm member checking
that locale, since it causes problems on a regular basis.)




I have added it to the duty roster for nightjar. You don't need a 
separate buildfarm member, a single member can check multiple locales.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] backup_label during crash recovery: do we know how to solve it?

2011-12-29 Thread Daniel Farina
On Sat, Dec 3, 2011 at 8:04 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 At the moment, if the situation is ambiguous, the system assumes that you're
 restoring from a backup. What your suggestion amounts to is to reverse tht
 assumption, and assume instead that you're doing crash recovery on a system
 where a backup was being taken. In that case, if you take a backup with
 pg_base_backup(), and fail to archive the WAL files correctly, or forget to
 create a recovery.conf file, the database will happily start up from the
 backup, but is in fact corrupt. That is not good either.

Sorry for my lengthy time before getting around to writing a response,
but I do think there is, in practice, a way around this conundrum,
whose fundamental goal is to make sure that the backup is not, in
actuality, a full binary copy of the database.

A workaround that has a much smaller restart-hole is to move the backup_label in
and out of the database directory after having copied it to the
archive and before calling stop_backup.

How about this revised protocol (names and adjustments welcome), to
enable a less-terrible approach?  Not only is that workaround
incorrect (it has a small window where the system will not be able to
restart), but it's pretty inconvenient.

New concepts:

pg_prepare_backup: readies postgres for backing up.  Saves the
backup_label content in volatile memory.  The next start_backup will
write that volatile information to disk, and the information within
can be used to compute a backup-key

backup-key: a subset of the backup label, all it needs (as far as I
know) might be the database-id and then the WAL position (timeline,
seg, offset) the backup is starting at.

Protocol:

1. select pg_prepare_backup();
(Backup process remembers that backup-key is in progress (say, writes
it to /backup-keys/%k)
2. select pg_start_backup();
(perform copying)
3. select pg_stop_backup();
4. backup process can optionally clear its state remembering the
backup-key (rm /backup-keys/%k)

A crash at each point would be resolved this way:

Before step 1: Nothing has happened, so normal crash recovery.

Before step 2: (same, as it doesn't involve a state transition in postgres)

Before step 3: when the crash occurs and postgres starts up, postgres
asks the external software if a backup was in progress, say via a
backup-in-progress command.  It is responsible for looking at
/backup-keys/%k and saying yes, it was. The database can then do
normal crash recovery.  The backup can even be continuing through this
time, I think.

Before step 4: The archiver may leak the backup-key.  Because
backup-keys using the information I defined earlier have an ordering,
it should be possible to reap these if necessary at intervals.

Fundamentally, the way this approach gets around the 'physical copy'
conundrum is asking the archiver software to remember something well
out of the way of the database directory on the system that is being
backed up.

The main usability gain is that there will be a standardized way to
have postgres check to see if it was doing a backup (and thus should
use normal crash recovery) regardless of how it's started, rather than
hacks around, say, upstart scripts on ubuntu, or pg_ctl that are
idiosyncratic to what is a common need.

What do you think?  I think this may even be backwards compatible,
because if one doesn't call pg_prepare_backup then one can fall back
to that upon calling pg_start_backup.  The backup in progress
command is additive, and doesn't change anything for systems that do
not have it defined.

--
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] grants vs. inherited tables

2011-12-29 Thread Alvaro Herrera

Excerpts from Marko Kreen's message of jue dic 29 15:22:49 -0300 2011:
 
 On Thu, Dec 29, 2011 at 03:12:50PM -0300, Alvaro Herrera wrote:
  Excerpts from Marko Kreen's message of jue dic 29 15:04:49 -0300 2011:
   3) Have a way to format aclitem into something
  that can used to create GRANT statement easily.  Eg:
   
pg_get_privilege_info(
 IN priv aclitem,
 OUT rolename text,
 OUT privlist text[],
 OUT privlist_with_grant_option text[]);
   
  This allows doing complex introspection in pl/pgsql
  and also helps tools that want to re-create table structure
  in other databases.
  
  aclexplode?
 
 I guess that decides the name.  :)

I have the (hopefully wrong) impression that you're missing the fact
that it already exists, at least in 9.0.

I have a backported version of it we wrote for a customer, in case
you're interested on using it in previous releases.  Not that it's all
that difficult to write ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [NOVICE] index refuses to build

2011-12-29 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 On Thu, Dec 29, 2011 at 5:10 PM, Jean-Yves F. Barbier 12u...@gmail.com 
 wrote:
 CREATE INDEX tst1m_name_lu_ix ON tst1m(unaccent(name));
 ERROR:  functions in index expression must be marked IMMUTABLE

 your problem is the unaccent function.  it's defined stable because
 the rules function it depends on can change after the index is built
 -- that would effectively introduce index corruption.  it's possible
 to bypass that restriction, but are you sure that's what you want to
 do?

Hmm ... it's clear why unaccent(text) is only stable, because it depends
on the current search_path to find the unaccent dictionary.  But I
wonder whether it was an oversight that unaccent(regdictionary, text)
is stable and not immutable.  We don't normally mark functions as stable
just because you could in principle change their behavior by altering
some outside-the-database configuration files.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] spinlocks on powerpc

2011-12-29 Thread Manabu Ori
2011/12/30 Tom Lane t...@sss.pgh.pa.us
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  The Linux kernel does this (arch/powerpc/include/asm/ppc-opcode.h):

 Yeah, I was looking at that too.

  We can't copy-paste code from Linux directly, and I'm not sure I like
  that particular phrasing of the macro, but perhaps we should steal the
  idea and only use the hint on 64-bit PowerPC processors?

 The info that I've found says that the hint exists beginning in POWER6,
 and there were certainly 64-bit Power machines before that.  However,
 it might be that the only machines that actually spit up on the hint bit
 (rather than ignore it) were 32-bit, in which case this would be a
 usable heuristic.  Not sure how we can research that ... do we want to
 just assume the kernel guys know what they're doing?

I'm a bit confused and might miss the point, but...

If we can decide whether to use the hint operand when we build
postgres, I think it's better to check if we can compile and run
a sample code with lwarx hint operand than to refer to some
arbitrary defines, such as FOO_PPC64 or something.

I still wonder when to judge the hint availability, compile time
or runtime.
I don't have any idea how to decide that on runtime, though.

P.S.
I changed the subject since it's no longer related to HPUX.

Regards,
Manabu Ori