Re: [HACKERS] pg_dump --serializable-deferrable

2011-05-11 Thread Kevin Grittner
Peter Eisentraut  wrote:
 
 The name of this new option is a bit of a mouthful, and it mixes in
 an otherwise standardized term (deferrable, as in constraints) with
 transaction isolation. Wouldn't something like
 --wait-for-serializable be clearer (and shorter)?
 
I see it's not mentioned in the description of the pg_dump option,
but the option name is based on the new (PostgreSQL-specific)
DEFERRABLE transaction property which is used when the option is
specified.
 
See if it makes more sense after reading this page:
 
http://developer.postgresql.org/pgdocs/postgres/sql-set-transaction.html
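
For context, the transaction mode the option maps onto can be sketched as
follows (illustrative only; the exact statements pg_dump issues may differ,
and some_table is a placeholder):

```sql
-- Sketch of the transaction mode behind --serializable-deferrable:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
-- A DEFERRABLE serializable read-only transaction waits before taking its
-- snapshot until it can run with no risk of serialization failure, so the
-- dump neither cancels nor gets cancelled by concurrent transactions.
SELECT * FROM some_table;  -- dump reads happen here
COMMIT;
```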
 
Personally, I think DEFERRABLE is a good name for the transaction
property; it conveys the right semantics and avoids adding a new
reserved word.  The question of what to name it was first raised
almost eight months ago, and it has been discussed many times since.
 
http://search.postgresql.org/search?q=kevin+serializable+deferrable&m=1&l=1&d=365&s=d
 
I'm less concerned about the pg_dump name, if you think something
else is clearer; although this one isn't the longest pg_dump option
we have.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Simon Riggs
On Wed, May 11, 2011 at 12:54 AM, Greg Stark gsst...@mit.edu wrote:

 On a separate note though, Simon, I don't know what you mean by "we
 normally start with a problem". It's a free software project and
 people are free to work on whatever interests them whether that's
 because it solves a problem they have, helps a client who's paying
 them, or just because it's of academic interest to them. We don't
 always take their patches if they aren't of general interest but
 people propose all kinds of crazy experimental ideas all the time.

Completely agree, but why are you saying that to me?

When Tom asks me why I suggest something, nobody tells him it's a free
software project etc.

What is the difference here?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Feature proposal: distinguish each PostgreSQL instance in the event log

2011-05-11 Thread Dave Page
On Tue, May 10, 2011 at 11:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 MauMau maumau...@gmail.com writes:
 I've encountered one problem on Windows. I need to support running all of
 my
 products on one host simultaneously. Plus, I need to log messages in
 syslog/event log. On Linux, I can distinguish the messages of one product
 and those of other products by setting syslog_ident in postgresql.conf.
 On
 the other hand, I can't do that on Windows: all of the PostgreSQL
 instances
 use the same fixed event source name "PostgreSQL".

 So I'd like to propose a trivial feature which allows users to set the
 event source.

 BTW, what will this accomplish exactly that couldn't be accomplished by
 setting log_line_prefix to include the desired identifier?

Windows uses the event source field to show where events in the logs
have come from. The Event Viewer and other network management tools
expect to be able to use the field for sorting and filtering etc, so
having a different value for different PG instances allows those tools
to distinguish between them properly. It's also useful for ISVs who
may want to make an embedded PG instance used by their apps identify
itself differently from other instances.
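
Under the proposal, each instance would get its own identifier in
postgresql.conf, analogous to syslog_ident on Unix. A hypothetical sketch
(the parameter name `event_source` is an assumption based on the discussion,
not a committed setting):

```
# postgresql.conf fragment (hypothetical parameter name):
event_source = 'AcmeProduct-PG'   # shown as the source in Windows Event Viewer
```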


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Foreign table permissions and cloning

2011-05-11 Thread Shigeru Hanada
(2011/04/26 5:42), Robert Haas wrote:
 OK.  Turned out a little more cleanup was needed to make this all the
 way consistent with how we handle views; I have now done that.

I noticed that some fixes would be needed for consistency in foreign
table privileges. The attached patch includes the fixes below:

1) psql doesn't complete FOREIGN TABLE after GRANT/REVOKE.
2) pg_dump uses GRANT .. ON TABLE for foreign tables, instead of ON
FOREIGN TABLE.
3) GRANT document mentions that ALL TABLES includes foreign tables too.
4) Rows of information_schema.foreign_tables/foreign_table_options are
visible to users who have any privileges on the foreign table.

Regards,
-- 
Shigeru Hanada
diff --git a/doc/src/sgml/ref/grant.sgml b/doc/src/sgml/ref/grant.sgml
index 93e8332..689aba5 100644
*** a/doc/src/sgml/ref/grant.sgml
--- b/doc/src/sgml/ref/grant.sgml
*** GRANT replaceable class=PARAMETERrol
*** 101,107 
 There is also an option to grant privileges on all objects of the same
 type within one or more schemas.  This functionality is currently supported
 only for tables, sequences, and functions (but note that <literal>ALL
!TABLES</> is considered to include views).
/para
  
para
--- 101,107 
 There is also an option to grant privileges on all objects of the same
 type within one or more schemas.  This functionality is currently supported
 only for tables, sequences, and functions (but note that <literal>ALL
!TABLES</> is considered to include views and foreign tables).
/para
  
para
diff --git a/src/backend/catalog/information_schema.sql 
b/src/backend/catalog/information_schema.sql
index c623fb7..452a0ea 100644
*** a/src/backend/catalog/information_schema.sql
--- b/src/backend/catalog/information_schema.sql
*** CREATE VIEW _pg_foreign_tables AS
*** 2557,2564 
  WHERE w.oid = s.srvfdw
AND u.oid = c.relowner
AND (pg_has_role(c.relowner, 'USAGE')
!OR has_table_privilege(c.oid, 'SELECT')
!OR has_any_column_privilege(c.oid, 'SELECT'))
AND n.oid = c.relnamespace
AND c.oid = t.ftrelid
AND c.relkind = 'f'
--- 2557,2564 
  WHERE w.oid = s.srvfdw
AND u.oid = c.relowner
AND (pg_has_role(c.relowner, 'USAGE')
!OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER')
!OR has_any_column_privilege(c.oid, 'SELECT, INSERT, UPDATE, REFERENCES'))
AND n.oid = c.relnamespace
AND c.oid = t.ftrelid
AND c.relkind = 'f'
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 30366d2..e474a69 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*** dumpTable(Archive *fout, TableInfo *tbin
*** 11804,11810 
	namecopy = strdup(fmtId(tbinfo->dobj.name));
	dumpACL(fout, tbinfo->dobj.catId, tbinfo->dobj.dumpId,
			(tbinfo->relkind == RELKIND_SEQUENCE) ? "SEQUENCE" :
-			(tbinfo->relkind == RELKIND_FOREIGN_TABLE) ? "FOREIGN TABLE" :
			"TABLE",
			namecopy, NULL, tbinfo->dobj.name,
			tbinfo->dobj.namespace->dobj.name, tbinfo->rolname,
--- 11804,11809 
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 3ef2fa4..02f4aea 100644
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*** psql_completion(char *text, int start, i
*** 2234,2240 
	" UNION SELECT 'DATABASE'"
	" UNION SELECT 'FOREIGN DATA WRAPPER'"
	" UNION SELECT 'FOREIGN SERVER'"
-	" UNION SELECT 'FOREIGN TABLE'"
	" UNION SELECT 'FUNCTION'"
	" UNION SELECT 'LANGUAGE'"
	" UNION SELECT 'LARGE OBJECT'"
--- 2234,2239 



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Simon Riggs
On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
 Greg Stark wrote:
 On a separate note though, Simon, I don't know what you mean by "we
 normally start with a problem". It's a free software project and
 people are free to work on whatever interests them whether that's
 because it solves a problem they have, helps a client who's paying
 them, or just because it's of academic interest to them. We don't
 always take their patches if they aren't of general interest but
 people propose all kinds of crazy experimental ideas all the time.

 I am confused by Simon's questions too.

 Simon seems to regularly argue for adding features late in the
 development cycle and backpatch things no one else thinks should be
 backpatched, but he wants more research that index-only scans are going
 to improve things before it is implemented?   The first is aggressive
 development, the second is very conservative development --- they don't
 match, so I now wonder what the motivation is since it isn't consistent.

Not really sure why reasonable technical skepticism should become
personal commentary.

You don't question Tom's motives if he is skeptical of an idea of
mine. Why would you question my motivation? What is *your* motive for
acting like that?

I'm not driven by one setting of conservatism, but I am interested
in adding fully usable features that bring credit to the project. If I
see a feature that can have minor things added to it to improve it,
then I raise that during beta. If I see things being worked out that
sound dubious, I mention that in early development.

I don't think this work will materially improve the speed of count(*)
in the majority of cases. This introduces extra overhead into the code
path and that can be a net loss. The only time it will help is when
you have a large table that is not cached and also not recently
updated. Is count(*) run very often against such tables? Do we really
care enough to optimise that use case with lots of special purpose
code? The very fact that Kevin and yourself bring up different reasons
for why we need this feature makes me nervous.

The analysis has not been done yet, and all I have done is request that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Simon Riggs
On Wed, May 11, 2011 at 2:34 AM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 So, what do we need in order to find our way to index-only scans?

 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.  However, before we can rely on the

 FYI, because the visibility map is only one _bit_ per page, it is 8000 *
 8 or 64k times smaller than the heap, e.g. one 8k page covers 64MB of
 heap pages.  This is important because we rely on this compactness in
 hope that the WAL logging of this information will not be burdensome.

We would need to issue one WAL record per bit, not per page.

I'm concerned about the path length added by VM visits and the
potential contention that concentration of information will bring.

Those aren't things to be dismissed without calculation and analysis.
There might not be an issue there, but it's worth checking.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Szymon Guz
Hi,
I was trying to create a trigger with parameters. I've found a potential bug
when the param is boolean.

Here is code replicating the bug:

CREATE TABLE x(x TEXT);

CREATE OR REPLACE FUNCTION trigger_x() RETURNS TRIGGER AS $$
BEGIN
RETURN NEW;
END; $$ LANGUAGE PLPGSQL;

CREATE TRIGGER trig_x_text BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
trigger_x('text');
CREATE TRIGGER trig_x_int BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
trigger_x(10);
CREATE TRIGGER trig_x_float BEFORE INSERT ON x FOR EACH ROW EXECUTE
PROCEDURE trigger_x(42.0);
CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
trigger_x(true);

ERROR:  syntax error at or near "true"
LINE 1: ... INSERT ON x FOR EACH ROW EXECUTE PROCEDURE trigger_x(true);


I've already checked that on:
'PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 32-bit'
'PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit'


If this is intended behavior, then the documentation doesn't say anything
about that.
The only information is that:

TG_ARGV[]

Data type array of text; the arguments from the CREATE TRIGGER statement.
but the below line throws the same error:

CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
trigger_x( true::text );

while this obviously works:

SELECT true::text;

and this works as well:

CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
trigger_x('true');
regards
Szymon


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-11 Thread Peter Geoghegan
On 9 May 2011 11:19, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 In the child, spawn a thread

How exactly should I go about this? The one place in the code that I
knew to use multiple threads, pgbench, falls back on emulation with
fork() on some platforms.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Heikki Linnakangas

On 10.05.2011 20:15, Simon Riggs wrote:

On Tue, May 10, 2011 at 5:17 PM, Kevin Grittner
kevin.gritt...@wicourts.gov  wrote:

Simon Riggs si...@2ndquadrant.com wrote:


This topic has been discussed many times, yet I have never seen an
assessment that explains WHY we would want to do index-only scans.


In databases with this feature, it's not too unusual for a query
which uses just an index to run one or more orders of magnitude
faster than a query which has to randomly access the heap for each
index entry.  That seems like enough evidence of its possible value
in PostgreSQL to proceed to the point where benchmarks become
possible.  I'm assuming that, like all other features added as
performance optimizations, it won't be committed until there are
benchmarks showing the net benefit.

As a thought experiment, picture the relative costs of scanning a
portion of an index in index sequence, and being done, versus
scanning a portion of an index in index sequence and jumping to a
random heap access for each index entry as you go.


I can picture that. Regrettably, I can also picture the accesses to
the visibility map, the maintenance operations on the VM that are
needed for this and the contention that both of those will cause.


Note that we already have the visibility map, and the accesses needed to 
update it are already there. Granted, we'll have to change the logic 
slightly to make it crash safe, but I don't expect that to add any 
meaningful overhead - the changes are going to be where the bits are 
set, i.e. vacuum, not when the bits are cleared. Granted, we might also 
want to set the bits more aggressively once they're used by 
index-only-scans. But done correctly, just taking advantage of the VM 
that's already there shouldn't add overhead to other operations.


I agree that we need to do tests to demonstrate that there's a gain from 
the patch, once we have a patch to test. I would be very surprised if 
there isn't, but that just means the testing is going to be easy.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-11 Thread Magnus Hagander
On Wed, May 11, 2011 at 10:52, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 9 May 2011 11:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

 In the child, spawn a thread

 How exactly should I go about this? The one place in the code that I
 knew to use multiple threads, pgbench, falls back on emulation with
 fork() on some platforms.

If you're doing this Win32-specific, take a look at
src/backend/port/win32/signal.c for an example.

If you're not doing this win32-specific, I doubt we really want
threads to be involved...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread tv
 Hi,
 I was trying to create a trigger with parameters. I've found a potential
 bug
 when the param is boolean.

 Here is code replicating the bug:

 CREATE TABLE x(x TEXT);

 CREATE OR REPLACE FUNCTION trigger_x() RETURNS TRIGGER AS $$
 BEGIN
 RETURN NEW;
 END; $$ LANGUAGE PLPGSQL;

 CREATE TRIGGER trig_x_text BEFORE INSERT ON x FOR EACH ROW EXECUTE
 PROCEDURE
 trigger_x('text');
 CREATE TRIGGER trig_x_int BEFORE INSERT ON x FOR EACH ROW EXECUTE
 PROCEDURE
 trigger_x(10);
 CREATE TRIGGER trig_x_float BEFORE INSERT ON x FOR EACH ROW EXECUTE
 PROCEDURE trigger_x(42.0);
 CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE
 PROCEDURE
 trigger_x(true);

 ERROR:  syntax error at or near true
 LINE 1: ... INSERT ON x FOR EACH ROW EXECUTE PROCEDURE trigger_x(true);

The docs clearly state what the valid values are and the literal 'true' is
not one of them (TRUE is). See this:

http://www.postgresql.org/docs/9.0/interactive/datatype-boolean.html

regards
Tomas




Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Szymon Guz
On 11 May 2011 10:56, t...@fuzzy.cz wrote:

  Hi,
  I was trying to create a trigger with parameters. I've found a potential
  bug
  when the param is boolean.
 
  Here is code replicating the bug:
 
  CREATE TABLE x(x TEXT);
 
  CREATE OR REPLACE FUNCTION trigger_x() RETURNS TRIGGER AS $$
  BEGIN
  RETURN NEW;
  END; $$ LANGUAGE PLPGSQL;
 
  CREATE TRIGGER trig_x_text BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE
  trigger_x('text');
  CREATE TRIGGER trig_x_int BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE
  trigger_x(10);
  CREATE TRIGGER trig_x_float BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE trigger_x(42.0);
  CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE
  trigger_x(true);
 
  ERROR:  syntax error at or near true
  LINE 1: ... INSERT ON x FOR EACH ROW EXECUTE PROCEDURE trigger_x(true);

 The docs clearly state what the valid values are and the literal 'true' is
 not one of them (TRUE is). See this:

 http://www.postgresql.org/docs/9.0/interactive/datatype-boolean.html

 regards
 Tomas


Well... no.

In the link you've provided there is something different:


Valid literal values for the true state are:

TRUE, 't', 'true', 'y', 'yes', 'on', '1'

so I could use 'true'... and this doesn't work.

And SQL is not case sensitive... but I will check it for you anyway:

CREATE TRIGGER trig_x_2 BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
trigger_x(TRUE);

ERROR:  syntax error at or near "TRUE"
LINE 1: ... INSERT ON x FOR EACH ROW EXECUTE PROCEDURE trigger_x(TRUE);

regards
Szymon


Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Andreas Joseph Krogh
On Wednesday, 11 May 2011 at 10:56:19, t...@fuzzy.cz wrote:
  Hi,
  I was trying to create a trigger with parameters. I've found a potential
  bug
  when the param is boolean.
 
  Here is code replicating the bug:
 
  CREATE TABLE x(x TEXT);
 
  CREATE OR REPLACE FUNCTION trigger_x() RETURNS TRIGGER AS $$
  BEGIN
  RETURN NEW;
  END; $$ LANGUAGE PLPGSQL;
 
  CREATE TRIGGER trig_x_text BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE
  trigger_x('text');
  CREATE TRIGGER trig_x_int BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE
  trigger_x(10);
  CREATE TRIGGER trig_x_float BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE trigger_x(42.0);
  CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE
  trigger_x(true);
 
  ERROR:  syntax error at or near true
  LINE 1: ... INSERT ON x FOR EACH ROW EXECUTE PROCEDURE trigger_x(true);
 
 The docs clearly state what the valid values are and the literal 'true' is
 not one of them (TRUE is). See this:
 
 http://www.postgresql.org/docs/9.0/interactive/datatype-boolean.html

What are you trying to accomplish? CREATE OR REPLACE FUNCTION trigger_x()
does not declare any formal parameters, so calling it with arguments doesn't
make sense. I'm surprised creating the other triggers didn't produce an error
stating "No function defined with the name trigger_x and the given
argument-type".

--
Andreas Joseph Krogh andr...@officenet.no
Senior Software Developer / CTO
Public key: http://home.officenet.no/~andreak/public_key.asc
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
Org.nr: NO 981 479 076  | |
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+



Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Andres Freund
On Wednesday, May 11, 2011 11:01:56 AM Andreas Joseph Krogh wrote:
 På onsdag 11. mai 2011 kl 10:56:19 skrev t...@fuzzy.cz:
   CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE
   PROCEDURE
   trigger_x(true);
  The docs clearly state what the valid values are and the literal 'true'
  is not one of them (TRUE is). See this:
  
  http://www.postgresql.org/docs/9.0/interactive/datatype-boolean.html
 
 What are you trying to accomplish? CREATE OR REPLACE FUNCTION trigger_x()
 does not declare any formal-parameters, so calling it with arguments
 doesn't make sense. I'm surprised creating the other triggers didn't
 produce an error stating No function defined with the name trigger_ix and
 the given argument-type.
Read the docs. Parameters for triggers are not passed as normal function
parameters. That's why you access them via TG_ARGV in plpgsql.

The grammar accepts only a very limited set of parameter types there:



Re: [HACKERS] time-delayed standbys

2011-05-11 Thread Heikki Linnakangas

On 07.05.2011 16:48, Robert Haas wrote:

I was able to reproduce something very like this in unpatched master,
just by letting recovery pause at a named restore point, and then
resuming it.

LOG:  recovery stopping at restore point stop, time 2011-05-07
09:28:01.652958-04
LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.
(at this point I did pg_xlog_replay_resume())
LOG:  redo done at 0/520
PANIC:  wal receiver still active
LOG:  startup process (PID 38762) was terminated by signal 6: Abort trap
LOG:  terminating any other active server processes

I'm thinking that this code is wrong:

         if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
 {
 SetRecoveryPause(true);
 recoveryPausesHere();
 }
 reachedStopPoint = true;/* see below */
 recoveryContinue = false;

I think that recoveryContinue = false assignment should not happen if
we decide to pause.  That is, we should say if (recoveryPauseAtTarget
&& standbyState == STANDBY_SNAPSHOT_READY) { same as now } else
recoveryContinue = false.


No, recovery stops at that point whether or not you pause. Resuming 
after stopping at the recovery target doesn't mean that you resume 
recovery, it means that you resume to end recovery and start up the 
server (see the 2nd to last paragraph at 
http://www.postgresql.org/docs/9.1/static/recovery-target-settings.html). It 
would probably be more useful to allow a new stopping target to be set 
and continue recovery, but the current pause/resume functions don't 
allow that.
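
For reference, the setup that exhibits this behaviour can be sketched as
follows (parameter names taken from the 9.1 recovery settings; treat the
exact spelling as an assumption):

```
# recovery.conf sketch: pause when the named restore point is reached.
# After pg_xlog_replay_resume() is called, recovery *ends* and the server
# starts up -- it does not continue replaying past the target.
recovery_target_name = 'stop'
pause_at_recovery_target = 'true'
```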


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Andres Freund
On Wednesday, May 11, 2011 11:21:34 AM Andres Freund wrote:
 On Wednesday, May 11, 2011 11:01:56 AM Andreas Joseph Krogh wrote:
  På onsdag 11. mai 2011 kl 10:56:19 skrev t...@fuzzy.cz:
CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE
PROCEDURE
trigger_x(true);
   
   The docs clearly state what the valid values are and the literal 'true'
   is not one of them (TRUE is). See this:
   
   http://www.postgresql.org/docs/9.0/interactive/datatype-boolean.html
  
  What are you trying to accomplish? CREATE OR REPLACE FUNCTION
  trigger_x() does not declare any formal-parameters, so calling it with
  arguments doesn't make sense. I'm surprised creating the other triggers
  didn't produce an error stating No function defined with the name
  trigger_ix and the given argument-type.
 
 Read the docs. Parameters for triggers are not passed as normal function
 parameters. Thats why you access them via via TG_ARGV in plpgsql.
 
 The grammar accepts only a very limited amount of parameters there:
Err

TriggerFuncArg:
	Iconst
		{
			char buf[64];
			snprintf(buf, sizeof(buf), "%d", $1);
			$$ = makeString(pstrdup(buf));
		}
	| FCONST	{ $$ = makeString($1); }
	| Sconst	{ $$ = makeString($1); }
	| BCONST	{ $$ = makeString($1); }
	| XCONST	{ $$ = makeString($1); }
	| ColId 	{ $$ = makeString($1); }

That is: integers, floating-point numbers, strings, bit strings, hex strings
and column references (???).

How that exact list came to exist I do not know.
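
Given that grammar, a workaround sketch is to pass the flag as a string
literal (an Sconst) and cast it at runtime inside the function via TG_ARGV,
since all trigger arguments arrive as text:

```sql
-- Workaround sketch: quote the boolean so it parses as a string constant,
-- then cast inside the trigger function (TG_ARGV values are text).
CREATE OR REPLACE FUNCTION trigger_x() RETURNS TRIGGER AS $$
DECLARE
    flag boolean := TG_ARGV[0]::boolean;  -- 'true' casts cleanly here
BEGIN
    IF flag THEN
        RAISE NOTICE 'flag argument was true';
    END IF;
    RETURN NEW;
END; $$ LANGUAGE plpgsql;

CREATE TRIGGER trig_x_bool BEFORE INSERT ON x
    FOR EACH ROW EXECUTE PROCEDURE trigger_x('true');
```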

Andres



Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Szymon Guz
On 11 May 2011 11:01, Andreas Joseph Krogh andr...@officenet.no wrote:

 På onsdag 11. mai 2011 kl 10:56:19 skrev t...@fuzzy.cz:
   Hi,
   I was trying to create a trigger with parameters. I've found a
 potential
   bug
   when the param is boolean.
  
   Here is code replicating the bug:
  
   CREATE TABLE x(x TEXT);
  
   CREATE OR REPLACE FUNCTION trigger_x() RETURNS TRIGGER AS $$
   BEGIN
   RETURN NEW;
   END; $$ LANGUAGE PLPGSQL;
  
   CREATE TRIGGER trig_x_text BEFORE INSERT ON x FOR EACH ROW EXECUTE
   PROCEDURE
   trigger_x('text');
   CREATE TRIGGER trig_x_int BEFORE INSERT ON x FOR EACH ROW EXECUTE
   PROCEDURE
   trigger_x(10);
   CREATE TRIGGER trig_x_float BEFORE INSERT ON x FOR EACH ROW EXECUTE
   PROCEDURE trigger_x(42.0);
   CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE
   PROCEDURE
   trigger_x(true);
  
   ERROR:  syntax error at or near true
   LINE 1: ... INSERT ON x FOR EACH ROW EXECUTE PROCEDURE trigger_x(true);
 
  The docs clearly state what the valid values are and the literal 'true'
 is
  not one of them (TRUE is). See this:
 
  http://www.postgresql.org/docs/9.0/interactive/datatype-boolean.html

 What are you trying to accomplish? CREATE OR REPLACE FUNCTION trigger_x()
 does not declare any formal-parameters, so calling it with arguments doesn't
 make sense. I'm surprised creating the other triggers didn't produce an
 error stating No function defined with the name trigger_ix and the given
 argument-type.


That's how you define a trigger function. Later you can use params when
defining a trigger.

regards
Szymon


[HACKERS] Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

2011-05-11 Thread Noah Misch
On Mon, May 09, 2011 at 11:32:28PM -0400, Tom Lane wrote:
 To be concrete, consider the function array_append(anyarray, anyelement)
 yielding anyarray.  Suppose we have a domain D over int[] and the call
 array_append(var_of_type_D, 42).  If we automatically downcast the
 variable to int[], should the result of the function be considered to be
 of type D, or type int[]?  This isn't a trivial distinction because
 choosing to consider it of type D means we have to re-check D's domain
 constraints, which might or might not be satisfied by the modified
 array.  Previous releases considered the result to be of type D,
 *without* rechecking the domain constraints, which was flat out wrong.
 
 So we basically had three alternatives to make it better:
   * downcast to the array type, which would possibly silently
 break applications that were relying on the function result
 being considered of the domain type
   * re-apply domain checks on the function result, which would be
 a performance hit and possibly again result in unobvious
 breakage
   * explicitly break it by throwing a parse error until you
 downcast (and then upcast the function result if you want)
 I realize that #3 is a bit unpleasant, but are either of the other two
 better?  At least #3 shows you where you need to check for problems.

Though I've never used a domain over an array type, I'd strongly prefer #2.  As
best I can tell, the only functions that could need to do this are those that
have an anyarray argument and also an anyarray return type.  In core, that's
array_append, array_prepend, array_cat, array_larger, array_smaller, max, and
min.  The last four always return an input unchanged, so only the first three
would actually check anything.  I'm not seeing, offhand, any need to add new
validation to PLs.

The parse-time breakage of #3 is a nice tool during your upgrade QA, but the
long-term semantics are considerably worse for wear.

I see nothing to recommend #1.
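
To make the trade-off concrete, a minimal sketch of the case under
discussion (names are illustrative):

```sql
-- A domain over int[] with a constraint that array_append can violate.
CREATE DOMAIN d AS int[] CHECK (array_length(VALUE, 1) <= 2);
CREATE TABLE t (v d);
INSERT INTO t VALUES ('{1,2}');
-- Under option #1 the result below would silently be int[]; under #2 it
-- would be re-checked against d's CHECK constraint (and fail here);
-- under #3 the call itself would raise a parse error until v is downcast.
SELECT array_append(v, 42) FROM t;
```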

 There is another issue that wasn't really mentioned in the previous
 thread, which is that if we are matching a domain-over-array to a
 function's ANYARRAY argument, what exactly should be allowed to match to
 ANYELEMENT --- or if the function returns ANYELEMENT, what should the
 imputed result type be?

What else but the type seen by subscripting a datum of the ANYARRAY type?

 AFAICS it's impossible to give an answer to
 that without effectively deciding that function argument matching
 smashes the domain to its base type (the array type).

The domain and its base type share an element type, but how does that
necessitate any further removal of domain-ness?

 It's not very
 clear what's the point of a domain type if every operation on it is
 going to neglect its domain-ness.

Agreed.

nm



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Nicolas Barbier
2011/5/11, Bruce Momjian br...@momjian.us:

 FYI, because the visibility map is only one _bit_ per page, it is 8000 *
 8 or 64k times smaller than the heap, e.g. one 8k page covers 64MB of
 heap pages.

Actually, that would be one 8kB block covers 512MB of heap: 1 block
of visibility map (8kB) = 64k visibility bits = covers 64k blocks =
covers 512MB of heap. The cost of keeping the visibility map in cache
is therefore totally negligible, only the cost of WAL logging changes
to it is of interest.
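[Editor's note] A quick sketch of that arithmetic, in Python for concreteness; BLOCK_SIZE stands in for the default BLCKSZ of 8 kB, and the small per-page header that slightly reduces the real bit count is ignored:

```python
# One visibility-map bit per heap page: compute how much heap a single
# 8 kB visibility-map block covers (page-header overhead ignored).
BLOCK_SIZE = 8192                      # default BLCKSZ, in bytes

bits_per_vm_block = BLOCK_SIZE * 8     # 65536 visibility bits per VM block
heap_bytes_covered = bits_per_vm_block * BLOCK_SIZE

assert bits_per_vm_block == 64 * 1024              # tracks 64k heap pages
assert heap_bytes_covered == 512 * 1024 * 1024     # i.e. 512 MB of heap
```

So one cached VM block describes 512 MB of heap, which is why keeping the map in cache is essentially free.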

 This is important because we rely on this compactness in hope that
 the WAL logging of this information will not be burdensome.

The size of an entry in the map (1 bit) is not very related to the WAL
overhead required per change of such a bit (i.e., the log record for a
1-bit change will certainly be way more than 1 bit).

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Andreas Joseph Krogh
On Wednesday 11 May 2011 at 11:30:51, Szymon Guz mabew...@gmail.com wrote:
 

On 11 May 2011 11:01, Andreas Joseph Krogh andr...@officenet.no wrote:
On Wednesday 11 May 2011 at 10:56:19, t...@fuzzy.cz wrote:


  Hi,
  I was trying to create a trigger with parameters. I've found a potential
  bug
  when the param is boolean.
 
  Here is code replicating the bug:
 
  CREATE TABLE x(x TEXT);
 
  CREATE OR REPLACE FUNCTION trigger_x() RETURNS TRIGGER AS $$
  BEGIN
  RETURN NEW;
  END; $$ LANGUAGE PLPGSQL;
 
  CREATE TRIGGER trig_x_text BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE
  trigger_x('text');
  CREATE TRIGGER trig_x_int BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE
  trigger_x(10);
  CREATE TRIGGER trig_x_float BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE trigger_x(42.0);
  CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW EXECUTE
  PROCEDURE
  trigger_x(true);
 
  ERROR: syntax error at or near "true"
  LINE 1: ... INSERT ON x FOR EACH ROW EXECUTE PROCEDURE trigger_x(true);

 The docs clearly state what the valid values are and the literal 'true' is
 not one of them (TRUE is). See this:

 http://www.postgresql.org/docs/9.0/interactive/datatype-boolean.html


What are you trying to accomplish? CREATE OR REPLACE FUNCTION trigger_x() does not declare any formal parameters, so calling it with arguments doesn't make sense. I'm surprised creating the other triggers didn't produce an error stating "No function defined with the name trigger_x and the given argument-type".



That's how you define a trigger function. Later you can use the params when defining the trigger.


Pardon my ignorance:-)

--
Andreas Joseph Krogh andr...@officenet.no
Senior Software Developer / CTO
Public key: http://home.officenet.no/~andreak/public_key.asc
OfficeNet AS | Rosenholmveien 25 | 1414 Trollåsen | NORWAY | Org.nr: NO 981 479 076
Tlf: +47 24 15 38 90 | Fax: +47 24 15 38 91 | Mobile: +47 909 56 963
"The most difficult thing in the world is to know how to do a thing and to
watch somebody else doing it wrong, without comment."



Re: [HACKERS] time-delayed standbys

2011-05-11 Thread Heikki Linnakangas

On 11.05.2011 08:29, Fujii Masao wrote:

On Sat, May 7, 2011 at 10:48 PM, Robert Haas robertmh...@gmail.com wrote:

I was able to reproduce something very like this in unpatched master,
just by letting recovery pause at a named restore point, and then
resuming it.


I was able to reproduce the same problem even in 9.0. When the standby
reaches the recovery target, it always tries to end the recovery even
though walreceiver is still running, which causes the problem. This seems
to be an oversight in streaming replication. I should have considered how
the standby should work when recovery_target is specified.

What about the attached patch? Which stops walreceiver instead of
emitting PANIC there only if we've reached the recovery target.


I think we can just always call ShutdownWalRcv(). It should be gone if 
the server was promoted while streaming, but that's just an 
implementation detail of what the promotion code does. There's no hard 
reason why it shouldn't be running at that point anymore, as long as we 
kill it before going any further.


Committed a patch to do that.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Cédric Villemain
2011/5/11 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 10.05.2011 20:15, Simon Riggs wrote:

 On Tue, May 10, 2011 at 5:17 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov  wrote:

Simon Riggs si...@2ndquadrant.com wrote:

 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.

 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for each
 index entry.  That seems like enough evidence of its possible value
 in PostgreSQL to proceed to the point where benchmarks become
 possible.  I'm assuming that, like all other features added as
 performance optimizations, it won't be committed until there are
 benchmarks showing the net benefit.

 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.

 I can picture that. Regrettably, I can also picture the accesses to
 the visibility map, the maintenance operations on the VM that are
 needed for this and the contention that both of those will cause.

 Note that we already have the visibility map, and the accesses needed to
 update it are already there. Granted, we'll have to change the logic
 slightly to make it crash safe, but I don't expect that to add any
 meaningful overhead - the changes are going to be where the bits are set,
 ie. vacuum, not when the bits are cleared. Granted, we might also want to
 set the bits more aggressively once they're used by index-only-scans. But
 done correctly, just taking advantage of the VM that's already there
 shouldn't add overhead to other operations.

We won't be able to do a true index-only scan.
We can do an index scan with some probability of not accessing the heap,
maybe (hopefully) avoiding it completely in some cases.

If the visibility map is good enough (performance- and safety-wise) to
remove the need to access the heap, then, for the cost part:
currently, cost_index accounts for the cost of accessing each heap page as
a random_page_cost. I believe we should be able to change that to subtract
the estimated number of heap pages we don't need to access (which can be
100% or 0.1%).
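[Editor's note] As an illustration only (the names below are invented for this sketch and are not the actual cost_index() code), the idea is to discount the random heap fetches by the estimated all-visible fraction:

```python
# Hypothetical costing sketch: charge random_page_cost only for heap pages
# that are *not* known all-visible, instead of for every matched heap page.
def index_scan_heap_cost(heap_pages, frac_all_visible, random_page_cost=4.0):
    # Pages whose visibility-map bit is set need no heap fetch at all.
    pages_fetched = heap_pages * (1.0 - frac_all_visible)
    return pages_fetched * random_page_cost

assert index_scan_heap_cost(1000, 0.0) == 4000.0   # VM helps not at all
assert index_scan_heap_cost(1000, 1.0) == 0.0      # table fully all-visible
assert index_scan_heap_cost(1000, 0.5) == 2000.0   # half the fetches skipped
```

The hard part, as discussed in this thread, is estimating the all-visible fraction reliably without introducing plan instability.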

And, as Simon suggested, there may also be a path to improving the bitmap
heap scan. Bitmap heap scans already have some workarounds to make sure
index scans look cheaper in some cases; we can keep those and apply the
same logic as for cost_index.

This keeps the rule PostgreSQL has always had: let the planner decide on
the best plan, instead of allowing special index declarations (that hasn't
been proposed yet, I think, but, well, just in case it pops into someone's
mind).


 I agree that we need to do tests to demonstrate that there's a gain from the
 patch, once we have a patch to test. I would be very surprised if there
 isn't, but that just means the testing is going to be easy.

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Andres Freund
On Wednesday, May 11, 2011 11:50:35 AM Szymon Guz wrote:
 On 11 May 2011 11:29, Andres Freund and...@anarazel.de wrote:
  On Wednesday, May 11, 2011 11:21:34 AM Andres Freund wrote:
   On Wednesday, May 11, 2011 11:01:56 AM Andreas Joseph Krogh wrote:
On Wednesday 11 May 2011 at 10:56:19, t...@fuzzy.cz wrote:
  CREATE TRIGGER trig_x_bool BEFORE INSERT ON x FOR EACH ROW
  EXECUTE PROCEDURE
  trigger_x(true);
 
 The docs clearly state what the valid values are and the literal
  
  'true'
  
 is not one of them (TRUE is). See this:
 
 http://www.postgresql.org/docs/9.0/interactive/datatype-boolean.htm
 l

What are you trying to accomplish? CREATE OR REPLACE FUNCTION
trigger_x() does not declare any formal-parameters, so calling it
with arguments doesn't make sense. I'm surprised creating the other
triggers didn't produce an error stating No function defined with
the name trigger_ix and the given argument-type.
   
   Read the docs. Parameters for triggers are not passed as normal
   function parameters. Thats why you access them via via TG_ARGV in
   plpgsql.
  
   The grammar accepts only a very limited amount of parameters there:
  Err
  
   TriggerFuncArg:
           Iconst
               {
                   char buf[64];

                   snprintf(buf, sizeof(buf), "%d", $1);
                   $$ = makeString(pstrdup(buf));
               }
           | FCONST    { $$ = makeString($1); }
           | Sconst    { $$ = makeString($1); }
           | BCONST    { $$ = makeString($1); }
           | XCONST    { $$ = makeString($1); }
           | ColId     { $$ = makeString($1); }
  
  That is integers, floating point, strings, bitstrings, hexstrings and
  column references (???).
  
  How that exact list came to exist I do not know.
 
 My two thoughts on that:
 
 1. This list should be improved to allow booleans, and maybe other types
 
 2. Why then is it like this:
 
 it works:
 CREATE TRIGGER trig_x_10 BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
 trigger_x('true');
 
 it does not:
 CREATE TRIGGER trig_x_11 BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
 trigger_x('true'::text);
 
 the error is:
 ERROR:  syntax error at or near "::"
 
 I think there is something wrong.
The grammar doesn't allow any form of expression. It only allows the above-listed 
types of literals directly.
I am not really sure why it was done that way, but it's been that way for a 
long time (only insignificant changes since 1997... bitstrings and hex strings 
were added after that, though).

Why do you want to use a boolean directly if you can't use it as the type 
itself anyway?

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Cédric Villemain
2011/5/10 Kevin Grittner kevin.gritt...@wicourts.gov:
 Simon Riggs si...@2ndquadrant.com wrote:
 The typical speed up for non-covered indexes will come when we
 access a very large table (not in cache) via an index scan that is
 smaller than a bitmapindex scan. Will we be able to gauge
 selectivities sufficiently accurately to be able to pinpoint that
 during optimization? How will we know that the table is not in
 cache? Or is this an optimisation in the executor for a bitmapheap
 scan?

 I would continue to object to using current cache contents for plan
 choice because of plan instability and the fact that an odd initial
 cache load could skew plans in a bad direction indefinitely.  I do
 agree (and have already posted) that I think the hardest part of
 this might be developing a good cost model.  I doubt that's an
 insoluble problem, especially since it is something we can refine
 over time as we gain experience with the edge cases.

You will have the same possible planning instability with the index(-only?)
scan, because we may need to access the heap anyway and that need is based
on estimation; or am I missing something? I understood the idea was just to
bypass the heap access *if* we can for *this* heap page.

In reality, I am not really scared of plan instability caused by a possible
PG/OS cache estimation. The percentages remain stable in my
observations ... I don't know yet how it will go for the visibility map.

And we already have plan instability today, which is *good*: at some point
a seq scan is better than a bitmap heap scan, because the relation size
changes and because ANALYZE re-estimates the distribution of the data. I
would be very happy to issue an ANALYZE CACHE (the way I have to ANALYZE
temp tables before large queries) if it allowed the planner to give me the
best plan in some scenarios ... but this is another topic, sorry for the
digression.

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Szymon Guz
On 11 May 2011 12:06, Andres Freund and...@anarazel.de wrote:


 Why do you want to use a boolean directly if you can't use it as the type
 itself anyway?


Yep, and this is a really good point :)
I wanted to have consistent api, so use true when I have a boolean value.
I will use 'true' and add some info on that to the procedure documentation.

regards
Szymon


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-11 Thread Peter Geoghegan
On 11 May 2011 09:54, Magnus Hagander mag...@hagander.net wrote:

 If you're doing this Win32 specific, take a look at
 src/backend/port/win32/signal.c for an example.

 If you're not doing this win32-specific, I doubt we really want
 threads to be involved...

Well, that seems to be the traditional wisdom. It seems sensible to me
that each process should look out for postmaster death itself though.
Tom described potential race conditions in looking at ps output...do
we really want to double the number of auxiliary processes in a single
release of Postgres?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-11 Thread Heikki Linnakangas

On 11.05.2011 13:34, Peter Geoghegan wrote:

On 11 May 2011 09:54, Magnus Hagander mag...@hagander.net wrote:


If you're doing this Win32 specific, take a look at
src/backend/port/win32/signal.c for an example.

If you're not doing this win32-specific, I doubt we really want
threads to be involved...


Well, that seems to be the traditional wisdom. It seems sensible to me
that each process should look out for postmaster death itself though.
Tom described potential race conditions in looking at ps output...do
we really want to double the number of auxiliary processes in a single
release of Postgres?


Uh, no you don't want any new processes on Unix. You want each process 
to check for postmaster death every once in a while, like they do today. 
The pipe-trick is to make sure the processes wake up promptly to notice 
the death when the postmaster dies. You just need to add the 
postmaster-pipe to the select() calls we already do.
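[Editor's note] A minimal demonstration of the pipe trick described above, in Python for brevity (the backend does this in C inside its own select() loops):

```python
# The postmaster holds the write end open and never writes to it.  Each
# child includes the read end in its select() set: while the writer is
# alive the fd is never readable; when the writer process goes away, the
# kernel closes the write end, the fd becomes readable, and a read()
# returns EOF, so the child notices the death promptly.
import os
import select

r, w = os.pipe()                       # created before forking children

ready, _, _ = select.select([r], [], [], 0)
assert ready == []                     # writer alive: nothing to report

os.close(w)                            # stands in for the postmaster dying
ready, _, _ = select.select([r], [], [], 0)
assert ready == [r]                    # fd is now readable...
assert os.read(r, 1) == b""            # ...and read() reports EOF
```

The same wake-up works with no polling interval at all: a child blocked in select() on that fd returns the instant the writer disappears.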


I'm not sure if on Windows you can similarly just add to the 
postmaster-pipe to the WaitForMultipleObjects() calls we already do. 
Then you won't need new threads on Windows either.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Robert Haas
On Tue, May 10, 2011 at 9:34 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 So, what do we need in order to find our way to index-only scans?

 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.  However, before we can rely on the

 FYI, because the visibility map is only one _bit_ per page, it is 8000 *
 8 or 64k times smaller than the heap, e.g. one 8k page covers 64MB of
 heap pages.  This is important because we rely on this compactness in
 hope that the WAL logging of this information will not be burdensome.

I accuse you of bad math.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Robert Haas
On Tue, May 10, 2011 at 10:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That will be true only if you intentionally ignore the points Greg
 raised.  If the table isn't entirely ALL_VISIBLE, then the choice of
 index will determine the ordering of the actual table probes that occur.
 There could be more or fewer page reads, in a more or less optimal
 order, depending on the index used.

However, note that this wasn't one of the cases I said I was going to
try to optimize in the first go-around anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-05-11 Thread Fujii Masao
On Wed, May 11, 2011 at 6:50 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I think we can just always call ShutdownWalRcv(). It should be gone if the
 server was promoted while streaming, but that's just an implementation
 detail of what the promotion code does. There's no hard reason why it
 shouldn't be running at that point anymore, as long as we kill it before
 going any further.

Okay. But I'd like to add the following assertion check just before
ShutdownWalRcv() which you added, in order to detect such a bug
that we found this time, i.e., the bug which causes unexpected end
of recovery. Thought?

Assert(reachedStopPoint || !WalRcvInProgress())

 Committed a patch to do that.

Thanks. Should we backport it to 9.0? 9.0 has the same problem.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Robert Haas
On Wed, May 11, 2011 at 3:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Completely agree, but why are you saying that to me?

 When Tom asks me why I suggest something, nobody tells him its a free
 software project etc

 What is the difference here?

We're now 40 emails in this thread, and there seems to be far more
heat than light here.  Here's an attempt at a summary:

- Simon wants proof that the performance benefit of this feature is
worth any downsides it may have, which is standard procedure, and
isn't certain the feature will have a significant performance benefit.
- Numerous other people think Simon's doubts about the feature are
poorly justified (and some of them also think he's being a pain in the
neck).
- Various peripherally related topics, such as optimizing count(*),
which is not part of the vision for the first go-round that I sketched
out in my OP, and plan stability, which is another issue entirely,
have been discussed.
- Meanwhile, only one person has done any review of the actual code
that's been written, which is posted on the crash-safe visibility map
thread, which may be why multiple people seem confused about what it
does.
- And no one has done any benchmarking of that code.

I think it would be really helpful if some more people would review
the crash-safe visibility map patch, and if at least one person could
benchmark it.  It would be useful to know (a) whether that noticeably
slows down the system during inserts, updates, and deletes, especially
at very high concurrency; and (b) how much of an impact the additional
WAL-logging has on VACUUM.  On the other hand, arguing about whether
index-only scans are going to result in a significant performance
benefit is not useful.  I am going to be both surprised and
disappointed if they don't, but there's only one way to find out, and
a theoretical argument isn't it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature proposal: distinguish each PostgreSQL instance in the event log

2011-05-11 Thread MauMau

From: Dave Page dp...@pgadmin.org

On Tue, May 10, 2011 at 11:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:

BTW, what will this accomplish exactly that couldn't be accomplished by
setting log_line_prefix to include the desired identifier?


Windows uses the event source field to show where events in the logs
have come from. The Event Viewer and other network management tools
expect to be able to use the field for sorting and filtering etc, so
having a different value for different PG instances allows those tools
to distinguish between them properly. It's also useful for ISVs who
may want to make an embedded PG instance used by their apps identify
itself differently from other instances.


Thank you for help, Dave.

The Windows Event Viewer can filter event log records by event source, 
computer, user, date/time, etc., but not by message body. So 
log_line_prefix cannot be used for filtering. Even if filtering by message 
body were possible, using the event source field to filter applications is 
more natural and intuitive.


I would appreciate comments from all on the following items mentioned 
previously. If there are no comments, I'll proceed as follows and try to 
submit the patch in a week or so, if I have enough time.


1. Documentation
Move the event log registration/deregistration article to the following 
section. Add a link to this article from the log_destination description.


16.1.4. Cleaning and installing
http://www.postgresql.org/docs/9.0/static/install-windows-full.html

2. pg_ctl's event logging
Do not change anything - that is, use the fixed event source "PostgreSQL".
How do we explain the reason for not using the event_source parameter in 
postgresql.conf? postgresql.conf is for the server; pg_ctl uses the fixed 
event source. This is a specification. This is not a problem because 
starting/stopping the database/application is infrequent once the system is 
in steady operation. -- this may sound abrupt, though.


Regards,
MauMau

- Original Message - 
From: Dave Page dp...@pgadmin.org

To: Tom Lane t...@sss.pgh.pa.us
Cc: MauMau maumau...@gmail.com; pgsql-hackers@postgresql.org
Sent: Wednesday, May 11, 2011 4:58 PM
Subject: Re: [HACKERS] Feature proposal: distinguish each PostgreSQL 
instance in the event log




On Tue, May 10, 2011 at 11:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:

MauMau maumau...@gmail.com writes:

I've encountered one problem on Windows. I need to support running all of
my products on one host simultaneously. Plus, I need to log messages in
syslog/event log. On Linux, I can distinguish the messages of one product
and those of other products by setting syslog_ident in postgresql.conf. On
the other hand, I can't do that on Windows: all of the PostgreSQL instances
use the same fixed event source name "PostgreSQL".

So I'd like to propose a trivial feature which allows users to set the
event source.




--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-05-11 Thread Heikki Linnakangas

On 11.05.2011 14:16, Fujii Masao wrote:

On Wed, May 11, 2011 at 6:50 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

I think we can just always call ShutdownWalRcv(). It should be gone if the
server was promoted while streaming, but that's just an implementation
detail of what the promotion code does. There's no hard reason why it
shouldn't be running at that point anymore, as long as we kill it before
going any further.


Okay. But I'd like to add the following assertion check just before
ShutdownWalRcv() which you added, in order to detect such a bug
that we found this time, i.e., the bug which causes unexpected end
of recovery. Thought?

 Assert(reachedStopPoint || !WalRcvInProgress())


There's no unexpected end of recovery here. The recovery ends when we 
reach the target, as it should. It was the assumption that WAL receiver 
can't be running at that point anymore that was wrong.


That assertion would work, AFAICS, but I don't think it's something we 
need to assert. There isn't any harm done if WAL receiver is still 
running, as long as we shut it down at that point.



Committed a patch to do that.


Thanks. Should we backport it to 9.0? 9.0 has the same problem.


Ah, thanks, missed that. Cherry-picked to 9.0 now as well.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Robert Haas
On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 It seems to me a reasonable way to implement VARIANT would be to have
 a data type called VARIANT that stores an OID of the inner type at the
 beginning, followed by the binary data.

That's likely to be how it gets implemented, but you seem to have
missed the point of some of the discussion upthread: the big problem
with that is that someone might type DROP TYPE foo, and when they
do, you need an efficient way to figure out whether foo is in use
inside an instance of the variant type anywhere in the system.  The
devil is in the details...
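[Editor's note] A toy model of the proposed representation, and of why DROP TYPE is the hard part (this is an illustrative Python sketch, not backend code; the struct layout and helper names are invented here):

```python
# A variant datum stores the inner type's OID followed by the value's
# bytes.  Note what this layout does NOT give you: there is no catalog
# entry recording which stored datums embed which OID, so verifying that
# a DROP TYPE leaves no dangling references would mean scanning every
# variant value in the whole database.
import struct

def pack_variant(type_oid, payload):
    return struct.pack("<I", type_oid) + payload

def unpack_variant(datum):
    (oid,) = struct.unpack("<I", datum[:4])
    return oid, datum[4:]

datum = pack_variant(25, b"hello")     # 25 is the OID of text in core
assert unpack_variant(datum) == (25, b"hello")
```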

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch to allow domains over composite types

2011-05-11 Thread Yeb Havinga

typecmds.c says:
"Domains over composite types might be made to work in the future, but 
not today."


Attached is a patch that allows domains over composite types, together 
with test cases in domaincomp.sql. A domain over a composite type has 
typtype TYPTYPE_DOMAIN, but typrelid and typrelkind are empty: that 
information is only available in the pg_type record of the base type. 
The remainder of the patch follows from that choice. While parsing a 
record expression into a row type, an extra coercion node had to be 
inserted to ensure that the domain checks are called.


All regression tests are ok, comments are highly appreciated.

--

Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
new file mode 100644
index 4f4dd69..1c8dc64
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***
*** 36,41 
--- 36,42 
  #include "access/xact.h"
  #include "catalog/catalog.h"
  #include "utils/fmgroids.h"
+ #include "utils/lsyscache.h"
  #include "utils/pg_lzcompress.h"
  #include "utils/rel.h"
  #include "utils/typcache.h"
*** toast_flatten_tuple_attribute(Datum valu
*** 965,971 
  	 * Break down the tuple into fields.
  	 */
  	olddata = DatumGetHeapTupleHeader(value);
! 	Assert(typeId == HeapTupleHeaderGetTypeId(olddata));
  	Assert(typeMod == HeapTupleHeaderGetTypMod(olddata));
  	/* Build a temporary HeapTuple control structure */
  	tmptup.t_len = HeapTupleHeaderGetDatumLength(olddata);
--- 966,973 
  	 * Break down the tuple into fields.
  	 */
  	olddata = DatumGetHeapTupleHeader(value);
! 	Assert((typeId == HeapTupleHeaderGetTypeId(olddata))
! 			|| (getBaseType(typeId) == HeapTupleHeaderGetTypeId(olddata)));
  	Assert(typeMod == HeapTupleHeaderGetTypMod(olddata));
  	/* Build a temporary HeapTuple control structure */
  	tmptup.t_len = HeapTupleHeaderGetDatumLength(olddata);
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
new file mode 100644
index f8eb5bc..c9aef53
*** a/src/backend/commands/typecmds.c
--- b/src/backend/commands/typecmds.c
*** DefineDomain(CreateDomainStmt *stmt)
*** 814,827 
  	basetypeoid = HeapTupleGetOid(typeTup);
  
  	/*
! 	 * Base type must be a plain base type, another domain or an enum. Domains
! 	 * over pseudotypes would create a security hole.  Domains over composite
! 	 * types might be made to work in the future, but not today.
  	 */
  	typtype = baseType->typtype;
  	if (typtype != TYPTYPE_BASE &&
  		typtype != TYPTYPE_DOMAIN &&
! 		typtype != TYPTYPE_ENUM)
  		ereport(ERROR,
  (errcode(ERRCODE_DATATYPE_MISMATCH),
   errmsg("\"%s\" is not a valid base type for a domain",
--- 814,827 
  	basetypeoid = HeapTupleGetOid(typeTup);
  
  	/*
! 	 * Base type must be a plain base type, composite type, another domain or
! 	 * an enum. Domains over pseudotypes would create a security hole.
  	 */
  	typtype = baseType->typtype;
  	if (typtype != TYPTYPE_BASE &&
  		typtype != TYPTYPE_DOMAIN &&
! 		typtype != TYPTYPE_ENUM &&
! 		typtype != TYPTYPE_COMPOSITE)
  		ereport(ERROR,
  (errcode(ERRCODE_DATATYPE_MISMATCH),
   errmsg("\"%s\" is not a valid base type for a domain",
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
new file mode 100644
index 0418972..39a1835
*** a/src/backend/parser/parse_coerce.c
--- b/src/backend/parser/parse_coerce.c
*** coerce_type(ParseState *pstate, Node *no
*** 369,382 
  		return result;
  	}
  	if (inputTypeId == RECORDOID &&
! 		ISCOMPLEX(targetTypeId))
  	{
  		/* Coerce a RECORD to a specific complex type */
  		return coerce_record_to_complex(pstate, node, targetTypeId,
  		ccontext, cformat, location);
  	}
  	if (targetTypeId == RECORDOID &&
! 		ISCOMPLEX(inputTypeId))
  	{
  		/* Coerce a specific complex type to RECORD */
  		/* NB: we do NOT want a RelabelType here */
--- 369,382 
  		return result;
  	}
  	if (inputTypeId == RECORDOID &&
! 		isComplex(targetTypeId))
  	{
  		/* Coerce a RECORD to a specific complex type */
  		return coerce_record_to_complex(pstate, node, targetTypeId,
  		ccontext, cformat, location);
  	}
  	if (targetTypeId == RECORDOID &&
! 		isComplex(inputTypeId))
  	{
  		/* Coerce a specific complex type to RECORD */
  		/* NB: we do NOT want a RelabelType here */
*** can_coerce_type(int nargs, Oid *input_ty
*** 478,491 
  		 * coerce (may need tighter checking here)
  		 */
  		if (inputTypeId == RECORDOID &&
! 			ISCOMPLEX(targetTypeId))
  			continue;
  
  		/*
  		 * If input is a composite type and target is RECORD, accept
  		 */
  		if (targetTypeId == RECORDOID &&
! 			ISCOMPLEX(inputTypeId))
  			continue;
  
  #ifdef NOT_USED	/* not implemented yet */
--- 478,491 
  		 * coerce (may need tighter checking here)
  		 */
  		if (inputTypeId == RECORDOID &&
! 			isComplex(targetTypeId))
  			continue;
  
  		/*
  		 * If input is a 

Re: [HACKERS] crash-safe visibility map, take five

2011-05-11 Thread Pavan Deolasee
On Tue, May 10, 2011 at 7:38 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, May 10, 2011 at 9:45 AM, Merlin Moncure mmonc...@gmail.com
 wrote:
  I see: here's a comment that was throwing me off:
  +   /*
  +* If we didn't get the lock and it turns out we need it, we'll
 have to
  +* unlock and re-lock, to avoid holding the buffer lock across an
 I/O.
  +* That's a bit unfortunate, but hopefully shouldn't happen
 often.
  +*/
 
  I think that might be phrased as didn't get the pin and it turns out
  we need it because the bit can change after inspection.  The visible
  bit isn't 'wrong' as suggested in the comments, it just can change so
  that it becomes wrong.  Maybe a note of why it could change would be
  helpful.

 Oh, I see.  I did write lock when I meant pin, and your other
 point is well-taken as well.  Here's a revised version with some
 additional wordsmithing.


Some trivial comments.

Why do the set and clear functions need pass-by-reference (Buffer *)
argument ? I don't see them modifying the argument at all. This patch adds
the clear function, but the existing set function also suffers from that.

There are several invocations of pin/clear/release combos. May be you would
want a convenience routine for doing that in a single step or just pass
InvalidBuffer to clear() in which case, it would assume that the vm buffer
is not pinned and do the needful.

The comment at the top of visibilitymap_pin_ok  says On entry, *buf, but
the function really takes just a buf. You can possibly fold
visibilitymap_pin_ok() into a macro (and also name it slightly differently
like visibilitymap_is_pinned ?).

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Andrew Dunstan



On 05/11/2011 07:53 AM, Robert Haas wrote:

On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
joeyadams3.14...@gmail.com  wrote:

It seems to me a reasonable way to implement VARIANT would be to have
a data type called VARIANT that stores an OID of the inner type at the
beginning, followed by the binary data.

That's likely to be how it gets implemented, but you seem to have
missed the point of some of the discussion upthread: the big problem
with that is that someone might type DROP TYPE foo, and when they
do, you need an efficient way to figure out whether foo is in use
inside an instance of the variant type anywhere in the system.  The
devil is in the details...



Actually, I thought that was the variant (pun intended) Tom objected to, 
and wanted declared variant types instead, precisely so we could know 
where we'd need to look if a type was dropped. Using that you might 
still use the type oid at the beginning of the field, but there are 
other possibilities too (e.g. a single byte indicating which known 
variant is in use).
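Either representation is easy to picture with a byte-level sketch. This is a hypothetical encoding for illustration only, not an actual on-disk format; the 4-byte OID prefix and the small per-declared-variant tag are the two options discussed above (23 is int4's pg_type OID):

```python
import struct

# Option 1 (the upthread sketch): prefix each value with the 4-byte OID
# of its actual inner type.
def pack_oid_variant(type_oid, payload):
    return struct.pack("<I", type_oid) + payload

# Option 2 (declared variant types): the member types are known up front,
# so a one-byte tag indexing up to 256 members suffices.
def pack_tagged_variant(tag, payload):
    return struct.pack("<B", tag) + payload

int4_payload = struct.pack("<i", 42)
v1 = pack_oid_variant(23, int4_payload)   # 8 bytes total
v2 = pack_tagged_variant(0, int4_payload) # 5 bytes total
```

The tag form is smaller, but only works because a declared variant fixes the set of member types; the OID form is open-ended, which is exactly what makes DROP TYPE hard to police.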


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Bruce Momjian
Nicolas Barbier wrote:
 2011/5/11, Bruce Momjian br...@momjian.us:
 
  FYI, because the visibility map is only one _bit_ per page, it is 8000 *
  8 or 64k times smaller than the heap, e.g. one 8k page covers 64MB of
  heap pages.
 
 Actually, that would be one 8kB block covers 512MB of heap: 1 block
 of visibility map (8kB) = 64k visibility bits = covers 64k blocks =
 covers 512MB of heap. The cost of keeping the visibility map in cache
 is therefore totally negligible, only the cost of WAL logging changes
 to it is of interest.

Ah, yes, thanks, even better.
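The arithmetic above checks out directly (a quick sketch assuming the default 8 kB block size and one visibility bit per heap page, as described in the thread):

```python
PAGE_SIZE = 8192                              # bytes in one 8 kB block
bits_per_vm_page = PAGE_SIZE * 8              # 65,536 visibility bits per VM page
heap_covered = bits_per_vm_page * PAGE_SIZE   # heap bytes covered by one VM page

print(bits_per_vm_page)              # 65536 heap pages covered
print(heap_covered // (1024 ** 2))   # 512 MB of heap per 8 kB VM page
```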

  This is important because we rely on this compactness in hope that
  the WAL logging of this information will not be burdensome.
 
 The size of on entry in the map (1 bit) is not very related to the WAL
 overhead required per change of such a bit (i.e., the log record for a
 1 bit change will certainly be way more than 1 bit).

True.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Bruce Momjian
Simon Riggs wrote:
 On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
  Greg Stark wrote:
  On a separate note though, Simon, I don't know what you mean by we
  normally start with a problem. It's a free software project and
  people are free to work on whatever interests them whether that's
  because it solves a problem they have, helps a client who's paying
  them, or just because it's of academic interest to them. We don't
  always take their patches if they aren't of general interest but
  people propose all kinds of crazy experimental ideas all the time.
 
  I am confused by Simon's questions too.
 
  Simon seems to regularly argue for adding features late in the
  development cycle and backpatch things no one else thinks should be
  backpatched, but he wants more research that index-only scans are going
to improve things before it is implemented?  The first is aggressive
  development, the second is very conservative development --- they don't
  match, so I now wonder what the motivation is since it isn't consistent.
 
 Not really sure why reasonable technical skepticism should become
 personal commentary.
 
 You don't question Tom's motives if he is skeptical of an idea of
 mine. Why would you question my motivation? What is *your* motive for
 acting like that?

Tom is consistent in his level of aggressive/conservative development
suggestions.  What I am seeing are many cases where you are consistently
pushing for something even though you get almost-overwhelming rejection,
and you keep going.  And if it was consistent in one direction, I could
understand because maybe you feel we are too conservative, but if it
isn't consistent, I have no idea how to learn or adjust to your
approach.  We clearly have some people on one side of the
conservative/aggressive spectrum, and some on the other side.

Now, I am willing to admit I might be totally wrong, but it has risen to
a level that I felt I should say something in case it is helpful.
 
 I'm not driven by one setting of conservatism, but I am interested
 in adding fully usable features that bring credit to the project. If I
 see a feature that can have minor things added to it to improve them,
 then I raise that during beta. If I see things being worked out that
 sounds dubious, I mention that in early development.

Yes, that seems fine to me, as stated.

 I don't think this work will materially improve the speed of count(*)
 in majority of cases. This introduces extra overhead into the code

I think this is the only hope we have of improving count(*) in an active
MVCC system.  It might not work, but it has been our only hope of
improvement of count(*) for a while.

 path and that can be a net loss. The only time it will help is when
 you have a large table that is not cached and also not recently
 updated. Is count(*) run very often against such tables? Do we really
 care enough to optimise that use case with lots of special purpose
 code? The very fact that Kevin and yourself bring up different reasons
 for why we need this feature makes me nervous.

Yes, no question.  For count(*), you don't care about the indexed
values, only the count, while for Kevin's case you are reading values
from the index.  I assume (or hope) that one or both will be a win for
this feature.

 The analysis has not been done yet, and all I have done is request that.

I think we are going to have to write the code and see the performance
hit and where it is a win.  Ideally we could figure this out
before-hand, but I don't think that is possible in this case.  If you
look at the research in reducing the load of updating the hint bits,
again, it is so complex that only working code and testing is showing if
there is possible improvement there.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

2011-05-11 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Mon, May 09, 2011 at 11:32:28PM -0400, Tom Lane wrote:
 So we basically had three alternatives to make it better:
  * downcast to the array type, which would possibly silently
break applications that were relying on the function result
being considered of the domain type
  * re-apply domain checks on the function result, which would be
a performance hit and possibly again result in unobvious
breakage
  * explicitly break it by throwing a parse error until you
downcast (and then upcast the function result if you want)
 I realize that #3 is a bit unpleasant, but are either of the other two
 better?  At least #3 shows you where you need to check for problems.

 Though I've never used a domain over an array type, I'd strongly prefer #2.

Hmm.  I hadn't seriously considered that alternative, but we could go in
that direction.  Logically, what this would probably imply is inserting
CastToDomain whenever the result of a polymorphic function is deemed to
be of a domain type, whether the base type is array or not.

The reason I hadn't taken it very seriously is that I don't think it's
actually going to end up being consistent.  If we don't do #1 (downcast
polymorphic arguments to a base type), but consider the arguments passed
to the function to be of the domain type, then really we have to expect
the polymorphic function to enforce domain constraints internally; we
cannot fix it with something as localized as having the function call
parser stick a CastToDomain on top.  Here's a possibly rather silly
example:

create function negate(anyelement) returns anyelement as
$$ select - $1 $$ language sql;

create domain pos as int check (value > 0);

select negate(42::pos);

This negate() function will work for any type that has a unary minus
operator.  But the result of the unary minus operation cannot sanely be
considered to be of this domain type.  In this simplified example you
might feel it doesn't matter, since with an external CastToDomain we'd
throw error anyway a moment later, as soon as control comes back from
the function.  But what if the function does further operations with the
value, such as passing it to another polymorphic function?

So really, if you go down this path, you end up concluding that PLs
supporting polymorphic arguments had better be prepared to enforce
domain constraints all the way through, and thus there should be no need
for an external CastToDomain --- what comes back from the function ought
to be checked already.  Unfortunately, even if the PLs do that (SQL
functions might get it right, but I'm not real sure whether plpgsql is
water-tight on this, and I don't trust the other PLs for it at all),
there's no way that built-in polymorphic functions like array_append are
going to.

So on the whole, #2 looks like an implementation quagmire to me: it's
not clear what to check, or where, or how you know when you're done.
I'm not willing to volunteer my own time to make it work that way.
If somebody else who uses domains a lot wants to step up and take
responsibility, go for it.

regards, tom lane



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Bruce Momjian
Cédric Villemain wrote:
 2011/5/10 Kevin Grittner kevin.gritt...@wicourts.gov:
  Simon Riggs si...@2ndquadrant.com wrote:
  The typical speed up for non-covered indexes will come when we
  access a very large table (not in cache) via an index scan that is
  smaller than a bitmapindex scan. Will we be able to gauge
  selectivities sufficiently accurately to be able to pinpoint that
  during optimization? How will we know that the table is not in
  cache? Or is this an optimisation in the executor for a bitmapheap
  scan?
 
  I would continue to object to using current cache contents for plan
  choice because of plan instability and the fact that an odd initial
  cache load could skew plans in a bad direction indefinitely.  I do
  agree (and have already posted) that I think the hardest part of
  this might be developing a good cost model.  I doubt that's an
  insoluble problem, especially since it is something we can refine
  over time as we gain experience with the edge cases.
 
 you will have the same possible instability in planning with the
 index(-only?) scan because we may need to access heap anyway and this
 need is based on estimation, or am I missing something? I understood the
 idea was just to bypass the heap access *if* we can for *this*
 heap-page.
 
 In reality, I am not really scared by plan instability because of a
 possible PG/OS cache estimation. The percentages remain stable in my
 observations ... I don't know yet how it will go for vis map.
 
 And, we already have plan instability currently, which is *good* : at
 some point a seq scan is better than an bitmap heap scan. Because the
 relation size change and because ANALYZE re-estimate the distribution
 of the data. I will be very happy to issue ANALYZE CACHE as I have to
 ANALYZE temp table for large query if it allows the planner to provide
 me the best plan in some scenario...but this is another topic, sorry
 for the digression..

Good point --- we would be making plan decisions based on the visibility
map coverage.  The big question is whether visibility map changes are
more dynamic than the values we already plan against, like rows in the
table, table size, and value distributions.  I don't know the answer.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] PGC_S_DEFAULT is inadequate

2011-05-11 Thread Greg Stark
On Wed, May 11, 2011 at 3:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So this is fine if the
 current value was from the file or was the boot_val, but if we'd
 overridden the boot value with a replacement default value using
 PGC_S_DEFAULT, that code would cause the value to revert to the boot_val
 not the replacement value.  Not desirable.

Doesn't this mean if you had a setting of timezone in your config file
and remove it and reload you'll get a different value than you would
have if you had actually booted without the line originally?


 My conclusion about all this is that we really need to invent another
 GucSource value falling between PGC_S_DEFAULT and PGC_S_ENV_VAR, called
 perhaps PGC_S_DYNAMIC_DEFAULT

LGTM



-- 
greg



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 
 The very fact that Kevin and yourself bring up different reasons
 for why we need this feature makes me nervous.
 
 Yes, no question.  For count(*), you don't care about the indexed
 values, only the count, while for Kevin's case you are reading
 values from the index.
 
[sigh]  I'm reluctant to draw out this digression further, but there
is a possibly-useful point to be made here: these are not two
different things.  A covering index can be considered whenever the
set of columns referenced in the query is contained inside the set
of columns in the index.  The fact that the set of columns needed by
count(*) is the empty set merely means that it is covered by any
index, since the empty set is contained in every set.
 
Now, this special case may make for an easy initial target in
implementation, or allow early benchmarking.  If so, all the better
to go there first.  I'm not sure why anyone would stop there,
though; if it pays off for that simple case it is likely to pay off
for the more general case, too.
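The containment rule above is trivial to state as a set test. A toy sketch with invented column names (PostgreSQL's planner actually works with attribute-number bitmapsets, not string sets):

```python
def index_covers(query_cols, index_cols):
    # An index can cover a query when every column the query references
    # is contained in the index's set of columns.
    return set(query_cols) <= set(index_cols)

idx = {"id", "name"}   # hypothetical two-column index

print(index_covers({"name"}, idx))           # True: covered
print(index_covers({"name", "email"}, idx))  # False: email not indexed
print(index_covers(set(), idx))              # True: count(*) needs no
                                             # columns; the empty set is
                                             # contained in every set
```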
 
-Kevin



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 10.05.2011 20:15, Simon Riggs wrote:
 I can picture that. Regrettably, I can also picture the accesses to
 the visibility map, the maintenance operations on the VM that are
 needed for this and the contention that both of those will cause.

 I agree that we need to do tests to demonstrate that there's a gain from 
 the patch, once we have a patch to test. I would be very surprised if 
 there isn't, but that just means the testing is going to be easy.

I think Simon's point is that showing a gain on specific test cases
isn't a sufficient argument.  What we need to know about this sort of
change is what is the distributed overhead that is going to be paid by
*everybody*, whether their queries benefit from the optimization or not.
And what fraction of real-world queries really do benefit, and to what
extent.  Isolated test cases (undoubtedly chosen to show off the
optimization) are not adequate to form a picture of the overall cost and
benefit.

regards, tom lane



[HACKERS] Fix for bug in ldapServiceLookup in libpq

2011-05-11 Thread Albe Laurenz
I have found a small but annoying bug in libpq where
connection parameters are resolved via LDAP.

There is a write past the end of a malloc'ed string which causes
memory corruption. The code and the bug are originally by me :^(

The attached patch fixes the problem in HEAD.
This should be backpatched to 8.2 where the code was introduced.

Yours,
Laurenz Albe


ldapServiceLookup.patch
Description: ldapServiceLookup.patch



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Bruce Momjian
Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  On 10.05.2011 20:15, Simon Riggs wrote:
  I can picture that. Regrettably, I can also picture the accesses to
  the visibility map, the maintenance operations on the VM that are
  needed for this and the contention that both of those will cause.
 
  I agree that we need to do tests to demonstrate that there's a gain from 
  the patch, once we have a patch to test. I would be very surprised if 
  there isn't, but that just means the testing is going to be easy.
 
 I think Simon's point is that showing a gain on specific test cases
 isn't a sufficient argument.  What we need to know about this sort of
 change is what is the distributed overhead that is going to be paid by
 *everybody*, whether their queries benefit from the optimization or not.
 And what fraction of real-world queries really do benefit, and to what
 extent.  Isolated test cases (undoubtedly chosen to show off the
 optimization) are not adequate to form a picture of the overall cost and
 benefit.

Yes, I assume we are going to need the same kind of tests we did for
other invasive patches like serializable isolation level and hot
standby.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] PGC_S_DEFAULT is inadequate

2011-05-11 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Wed, May 11, 2011 at 3:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So this is fine if the
 current value was from the file or was the boot_val, but if we'd
 overridden the boot value with a replacement default value using
 PGC_S_DEFAULT, that code would cause the value to revert to the boot_val
 not the replacement value.  Not desirable.

 Doesn't this mean if you had a setting of timezone in your config file
 and remove it and reload you'll get a different value than you would
 have if you had actually booted without the line originally?

Yes, except it's timezone_abbreviations (and client_encoding) that are
at risk.  I think the latter is probably observably broken in existing
releases, though I've not tried to test.  Given the minuscule
use-case for setting client_encoding in postgresql.conf, I'm not
concerned about back-patching a fix, but I'd like to make it work better
going forward.

regards, tom lane



Re: [HACKERS] hint bit cache v5

2011-05-11 Thread Merlin Moncure
On Tue, May 10, 2011 at 11:59 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, May 9, 2011 at 5:12 PM, Merlin Moncure mmonc...@gmail.com wrote:

 I'd like to know if this is a strategy that merits further work...If
 anybody has time/interest that is.  It's getting close to the point
 where I can just post it to the commit fest for review.  In
 particular, I'm concerned if Tom's earlier objections can be
 satisfied. If not, it's back to the drawing board...

 I'm interested in what you're doing here.

 From here, there's quite a lot of tuning possibilities. It would be
 very useful to be able to define some metrics we are interested in
 reducing and working out how to measure them.

Following are results that are fairly typical of the benefits you
might see when the optimization kicks in.  The attached benchmark just
creates a bunch of records in a random table and scans it.  This is
more or less the scenario that causes people to gripe about hint bit
i/o, especially in systems that are already under moderate to heavy
i/o stress.  I'm gonna call it for 20%, although it could be less if
you have an i/o system that spanks the test (try cranking -c and the
creation # records in bench.sql in that case).  Anecdotal reports of
extreme duress caused by hint bit i/o suggest problematic or mixed use
(OLTP + OLAP) workloads might see even more benefit.  One thing I need
to test is how much benefit you'll see with wider records.

I think I'm gonna revert the change to cache invalid bits. I just
don't see hint bits as a major contributor to dead tuples following
epic rollbacks (really, the solution for that case is simply to try
and not get in that scenario if you can).  This will put the code back
into the cheaper and simpler bit per transaction addressing.  What I
do plan to do though, is to check and set xmax commit bits in the
cache...that way deleted tuples will see cache benefits.

[hbcache]
merlin@mmoncure-ubuntu:~$ time pgbench -c 4 -n -T 200 -f bench.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
duration: 200 s
number of transactions actually processed: 8
tps = 0.037167 (including connections establishing)
tps = 0.037171 (excluding connections establishing)

real3m35.549s
user0m0.008s
sys 0m0.004s

[HEAD]
merlin@mmoncure-ubuntu:~$ time pgbench -c 4 -n -T 200 -f bench.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
duration: 200 s
number of transactions actually processed: 8
tps = 0.030313 (including connections establishing)
tps = 0.030317 (excluding connections establishing)

real4m24.216s
user0m0.000s
sys 0m0.012s
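Working out the improvement from the two runs above (just arithmetic on the numbers posted; no new measurements):

```python
# tps and wall-clock time, copied from the pgbench output above
tps_patched, tps_head = 0.037167, 0.030313
secs_patched = 3 * 60 + 35.549   # 3m35.549s  (hbcache)
secs_head = 4 * 60 + 24.216      # 4m24.216s  (HEAD)

tps_gain = (tps_patched / tps_head - 1) * 100
time_saved = (1 - secs_patched / secs_head) * 100

print(f"{tps_gain:.1f}% more tps, {time_saved:.1f}% less wall time")
# 22.6% more tps, 18.4% less wall time -- consistent with the rough
# "20%" figure quoted above
```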


bench.sql
Description: Binary data


bench_setup.sql
Description: Binary data



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Joseph Adams
On Wed, May 11, 2011 at 7:53 AM, Robert Haas robertmh...@gmail.com wrote:
 That's likely to be how it gets implemented, but you seem to have
 missed the point of some of the discussion upthread: the big problem
 with that is that someone might type DROP TYPE foo, and when they
 do, you need an efficient way to figure out whether foo is in use
 inside an instance of the variant type anywhere in the system.  The
 devil is in the details...

Sorry, I missed that.  That in mind, I think I would lean more toward
the union proposal as well.  Can anyone think of a case where VARIANT
would be more useful?

As for using one or two bytes to store the type of a UNION, that
creates a problem when you want to extend the union in the future.
That is, if a UNION is simply a collection of the possible types that
values of the UNION type can hold.

If UNION is implemented more like a tagged union:

CREATE TYPE token AS TAGGED UNION (identifier TEXT, keyword TEXT,
number INT);

Then the problem of altering it is much like the problem of altering an ENUM.

On Tue, May 10, 2011 at 5:19 PM, Darren Duncan dar...@darrenduncan.net wrote:
 Examples of open union types could be number, which all the numeric types
 compose, and so you can know say that you can use the generic numeric
 operators on values you have simply if their types compose the number union
 type, and it still works if more numeric types appear later.  Likewise, the
 string open union could include both text and blob, as both support
 catenation and substring matches or extraction, for example.

 This would aid to operator overloading in a generic way, letting you use the
 same syntax for different types, but allowing types to mix is optional; eg,
 you could support add(int,int) and add(real,real) without supporting
 add(int,real) etc but the syntax add(x,y) is shared, and you do this
 while still having a strong type system; allowing the mixing is optional
 case-by-case.

Coming from a Haskell perspective, this is a great idea, but I don't
think the union feature should be used to implement it.  Closed
unions correspond to algebraic data types in Haskell, e.g.:

data Ordering = LT | EQ | GT

while open unions are better-suited to type classes:

(+) :: (Num a) => a -> a -> a

I, for one, would like to see PostgreSQL steal some features from
Haskell's type system.  PostgreSQL seems to implement a subset of
Haskell's system, without type classes and where functions can have
only one type variable (anyelement).

To express the (+) example in PostgreSQL, it would be tempting to simply say:

add(real, real) returns real

However, what if each real is a different type (e.g. INT and FLOAT).
Is that allowed?  In the Haskell example above, (+) constraints both
of its arguments to the same type.  In ad-hoc syntax, it would look
like this in PostgreSQL:

real anyelement => add(anyelement, anyelement) returns anyelement

Another thing to consider: attempting to use a type class as a column
type, e.g.:

CREATE TABLE foo (n real);

Normally in Haskell, type information is passed implicitly as
parameters (hence the term parametric polymorphism), rather than
carried alongside values (like in object-oriented languages).  In the
case above, the type information would have to be carried with each
value.  Haskell actually supports this, but under a somewhat-weird
extension called Existential types (see
http://www.haskell.org/haskellwiki/Existential_type#Examples for an
example).  It isn't terribly useful in Haskell, and I don't think it
will be in PostgreSQL either.


Joey Adams



Re: [HACKERS] Fix for bug in ldapServiceLookup in libpq

2011-05-11 Thread Tom Lane
Albe Laurenz laurenz.a...@wien.gv.at writes:
 I have found a small but annoying bug in libpq where
 connection parameters are resolved via LDAP.

 There is a write past the end of a malloc'ed string which causes
 memory corruption. The code and the bug are originally by me :^(

Hmm ... that's a bug all right, but why have the null-termination
inside the loop at all?  Seems like it should look like

for (p = result, i = 0; values[i] != NULL; ++i)
{
    strncpy(p, values[i]->bv_val, values[i]->bv_len);
    p += values[i]->bv_len;
*(p++) = '\n';
}
*p = '\0';

 This should be backpatched to 8.2 where the code was introduced.

Yes, will do.

regards, tom lane



Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 I think Simon's point is that showing a gain on specific test
 cases isn't a sufficient argument.
 
Ah, if that's what he's been trying to get at, I'm curious who
disagrees with that.  I wouldn't have thought anyone on this list
would.
 
 What we need to know about this sort of change is what is the
 distributed overhead that is going to be paid by *everybody*,
 whether their queries benefit from the optimization or not.
 
Certainly we need to test whether Heikki is right in the previously
non-quoted part of his post on this thread:
 
 Note that we already have the visibility map, and the accesses
 needed to update it are already there. Granted, we'll have to
 change the logic slightly to make it crash safe, but I don't
 expect that to add any meaningful overhead - the changes are
 going to be where the bits are set, ie. vacuum, not when the bits
 are cleared. Granted, we might also want to set the bits more
 aggressively once they're used by index-only-scans. But done
 correctly, just taking advantage of the VM that's already there
 shouldn't add overhead to other operations.
 
 Isolated test cases (undoubtedly chosen to show off the
 optimization) are not adequate to form a picture of the overall
 cost and benefit.
 
Well, first, that hardly seems fair.  I have many times seen people
make an effort to synthesize *worst* case benchmarks.  Certainly any
regular on this list would know it is pointless to show only a best
case benchmark.
 
Second, we really need to make development of a performance testing
farm a priority at PGCon next week.  The need for it just keeps
coming up over and over.
 
Third, Dan Ports has been working a great deal with DBT-2 running
PostgreSQL for the SSI patch, both as a stress tool to flush out
bugs and to get benchmarks numbers conforming to the published
requirements of that benchmark.  I know from off-list emails that it
took a fair amount of work to get it running correctly with
PostgreSQL in his environment.  We should probably try to draw on
that experience.  (Of course that shouldn't be the *only* test in a
performance testing farm, but it's a good one to include.)
 
-Kevin



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Alvaro Herrera
Excerpts from Simon Riggs's message of mar may 10 17:57:20 -0400 2011:
 On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera alvhe...@alvh.no-ip.org 
 wrote:

  both Oracle and MS-SQL have it
 
 Do they? What types are they called?

ANYTYPE

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Robert Haas
On Wed, May 11, 2011 at 11:43 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 On Tue, May 10, 2011 at 5:19 PM, Darren Duncan dar...@darrenduncan.net 
 wrote:
 Examples of open union types could be number, which all the numeric types
compose, and so you can now say that you can use the generic numeric
 operators on values you have simply if their types compose the number union
 type, and it still works if more numeric types appear later.  Likewise, the
 string open union could include both text and blob, as both support
 catenation and substring matches or extraction, for example.

 This would aid to operator overloading in a generic way, letting you use the
 same syntax for different types, but allowing types to mix is optional; eg,
 you could support add(int,int) and add(real,real) without supporting
 add(int,real) etc but the syntax add(x,y) is shared, and you do this
 while still having a strong type system; allowing the mixing is optional
 case-by-case.

 Coming from a Haskell perspective, this is a great idea, but I don't
 think the union feature should be used to implement it.

I'm unclear what the point of such a feature would be.  A union of all
the common numeric types is not much different from the existing type
numeric.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Fix for bug in ldapServiceLookup in libpq

2011-05-11 Thread Tom Lane
... btw, shouldn't this function free the result string when it's done
with it?  AFAICS that string is not returned to the caller, it's just
being leaked.

(I'll refrain from asking why it's creating the string in the first
place rather than parsing ldap_get_values_len's output as-is ...)

regards, tom lane



Re: [HACKERS] hint bit cache v5

2011-05-11 Thread Merlin Moncure
On Wed, May 11, 2011 at 10:38 AM, Merlin Moncure mmonc...@gmail.com wrote:
 One thing I need to test is how much benefit you'll see with wider records.

The results are a bit better, around 25% using a similar methodology
on ~ 1k wide records.

 I think I'm gonna revert the change to cache invalid bits. I just
 don't see hint bits as a major contributor to dead tuples following
 epic rollbacks

what I meant to say here was, I don't see hint bit i/o following
rollbacks as a major issue.  Point being, I don't see much use in
optimizing management of INVALID tuple bits beyond what is already
done.

Anyways, demonstrating a 'good' case is obviously not the whole story.
 But what are the downsides?  There are basically two:

1) tiny cpu penalty on every heap fetch
2) extremely widely dispersed (in terms of transaction id) unhinted
tuples can force the cache to refresh every 100 tuples in the absolute
worst case.  A cache refresh is a 100 int sort and a loop.

For '1', the absolute worst case I can come up with, cpu bound scans
of extremely narrow records, the overall cpu usage goes up around 1%.
'2' seems just impossible to see in the real world -- and if it does,
you are also paying for lots of clog lookups all the way through the
slru, and you are having i/o and other issues on top of it.  Even if
all the stars align and it does happen, all the tuples get hinted and
dirtied anyways so it will only happen at most once on that particular
set of data.

merlin



[HACKERS] Standbys which don't synch to disk?

2011-05-11 Thread Josh Berkus
Fujii, Simon,

For 9.1, both master and replica in a sync replication relationship are
required to be fsync'ing to disk.  I understand why we had to do that
for our first cut at synch rep.  Do you think, though, that it might
become possible to replicate without synch-to-disk for 9.2?

The use case I have is cloud hosting, where I'd rather have two or three
synchronous standbys than synch to disk.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Standbys which don't synch to disk?

2011-05-11 Thread Robert Haas
On Wed, May 11, 2011 at 1:12 PM, Josh Berkus j...@agliodbs.com wrote:
 For 9.1, both master and replica in a sync replication relationship are
 required to be fsync'ing to disk.  I understand why we had to do that
 for our first cut at synch rep.  Do you think, though, that it might
 become possible to replicate without synch-to-disk for 9.2?

 The use case I have is cloud hosting, where I'd rather have two or three
 synchronous standbys than synch to disk.

It's already possible to set fsync=off on the standby if you want.  If
there is an OS-level crash you'll need to rebuild the standby, but in
some cases that may be acceptable.
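As a concrete sketch, the standby-side setting described above is a
single line in the standby's postgresql.conf (shown for illustration;
it trades standby durability for speed exactly as noted):

```
# standby's postgresql.conf only -- an OS-level crash on this node
# means rebuilding the standby from a fresh base backup
fsync = off
```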

And Simon has already written a patch to add a receive mode to sync
rep, which I expect will get committed to 9.2.  In that mode, the
standby can acknowledge the WAL records as soon as they are received,
and write them to disk just after.  I think we do need some
benchmarking there, to figure out whether any changes to the timing of
replies are needed in that case.  But the basic principle seems sound.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 The grammar accepts only a very limited number of parameters there:
 Err

 TriggerFuncArg:
     Iconst
         {
             char buf[64];
             snprintf(buf, sizeof(buf), "%d", $1);
             $$ = makeString(pstrdup(buf));
         }
     | FCONST    { $$ = makeString($1); }
     | Sconst    { $$ = makeString($1); }
     | BCONST    { $$ = makeString($1); }
     | XCONST    { $$ = makeString($1); }
     | ColId     { $$ = makeString($1); }

 That is integers, floating point, strings, bitstrings, hexstrings and column 
 references (???).

 How that exact list came to exist I do not know.

The documentation for CREATE FUNCTION says

arguments:
An optional comma-separated list of arguments to be provided to
the function when the trigger is executed. The arguments are
literal string constants. Simple names and numeric constants can
be written here, too, but they will all be converted to
strings.

The ColId case is meant to cover the "simple names" proviso, but of
course it fails to cover reserved words.  We could trivially fix that
by writing ColLabel instead of ColId.  My initial expectation was that
this would bloat the parser, but it seems not to: the backend gram.o
is exactly the same size after making the change, and ecpg's preproc.o
actually gets smaller (more opportunity to share states?).  So I'm
inclined to do it, rather than having to document that "simple names"
excludes reserved words.

A possibly more interesting question is why BCONST and XCONST were added
there.  The documentation certainly does not say or suggest that those
are legal options, and what's more the behavior could be considered
surprising:

regression=# CREATE TRIGGER trig_x_bconst BEFORE INSERT ON x FOR EACH ROW 
EXECUTE PROCEDURE trigger_x(b'1011');
CREATE TRIGGER
regression=# CREATE TRIGGER trig_x_xconst BEFORE INSERT ON x FOR EACH ROW 
EXECUTE PROCEDURE trigger_x(x'1234abcd');
CREATE TRIGGER
regression=# \d+ x
...
Triggers:
trig_x_bconst BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE 
trigger_x('b1011')
trig_x_xconst BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE 
trigger_x('x1234abcd')

I'm inclined to take those out, because (1) I find it shrinks the
generated grammar a tad (these productions *do* add to the size of the
state tables), and (2) if we don't, we ought to document this behavior,
and I don't want to do that either.

I see this as just a change to make in HEAD, it's not appropriate for
a back-patch.

Objections anyone?

regards, tom lane



Re: [HACKERS] Standbys which don't synch to disk?

2011-05-11 Thread Josh Berkus

 It's already possible to set fsync=off on the standby if you want.  If
 there is an OS-level crash you'll need to rebuild the standby, but in
 some cases that may be acceptable.

Yes, generally if there's an OS-level crash on cloud hosting, you've
lost the instance anyway.

 And Simon has already written a patch to add a receive mode to sync
 rep, which I expect will get committed to 9.2.  In that mode, the
 standby can acknowledge the WAL records as soon as they are received,
 and write them to disk just after.  I think we do need some
 benchmarking there, to figure out whether any changes to the timing of
 replies are needed in that case.  But the basic principle seems sound.

Yes, that's what I'm looking for.  The one other thing would be the
ability not to fsync the master, which would come out of the whole
stream from buffers patch which Fujii was working on.  Fujii, is that
still something you're working on?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



[HACKERS] pg_upgrade and PGPORT

2011-05-11 Thread Peter Eisentraut
pg_upgrade is a bit schizophrenic concerning the PGPORT environment
variable.  On the one hand, there is this code in option.c that wants to
make use of it:

old_cluster.port = getenv("PGPORT") ? atoi(getenv("PGPORT")) : DEF_PGPORT;
new_cluster.port = getenv("PGPORT") ? atoi(getenv("PGPORT")) : DEF_PGPORT;
 
On the other hand, check.c will reject a set PGPORT because it's a libpq
environment variable.  Should we make an exception for PGPORT, like we
did for PGCLIENTENCODING?
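The fallback in that snippet is easy to see in isolation; here is a
self-contained sketch (DEF_PGPORT's value and the helper name are
assumptions, and a -p command-line value, when supplied, would
override the result):

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <assert.h>

#define DEF_PGPORT 5432         /* assumed compiled-in default */

/* Mirror of the option.c fallback quoted above: take the port from
 * PGPORT when it is set, else fall back to the compiled-in default. */
static int
resolve_port(void)
{
    const char *env = getenv("PGPORT");

    return env ? atoi(env) : DEF_PGPORT;
}
```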





Re: [HACKERS] hint bit cache v5

2011-05-11 Thread Simon Riggs
On Wed, May 11, 2011 at 4:38 PM, Merlin Moncure mmonc...@gmail.com wrote:

 Following are results that are fairly typical of the benefits you
 might see when the optimization kicks in.  The attached benchmark just

 [hbcache]
 real    3m35.549s

 [HEAD]
 real    4m24.216s

These numbers look very good. Thanks for responding to my request.

What people have said historically at this point is "ah, but you've
just deferred the pain from clog lookups."

The best way to show this does what we hope is to run a normal-ish
OLTP access to the table that would normally thrash the clog and show
no ill effects there either. Run that immediately after the above
tests so that the cache and hint bits are both primed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_upgrade and PGPORT

2011-05-11 Thread Bruce Momjian
Peter Eisentraut wrote:
 pg_upgrade is a bit schizophrenic concerning the PGPORT environment
 variable.  On the one hand, there is this code in option.c that wants to
 make use of it:
 
 old_cluster.port = getenv("PGPORT") ? atoi(getenv("PGPORT")) : DEF_PGPORT;
 new_cluster.port = getenv("PGPORT") ? atoi(getenv("PGPORT")) : DEF_PGPORT;
  
 On the other hand, check.c will reject a set PGPORT because it's a libpq
 environment variable.  Should we make an exception for PGPORT, like we
 did for PGCLIENTENCODING?

Wow, good question.  Passing stuff in via libpq is certainly complex.

I ran a test and it looks like the command-line flag overrides the
PGPORT environment variable:

$ export PGPORT=
$ psql test
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket /tmp/.s.PGSQL.?
$ psql -p 5432 test
psql (9.1beta1)
Type "help" for help.

test=

I assume it is just like PGCLIENTENCODING.  PGCLIENTENCODING was easier
to ignore because we need it for error messages.

Are there other cases we should allow too?

A larger question is whether we should just disable all the checks for
environment variables.  The C comment says:

 * check_for_libpq_envvars()
 *
 * tests whether any libpq environment variables are set.
 * Since pg_upgrade connects to both the old and the new server,
 * it is potentially dangerous to have any of these set.
 *
 * If any are found, will log them and cancel.

I am not sure what to do.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Andres Freund
On Wednesday, May 11, 2011 07:25:58 PM Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  The grammar accepts only a very limited number of parameters there:
  Err
  
  TriggerFuncArg:
      Iconst
          {
              char buf[64];
              snprintf(buf, sizeof(buf), "%d", $1);
              $$ = makeString(pstrdup(buf));
          }
      | FCONST    { $$ = makeString($1); }
      | Sconst    { $$ = makeString($1); }
      | BCONST    { $$ = makeString($1); }
      | XCONST    { $$ = makeString($1); }
      | ColId     { $$ = makeString($1); }
  
  That is integers, floating point, strings, bitstrings, hexstrings and
  column references (???).
  
  How that exact list came to exist I do not know.
 
 The documentation for CREATE FUNCTION says
 
 arguments:
   An optional comma-separated list of arguments to be provided to
   the function when the trigger is executed. The arguments are
   literal string constants. Simple names and numeric constants can
   be written here, too, but they will all be converted to
   strings.
 
  The ColId case is meant to cover the "simple names" proviso, but of
  course it fails to cover reserved words.  We could trivially fix that
  by writing ColLabel instead of ColId.  My initial expectation was that
  this would bloat the parser, but it seems not to: the backend gram.o
  is exactly the same size after making the change, and ecpg's preproc.o
  actually gets smaller (more opportunity to share states?).  So I'm
  inclined to do it, rather than having to document that "simple names"
  excludes reserved words.
Good.

 A possibly more interesting question is why BCONST and XCONST were added
 there.  The documentation certainly does not say or suggest that those
 are legal options, and what's more the behavior could be considered
 surprising:
It seems to have originally been added there by Peter (as BITCONST) and then 
split by Thomas Lockhart.

See 73874a06 and eb121ba2
 regression=# CREATE TRIGGER trig_x_bconst BEFORE INSERT ON x FOR EACH ROW
 EXECUTE PROCEDURE trigger_x(b'1011'); CREATE TRIGGER
 regression=# CREATE TRIGGER trig_x_xconst BEFORE INSERT ON x FOR EACH ROW
 EXECUTE PROCEDURE trigger_x(x'1234abcd'); CREATE TRIGGER
 regression=# \d+ x
 ...
 Triggers:
 trig_x_bconst BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
 trigger_x('b1011') trig_x_xconst BEFORE INSERT ON x FOR EACH ROW EXECUTE
 PROCEDURE trigger_x('x1234abcd')
Err. Yes, that looks rather strange. And surprising.

 I'm inclined to take those out, because (1) I find it shrinks the
 generated grammar a tad (these productions *do* add to the size of the
 state tables), and (2) if we don't, we ought to document this behavior,
 and I don't want to do that either.
 I see this as just a change to make in HEAD, it's not appropriate for
 a back-patch.
I would say the above behaviour is even a bug. But given that I haven't 
seen/found anybody complaining about it, fixing it properly looks pointless.
So yes, HEAD only sounds fine.

 Objections anyone?
Nope.

Is there a special reason for not using the normal function calling 
mechanisms? It looks to me as it was just done to have an easy way to store it 
in pg_trigger.tgargs.

Andres



Re: [HACKERS] Standbys which don't synch to disk?

2011-05-11 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 It's already possible to set fsync=off on the standby if you want.  If
 there is an OS-level crash you'll need to rebuild the standby, but in
 some cases that may be acceptable.

 ...  The one other thing would be the
 ability not to fsync the master, which would come out of the whole
 stream from buffers patch which Fujii was working on.  Fujii, is that
 still something you're working on?

Huh?  Surely you can just set fsync=off on the master if you feel like
it.  Data integrity not guaranteed, of course, but if you don't care...

regards, tom lane



Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 Is there a special reason for not using the normal function calling 
 mechanisms? It looks to me as it was just done to have an easy way to store 
 it 
 in pg_trigger.tgargs.

Well, this is all very historical, dating from Berkeley days AFAIK.
If we had it to do over, I bet we'd do it differently --- but the pain
of changing it seems to exceed any likely benefit.

regards, tom lane



Re: [HACKERS] pg_upgrade and PGPORT

2011-05-11 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 A larger question is whether we should just disable all the checks for
 environment variables.  The C comment says:

  * check_for_libpq_envvars()
  *
  * tests whether any libpq environment variables are set.
  * Since pg_upgrade connects to both the old and the new server,
  * it is potentially dangerous to have any of these set.
  *
  * If any are found, will log them and cancel.

 I am not sure what to do.

Well, the risk mentioned in that comment certainly seems real.

An alternative solution that might be more user-friendly is to ensure
that the connection strings pg_upgrade uses specify all important
options, leaving nothing to be overridden by environment variables.
Then you don't need to make the user adjust his environment.

Or you could just unsetenv instead of complaining.
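A minimal sketch of the unsetenv approach, for concreteness (the
variable list and the helper name are illustrative, not the complete
set libpq recognizes):

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <assert.h>

/* Clear libpq environment variables that could redirect pg_upgrade's
 * connections, instead of logging them and cancelling.  The list here
 * is only a sample of the variables libpq actually recognizes. */
static const char *const libpq_envvars[] = {
    "PGHOST", "PGHOSTADDR", "PGPORT", "PGDATABASE",
    "PGUSER", "PGPASSWORD", "PGSERVICE", "PGSSLMODE",
    NULL
};

static void
unset_libpq_envvars(void)
{
    for (int i = 0; libpq_envvars[i] != NULL; i++)
        unsetenv(libpq_envvars[i]);
}
```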

I would like to think that eventually pg_upgrade won't start a
postmaster at all, but connect using something more like a standalone
backend.  So someday the issue might go away --- but that someday
isn't especially close.

regards, tom lane



Re: [HACKERS] pg_upgrade and PGPORT

2011-05-11 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  A larger question is whether we should just disable all the checks for
  environment variables.  The C comment says:
 
   * check_for_libpq_envvars()
   *
   * tests whether any libpq environment variables are set.
   * Since pg_upgrade connects to both the old and the new server,
   * it is potentially dangerous to have any of these set.
   *
   * If any are found, will log them and cancel.
 
  I am not sure what to do.
 
 Well, the risk mentioned in that comment certainly seems real.
 
 An alternative solution that might be more user-friendly is to ensure
 that the connection strings pg_upgrade uses specify all important
 options, leaving nothing to be overridden by environment variables.
 Then you don't need to make the user adjust his environment.

Well, they can use the same port number for both servers.  In fact I
just use the compiled-default of 5432 when I am testing.  No reason they
could not supply value in an environment variable but it would have to
be the same for old and new server (the two servers never run at the
same time).

 Or you could just unsetenv instead of complaining.

I think it is really PGDATA that we certainly can't inherit from an
environment variable.

 I would like to think that eventually pg_upgrade won't start a
 postmaster at all, but connect using something more like a standalone
 backend.  So someday the issue might go away --- but that someday
 isn't especially close.

That standalone backend is going to have to understand pg_dump SQL
output, with \connect, etc.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] hint bit cache v5

2011-05-11 Thread Merlin Moncure
On Wed, May 11, 2011 at 12:40 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, May 11, 2011 at 4:38 PM, Merlin Moncure mmonc...@gmail.com wrote:

 Following are results that are fairly typical of the benefits you
 might see when the optimization kicks in.  The attached benchmark just

 [hbcache]
 real    3m35.549s

 [HEAD]
 real    4m24.216s

 These numbers look very good. Thanks for responding to my request.

  What people have said historically at this point is "ah, but you've
  just deferred the pain from clog lookups."

Deferred, or eliminated. If any tuple on the page gets updated,
deleted, etc or the table itself is dropped then you've
'won'...the page with the hint-bit-only change was never booted out to
the heap before another substantive change happened.  This is exactly
what happens in certain common workloads -- you insert a bunch of
records, scan them with some batch process, then delete them.  Let's
say a million records were inserted under a single transaction and you
are getting bounced out of the transam.c cache, you just made a
million calls to TransactionIdIsCurrentTransactionId and (especially)
TransactionIdIsInProgress for the *exact same* transaction_id, over
and over.  That stuff adds up even before looking at the i/o incurred.

Put another way, the tuple hint bits have a lot of usefulness when the
tuples on the page are coming from all kinds of differently aged
transactions.  When all the tuples have the same or similar xid, the
information value is quite low, and the i/o price isn't worth it.  The
cache neatly haircuts the downside case.  If the cache isn't helping
(any tuple fetch on the page faults through it), the page is dirtied
and the next time it's fetched all the bits will be set.

 The best way to show this does what we hope is to run a normal-ish
 OLTP access to the table that would normally thrash the clog and show
 no ill effects there either. Run that immediately after the above
 tests so that the cache and hint bits are both primed.

yeah.  the only easy way I know of to do this extremely long pgbench
runs, and getting good results is harder than it sounds...if the tuple
hint bits make it to disk (via vacuum or a cache fault), they stay
there and that tuple is no longer interesting from the cache point of
view.

If you make the scale really large the test will just take forever
just to get the tables primed (like a week).  Keep in mind, autovacuum
can roll around at any time and set the bits under you (you can of
course disable it, but who really does that on OLTP?).  Small scale
oltp tests are not real world realistic because anybody sane would
just let autovacuum loose on the table. clog thrashing systems are
typically mature, high load oltp databases...not fun to test on your
single 7200 rpm drive.

I'm going to boldly predict that with all the i/o flying around in
cases like that, the paltry cpu cycles spent dealing with the cache
are the least of your problems.  Not discounting the need to verify
that though.

merlin



[HACKERS] Tables cannot have INSTEAD OF triggers

2011-05-11 Thread Peter Eisentraut
Why not?  Is there a fundamental problem, or just that no one wanted to
make it work?





Re: [HACKERS] pg_upgrade and PGPORT

2011-05-11 Thread Robert Haas
On Wed, May 11, 2011 at 2:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Or you could just unsetenv instead of complaining.

+1 for that.

 I would like to think that eventually pg_upgrade won't start a
 postmaster at all, but connect using something more like a standalone
 backend.  So someday the issue might go away --- but that someday
 isn't especially close.

And +1 for that, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Standbys which don't synch to disk?

2011-05-11 Thread Josh Berkus
Robert,

  That WAL has effectively disappeared from the
  master, but is still present on the slave.  Now the master comes up
  and starts processing read-write transactions again, and generates a
  new and different 1kB of WAL.  Hilarity ensues, because the two
  machines are now out of step with each other.

Yeah, you'd need some kind of instant failover and STONITH.  That is,
any interruption on the master would be a failover situation.  While
that seems conceivable for crashes, consider that a planned restart of
the master might be an issue, and an OOM-kill would certainly be.

  You could possibly fix this by making provision for the master to
  connect to the slave on start-up and stream WAL backwards from slave
  to master.  That'd be pretty spiffy.

Ouch, now you're making my head hurt.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Unfriendly handling of pg_hba SSL options with SSL off

2011-05-11 Thread Magnus Hagander
On Tue, May 10, 2011 at 05:39, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Late reply, but we are basically ignoring 'local' lines if the build
 doesn't support unix domain sockets (windows), but throwing an error for
 hostssl usage if ssl is not compiled in.  Is the only logic here that
 'local' is part of the default pg_hba.conf and hostssl is not?  Is that
 good logic?

 I wouldn't have a problem with making the Windows port throw an error
 for local lines.  We'd have to fix initdb to remove that line from the
 sample file (if it doesn't already), but that's surely not hard.

It does already (that's what the @remove-line-for-nolocal@ markup in
the sample file is for).

So +1 for making it throw an error.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Tables cannot have INSTEAD OF triggers

2011-05-11 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Why not?  Is there a fundamental problem, or just that no one wanted to
 make it work?

I'm fairly sure there was a substantive issue, but memory fails as to
what it was.  You could try removing the error check and see what
breaks ...

regards, tom lane



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Darren Duncan

Robert Haas wrote:

On Wed, May 11, 2011 at 11:43 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:

On Tue, May 10, 2011 at 5:19 PM, Darren Duncan dar...@darrenduncan.net wrote:

Examples of open union types could be number, which all the numeric types
compose, and so you can now say that you can use the generic numeric
operators on values you have simply if their types compose the number union
type, and it still works if more numeric types appear later.  Likewise, the
string open union could include both text and blob, as both support
catenation and substring matches or extraction, for example.

This would aid to operator overloading in a generic way, letting you use the
same syntax for different types, but allowing types to mix is optional; eg,
you could support add(int,int) and add(real,real) without supporting
add(int,real) etc but the syntax add(x,y) is shared, and you do this
while still having a strong type system; allowing the mixing is optional
case-by-case.

Coming from a Haskell perspective, this is a great idea, but I don't
think the union feature should be used to implement it.


I'm unclear what the point of such a feature would be.  A union of all
the common numeric types is not much different from the existing type
numeric.


In the case of the open union numeric, one point is that users or extensions 
could come up with new types that add themselves to the union, for example say a 
type for complex numbers (I didn't see such a built-in).


But I'm just citing numeric as an example; there would be a lot more in 
practice, potentially one for every individual type, so for example if operators 
were defined for the open union rather than for the base type, then 
users/extensions could define their own types and easily declare you can use it 
like this type but its different in some important way, which may just be an 
implementation difference.  Operations that don't care about the differences can 
just be written against the open union type where they just work and those that 
do care can be more specific.


Joseph Adams said:

Coming from a Haskell perspective, this is a great idea, but I don't
think the union feature should be used to implement it.  Closed
unions correspond to algebraic data types in Haskell, e.g.:

data Ordering = LT | EQ | GT

while open unions are better-suited to type classes:

(+) :: (Num a) => a -> a -> a


While closed unions would often be used for your first example, I see they're 
still useful for type classes as well.  Especially in the face of open unions 
being available, closed unions let users say, "no, I don't really want the 
meaning of this union type to change just because someone else declares a new 
type in scope (that adds itself to the union)."  For example, one could declare a 
system_numeric closed union type that only includes Pg built-in numerics, and 
users of that can be confident that nothing about it will change later unless 
the definition of system_numeric itself or the types it unions are changed.  But 
open unions would be preferred in places they wouldn't cause trouble, where you 
want to allow easier user extensibility.



I, for one, would like to see PostgreSQL steal some features from
Haskell's type system.  PostgreSQL seems to implement a subset of
Haskell's system, without type classes and where functions can have
only one type variable (anyelement).


I think that Haskell and other functional languages have a lot to teach 
relational DBMSs and I see them as being highly compatible.


-- Darren Duncan



Re: [HACKERS] Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

2011-05-11 Thread Noah Misch
On Wed, May 11, 2011 at 10:22:01AM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Mon, May 09, 2011 at 11:32:28PM -0400, Tom Lane wrote:
  So we basically had three alternatives to make it better:
 * downcast to the array type, which would possibly silently
   break applications that were relying on the function result
   being considered of the domain type
 * re-apply domain checks on the function result, which would be
   a performance hit and possibly again result in unobvious
   breakage
 * explicitly break it by throwing a parse error until you
   downcast (and then upcast the function result if you want)
  I realize that #3 is a bit unpleasant, but are either of the other two
  better?  At least #3 shows you where you need to check for problems.
 
  Though I've never used a domain over an array type, I'd strongly prefer #2.
 
 Hmm.  I hadn't seriously considered that alternative, but we could go in
 that direction.  Logically, what this would probably imply is inserting
 CastToDomain whenever the result of a polymorphic function is deemed to
 be of a domain type, whether the base type is array or not.
 
 The reason I hadn't taken it very seriously is that I don't think it's
 actually going to end up being consistent.  If we don't do #1 (downcast
 polymorphic arguments to a base type), but consider the arguments passed
 to the function to be of the domain type, then really we have to expect
 the polymorphic function to enforce domain constraints internally; we
 cannot fix it with something as localized as having the function call
 parser stick a CastToDomain on top.  Here's a possibly rather silly
 example:
 
   create function negate(anyelement) returns anyelement as
   $$ select - $1 $$ language sql;
 
   create domain pos as int check (value > 0);
 
   select negate(42::pos);
 
 This negate() function will work for any type that has a unary minus
 operator.  But the result of the unary minus operation cannot sanely be
 considered to be of this domain type.

While simple, I think that example covers the salient features.  The git master
behavior is sound: "ERROR:  return type mismatch in function declared to return
pos", "DETAIL:  Actual return type is integer".  If you defined a unary minus
operator for the domain type itself, the function implementing that operator
would then gain responsibility for preserving any domain constraints.

Now, perhaps it's unfortunate that the example can't be easily rewritten to
actually work for arbitrary domain inputs (without rewriting it in C).  I don't
have any particular ideas for improving that.

 In this simplified example you
 might feel it doesn't matter, since with an external CastToDomain we'd
 throw error anyway a moment later, as soon as control comes back from
 the function.  But what if the function does further operations with the
 value, such as passing it to another polymorphic function?

The SQL PL understands that the value of -($1::pos) has type integer, as does
PL/pgSQL.  I'm not seeing any problems offhand.

 So really, if you go down this path, you end up concluding that PLs
 supporting polymorphic arguments had better be prepared to enforce
 domain constraints all the way through, and thus there should be no need
 for an external CastToDomain --- what comes back from the function ought
 to be checked already.

That was my conclusion.  I'm not aware of any particular holes in this area, but
I won't wager there are none.  The code that produces the domain-typed datums is
not usually part of the PL implementation, so the PL mostly needs to rigorously
track the provenance of its datums.

 Unfortunately, even if the PLs do that (SQL
 functions might get it right, but I'm not real sure whether plpgsql is
 water-tight on this, and I don't trust the other PLs for it at all),
 there's no way that built-in polymorphic functions like array_append are
 going to.
 
 So on the whole, #2 looks like an implementation quagmire to me: it's
 not clear what to check, or where, or how you know when you're done.

Every C function is responsible for returning a datum consistent with its
declared return type: we do nothing in particular to ensure that a function has
done so.  I see domain constraints as a natural extension of that rule.  If a C
function purports to return a domain-typed datum, perhaps through polymorphism,
it's responsible for having checked the datum against that domain.  Every PL
accepts responsibility for enforcing this on behalf of its functions, much as it
enforces general consistency between return types and actual returned datums.

A function with a polymorphic argument returning the same polymorphic type is
consequently responsible for applying domain checks when the concrete type is a
domain.  (A function with an anyelement argument that returns anyarray is not a
problem, because the return type is never a domain.  Likewise for a function
with an anyarray argument that returns 

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Darren Duncan

To follow-up my earlier comments ...

I suspect for practical purposes we may want to limit the scope of some type 
features.


For example, the greatest benefits for open union / mixin types is with 
routines/operators, not so much with tables.


So, Pg could choose to support open unions but only for routines, where the 
declared types of table columns must still be other types.  Then you don't have 
to worry about searching your data for where one might be used, but only search 
your system catalog for routines/operators that use it.


But closed unions should be supported in tables too.

-- Darren Duncan




Re: [HACKERS] Patch to allow domains over composite types

2011-05-11 Thread Robert Haas
On Wed, May 11, 2011 at 8:07 AM, Yeb Havinga yebhavi...@gmail.com wrote:
 typecmds.c says:
 Domains over composite types might be made to work in the future, but not
 today.

 Attached is a patch that allows domains over composite types, together with
 test cases in domaincomp.sql. A domain over a composite type has typtype
 TYPTYPE_DOMAIN, but typrelid and typrelkind are empty: that information is
 only available in the pg_type record of the base type. The remainder of the
 patch follows from that choice. While parsing a record expression into a row
 type, an extra coercion node had to be inserted to ensure that the domain
 checks are called.

 All regression tests are ok, comments are highly appreciated.

I don't have time to look at this right now, but please add it to the
currently-open CommitFest:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Collation mega-cleanups

2011-05-11 Thread Peter Eisentraut
On mån, 2011-05-09 at 14:58 -0400, Bruce Momjian wrote:
 Tom, this collation stuff has seen more post-feature-commit cleanups
 than I think any patch I remember.  Is there anything we can learn
 from this?

Don't do big patches?

Seriously, it looks pretty bad, but this is one of the biggest feature
patches in the last 5 years, it touches many places all over the system,
and there is a reason why this topic has been on the TODO list for 10
years: it's overwhelming.  I had aimed for a 75% solution: have
something that supports useful cases, that doesn't break anything if you
don't use it, and that can be expanded later.  Now maybe I only reached
70%, and maybe the baseline should have been 80%, but what we now have
is more like 107% and includes a handful of features I had explicitly
excluded from the first round.

The patch has been around for 10 months, it has been in every commit
fest, it has tests and documentation, it has been reviewed a bunch of
times, people evidently read (some of) the code, they gave feedback,
adjustments have been made (some reverted during later cleanup, go
figure), performance was questioned, performance tests were done,
adjustments were made, people told me to commit it, so I did, if people
had told me to revert it, I would have reverted it.  What can we learn
from that?  The bigger your patch, the lonelier you are.





Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-11 Thread Peter Eisentraut
On tis, 2011-05-10 at 18:05 -0400, Tom Lane wrote:
 The lack of initdb support for getting more-or-less-standard collation
 entries into pg_collation on Windows seems to be the major missing
 piece from here (dunno if Peter is aware of others).  If we don't fix
 that before release, we're going to regret it anyway IMO, because of
 the inevitable tide of questions/complaints from Windows users trying
 to use the collation feature.  We've already seen at least one such
 from a beta tester.

Well, someone who wants it will have to do it.  It's pretty simple, but
not simple enough to code it blindly.  If someone wants to do it, I can
tell them exactly what to do.





Re: [HACKERS] Collation mega-cleanups

2011-05-11 Thread Bruce Momjian
Peter Eisentraut wrote:
 from that?  The bigger your patch, the lonelier you are.

I can attest to that.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Collation mega-cleanups

2011-05-11 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Seriously, it looks pretty bad, but this is one of the biggest feature
 patches in the last 5 years, it touches many places all over the system,
 and there is a reason why this topic has been on the TODO list for 10
 years: it's overwhelming.

Yeah.  I did not want to press for reverting, because it seemed clear
to me that there was no way that this feature would ever get in if we
insisted that it be 100% right when committed.  My idea of good enough
kept moving the more I looked at the patch, though, and it's still
moving --- now I think that we really need to fix the lack of preloaded
pg_collation entries for Windows, and then get in a regression test that
runs everywhere.  So if you want to call that feature creep, go ahead.

regards, tom lane



Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-11 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On tis, 2011-05-10 at 18:05 -0400, Tom Lane wrote:
 The lack of initdb support for getting more-or-less-standard collation
 entries into pg_collation on Windows seems to be the major missing
 piece from here (dunno if Peter is aware of others).  If we don't fix
 that before release, we're going to regret it anyway IMO, because of
 the inevitable tide of questions/complaints from Windows users trying
 to use the collation feature.  We've already seen at least one such
 from a beta tester.

 Well, someone who wants it will have to do it.  It's pretty simple, but
 not simple enough to code it blindly.  If someone wants to do it, I can
 tell them exactly what to do.

Hm, do you know how to enumerate the available locales on Windows?
(Still not volunteering, since I couldn't test it, but that's the only
missing piece of information AFAIK.)

regards, tom lane



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Tom Lane
Darren Duncan dar...@darrenduncan.net writes:
 But I'm just citing numeric as an example; there would be a lot more
 in practice, potentially one for every individual type, so for example
 if operators were defined for the open union rather than for the base
 type, then users/extensions could define their own types and easily
 declare "you can use it like this type but it's different in some
 important way", which may just be an implementation difference.
 Operations that don't care about the differences can just be written
 against the open union type where they just work and those that do
 care can be more specific.

I'm just an old-school abstract data type hacker, but I don't see
anything in what you're saying that doesn't work today in our existing
type system: with overloaded and/or polymorphic operators and functions
you can get all those effects.  Maybe there would be some small gain in
ability to share code for tasks that fall between single-data-type and
works-for-anything cases, but it looks like a pretty marginal
improvement from here; probably not worth the cost and compatibility
implications of a major overhaul of the type system.

regards, tom lane



Re: [HACKERS] Extension Packaging

2011-05-11 Thread David E. Wheeler
On Apr 28, 2011, at 2:16 PM, David E. Wheeler wrote:

 So maybe it's half-assed. Maybe the version can be anything but the revision 
 must be an integer. Maybe there's a `pg_extension_version($extension_name)` 
 function that returns ARRAY[$version, $revision], and the revision is set in 
 the control file but not included in the version or in the upgrade file 
 names. I think I can live with that. But, hell, you're halfway to mandating 
 the meaning by doing this. Will we have to go the rest of the way in the 
 future?

Okay, how about we add a "revision" key to the control file and "extrevision" to 
the pg_extension catalog?  Its type can be TEXT and it is optional for use by 
extensions.

This would allow extension authors to identify the base version of an extension 
but also the revision. And the core doesn't have to care how it works or if 
it's used, but it would allow users to know exactly what they have installed.

Thoughts?

Best,

David






Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-11 Thread Peter Eisentraut
On ons, 2011-05-11 at 16:47 -0400, Tom Lane wrote:
 Hm, do you know how to enumerate the available locales on Windows?

EnumSystemLocalesEx()

Reference:
http://msdn.microsoft.com/en-us/library/dd317829(v=vs.85).aspx

Example: http://msdn.microsoft.com/en-us/library/dd319091(v=vs.85).aspx

As you can see in the example, this returns names like "en-US" and
"es-ES".  I would imagine we normalize this to the usual "en_US",
"es_ES" (but we could also install the non-normalized names, just like
we install "en_US.utf8").

But you need to rearrange the code in initdb a bit because this thing
works with callbacks.

There is an older interface EnumSystemLocales() which returns locale
IDs, which you then have to look up and convert into a name manually.
There is code for that in the old installer CVS on pgfoundry.  But it's
very ugly, so I'd rather skip that and just concentrate on supporting
the newer interface.




Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-11 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On ons, 2011-05-11 at 16:47 -0400, Tom Lane wrote:
 Hm, do you know how to enumerate the available locales on Windows?

 EnumSystemLocalesEx()

 Reference:
 http://msdn.microsoft.com/en-us/library/dd317829(v=vs.85).aspx

 Example: http://msdn.microsoft.com/en-us/library/dd319091(v=vs.85).aspx

Doesn't look too bad ...

 There is an older interface EnumSystemLocales() which returns locale
 IDs, which you then have to look up and convert into a name manually.
 There is code for that in the old installer CVS on pgfoundry.  But it's
 very ugly, so I'd rather skip that and just concentrate on supporting
 the newer interface.

I guess the question is what happens on pre-Vista Windows if we use
EnumSystemLocalesEx.  I don't object to just not populating pg_collation
in that case, but we probably don't want it to fail entirely.

regards, tom lane



[HACKERS] performance-test farm

2011-05-11 Thread Tomas Vondra
Hi everyone,

several members of this mailing list mentioned recently it'd be really
useful to have a performance-test farm, that it might improve the
development process and make some changes easier.

I've briefly discussed this with another CSPUG member, who represents a
local company using PostgreSQL for a long time (and that supports
CSPUG), and we've agreed to investigate this a bit further.

I do have a rough idea what it might look like, but I've never built
performance-testing farm for such distributed project. So I'd like to
know what would you expect from such beast. Especially

1) Is there something that might serve as a model?

   I've googled to search if there's some tool, but "performance-test
   farm" gave me a lot of info about how to breed cows, pigs and goats
   on a farm, but that's not very useful in this case I guess.

2) How would you use it? What procedure would you expect?

   I mean this should produce regular performance test of the current
   sources (and publish it on some website), but the whole point is to
   allow developers to do a performance test of their changes before
   commit to the main.

   How would you expect to deliver these changes to the farm? How would
   you define the job? How would you expect to get the results? etc.

   Just try to write down a list of steps.

3) Any other features expected?

   If you notice any interesting feature, write it down and note
   whether it's a 'must have' or a 'nice to have' feature.


I really can't promise anything right now - I have just a very rough
idea how much time/effort/money this might take. So let's see what is
needed to build a 'minimal farm' and if it's feasible with the resources
we can get.

regards
Tomas



Re: [HACKERS] performance-test farm

2011-05-11 Thread Kevin Grittner
Tomas Vondra t...@fuzzy.cz wrote:
 
 1) Is there something that might serve as a model?
 
I've been assuming that we would use the PostgreSQL Buildfarm as a
model.
 
http://buildfarm.postgresql.org/
 
 2) How would you use it? What procedure would you expect?
 
People who had suitable test environments could sign up to
periodically build and performance test using the predetermined test
suite, and report results back for a consolidated status display. 
That would spot regressions.
 
It would be nice to have a feature where a proposed patch could be
included for a one-time build-and-benchmark run, so that ideas could
be tried before commit.  It can be hard to anticipate all the
differences between Intel and AMD, Linux and Windows, 32 bit and 64
bit, etc.
 
 3) Any other features expected?
 
Pretty graphs?  :-)
 
-Kevin



Re: [HACKERS] Extension Packaging

2011-05-11 Thread Robert Haas
On Wed, May 11, 2011 at 5:06 PM, David E. Wheeler da...@kineticode.com wrote:
 On Apr 28, 2011, at 2:16 PM, David E. Wheeler wrote:

 So maybe it's half-assed. Maybe the version can be anything but the revision 
 must be an integer. Maybe there's a `pg_extension_version($extension_name)` 
 function that returns ARRAY[$version, $revision], and the revision is set in 
 the control file but not included in the version or in the upgrade file 
 names. I think I can live with that. But, hell, you're halfway to mandating 
 the meaning by doing this. Will we have to go the rest of the way in the 
 future?

 Okay, how about we add a revision key to the control file and extrevision to the 
 pg_extension catalog. Its type can be TEXT and is optional for use by 
 extensions.

 This would allow extension authors to identify the base version of an 
 extension but also the revision. And the core doesn't have to care how it 
 works or if it's used, but it would allow users to know exactly what they 
 have installed.

 Thoughts?

How would pg_extension.extrevision be kept up to date?  AFAICS, the
whole point is that you might swap out the shared libraries without
doing anything at the SQL level.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] performance-test farm

2011-05-11 Thread Tomas Vondra
Dne 11.5.2011 23:41, Kevin Grittner napsal(a):
 Tomas Vondra t...@fuzzy.cz wrote:
  
 1) Is there something that might serve as a model?
  
 I've been assuming that we would use the PostgreSQL Buildfarm as a
 model.
  
 http://buildfarm.postgresql.org/

Yes, I was thinking about that too, but

1) A buildfarm used for regular building / unit testing IMHO may not
   be the right place to do performance testing (not sure how isolated
   the benchmarks can be etc.).

2) Not sure how open this might be for the developers (if they could
   issue their own builds etc.).

3) If this should be part of the current buildfarm, then I'm afraid I
   can't do much about it.

 2) How would you use it? What procedure would you expect?
  
 People who had suitable test environments could sign up to
 periodically build and performance test using the predetermined test
 suite, and report results back for a consolidated status display. 
 That would spot regressions.

So it would be a 'distributed farm'? Not sure if that's a good idea, as
to get reliable benchmark results you need a proper environment (not
influenced by other jobs, changes of hw etc.).

 It would be nice to have a feature where a proposed patch could be
 included for a one-time build-and-benchmark run, so that ideas could
 be tried before commit.  It can be hard to anticipate all the
 differences between Intel and AMD, Linux and Windows, 32 bit and 64
 bit, etc.

Yes, that's one of the main goals - to allow developers to benchmark
their patches under various workloads. I don't think we'll be able to
get all those configurations, though.

 3) Any other features expected?
  
 Pretty graphs?  :-)

Sure. And it will be Web 2.0 ready ;-)

Tomas



Re: [HACKERS] Prefered Types

2011-05-11 Thread Alvaro Herrera
Excerpts from Tom Lane's message of dom may 08 23:00:27 -0400 2011:

 For
 example, if you start noticing an occasional integer overflow that
 didn't happen before, it might be pretty darn difficult to figure out
 that the problem is that an operation that was formerly resolved as "int4
 + int4" is now resolved as "int2 + int2".

About this particular example, I mentioned earlier that I preliminarly
consider the idea of changing the +(int2,int2) operator to return int4
instead of the current int2.  It's not impossible that we will miss
changing some operators, but my hope is that it won't be that horrible.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] XML with invalid chars

2011-05-11 Thread Andrew Dunstan



On 05/09/2011 11:25 PM, Noah Misch wrote:


I see you've gone with doing it unconditionally.  I'd lean toward testing the
library in pg_xml_init and setting a flag indicating whether we need the extra
pass.  However, a later patch can always optimize that.




I wasn't terribly keen on the idea, but we can look at it again later.



Please review and try to break.

Here are the test cases I tried:

-- caught successfully
SELECT E'\x01'::xml;
SELECT xmlcomment(E'\x01');
SELECT xmlelement(name foo, xmlattributes(E'\x01' AS bar), '');
SELECT xmlelement(name foo, NULL, E'\x01');
SELECT xmlforest(E'\x01' AS foo);
SELECT xmlpi(name foo, E'\x01');
SELECT query_to_xml($$SELECT E'\x01'$$, true, false, '');

-- not caught
SELECT xmlroot('<root/>', version E'\x01');



That's an easy fix.



SELECT xmlcomment(E'\ufffe');



That's a bit harder. Do we want to extend these checks to cover 
surrogates and end of plane characters, which are the remaining 
forbidden chars? It certainly seems likely to be a somewhat slower test 
since I think we'd need to process the input strings a Unicode char at a 
time, but we do that in other places and it seems acceptable. What do 
people think?




-- not directly related, but also wrongly accepted
SELECT xmlroot('<root/>', version ' ');
SELECT xmlroot('<root/>', version 'foo');

Offhand, I don't find libxml2's handling of XML declarations particularly
consistent.  My copy's xmlCtxtReadDoc() API (used by xml_in when xmloption =
document) accepts '<?xml version="foo"?>' but rejects '<?xml version=" "?>'.
Its xmlParseBalancedChunkMemory() API (used by xml_in when xmloption = content)
accepts anything, even control characters.  The XML 1.0 standard is stricter:
the version must match ^1\.[0-9]+$.  We might want to tighten this at the same
time.



We can add some stuff to check the version strings. Doesn't seem 
terribly difficult.



libxml2's error message for this case is "PCDATA invalid Char value 1"
(assuming \x01).  Mentioning PCDATA seems redundant, since no other context
offers greater freedom.  How about:

ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 errmsg("invalid XML 1.0 Char \\U%08x", char_val)));




That would also mean processing the string a Unicode char at a time. So 
maybe that's what we need to do.


Thanks for the input.

cheers

andrew



Re: [HACKERS] performance-test farm

2011-05-11 Thread Kevin Grittner
Tomas Vondra t...@fuzzy.cz wrote:
 Dne 11.5.2011 23:41, Kevin Grittner napsal(a):
 Tomas Vondra t...@fuzzy.cz wrote:
  
 1) Is there something that might serve as a model?
  
 I've been assuming that we would use the PostgreSQL Buildfarm as
 a model.
  
 http://buildfarm.postgresql.org/
 
 Yes, I was thinking about that too, but
 
 1) A buildfarm used for regular building / unit testing IMHO may
not be the right place to do performance testing (not sure how
isolated the benchmarks can be etc.).
 
I'm not saying that we should use the existing buildfarm, or expect
current buildfarm machines to support this; just that the pattern of
people volunteering hardware in a similar way would be good.
 
 2) Not sure how open this might be for the developers (if they
could issue their own builds etc.).
 
I haven't done it, but I understand that you can create a local
buildfarm instance which isn't reporting its results.  Again,
something similar might be good.
 
 3) If this should be part of the current buildfarm, then I'm
afraid I can't do much about it.
 
Not part of the current buildfarm; just using a similar overall
pattern.  Others may have different ideas; I'm just speaking for
myself here about what seems like a good idea to me.
 
 2) How would you use it? What procedure would you expect?
  
 People who had suitable test environments could sign up to
 periodically build and performance test using the predetermined
 test suite, and report results back for a consolidated status
 display. That would spot regressions.
 
 So it would be a 'distributed farm'? Not sure if that's a good
 idea, as to get reliable benchmark results you need a proper
 environment (not influenced by other jobs, changes of hw etc.).
 
Yeah, accurate benchmarking is not easy.  We would have to make sure
people understood that the machine should be dedicated to the
benchmark while it is running, which is not a requirement for the
buildfarm.  Maybe provide some way to annotate HW or OS changes?
So if one machine goes to a new kernel and performance changes
radically, but other machines which didn't change their kernel
continue on a level graph, we'd know to suspect the kernel rather
than some change in PostgreSQL code.
 
-Kevin



Re: [HACKERS] pg_upgrade and PGPORT

2011-05-11 Thread Bruce Momjian
Robert Haas wrote:
 On Wed, May 11, 2011 at 2:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Or you could just unsetenv instead of complaining.
 
 +1 for that.

OK, the attached patch does this, but allows PGCLIENTENCODING to be
passed in.  The new output looks like:

Performing Consistency Checks
-
ignoring libpq environment variable PGPORT
Checking old data directory (/u/pgsql.old/data) ok
Checking old bin directory (/u/pgsql.old/bin)   ok
Checking new data directory (/u/pgsql/data) ok
Checking new bin directory (/u/pgsql/bin)   ok

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 8fce305..bf30dcd
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** check_for_libpq_envvars(void)
*** 254,260 ****
  {
  	PQconninfoOption *option;
  	PQconninfoOption *start;
- 	bool		found = false;
  
  	/* Get valid libpq env vars from the PQconndefaults function */
  
--- 254,259 ----
*** check_for_libpq_envvars(void)
*** 264,290 ****
  	{
  		if (option->envvar)
  		{
! 			const char *value;
  
  			/* This allows us to see error messages in the local encoding */
  			if (strcmp(option->envvar, "PGCLIENTENCODING") == 0)
  				continue;
  
- 			value = getenv(option->envvar);
  			if (value && strlen(value) > 0)
  			{
! 				found = true;
! 
  				pg_log(PG_WARNING,
! 					   "libpq env var %-20s is currently set to: %s\n", option->envvar, value);
  			}
  		}
  	}
  
  	/* Free the memory that libpq allocated on our behalf */
  	PQconninfoFree(start);
- 
- 	if (found)
- 		pg_log(PG_FATAL,
- 			   "libpq env vars have been found and listed above, please unset them for pg_upgrade\n");
  }
--- 263,287 ----
  	{
  		if (option->envvar)
  		{
! 			const char *value = getenv(option->envvar);
  
  			/* This allows us to see error messages in the local encoding */
  			if (strcmp(option->envvar, "PGCLIENTENCODING") == 0)
  				continue;
  
  			if (value && strlen(value) > 0)
  			{
! #ifndef WIN32
! 				unsetenv(option->envvar);
! #else
! 				SetEnvironmentVariableA(option->envvar, "");
! #endif
  				pg_log(PG_WARNING,
! 					"ignoring libpq environment variable %s\n", option->envvar);
  			}
  		}
  	}
  
  	/* Free the memory that libpq allocated on our behalf */
  	PQconninfoFree(start);
  }



Re: [HACKERS] performance-test farm

2011-05-11 Thread Tomas Vondra
On 12.5.2011 00:21, Kevin Grittner wrote:
> Tomas Vondra t...@fuzzy.cz wrote:
>> On 11.5.2011 23:41, Kevin Grittner wrote:
>>> Tomas Vondra t...@fuzzy.cz wrote:
>>>> 1) Is there something that might serve as a model?
>>> I've been assuming that we would use the PostgreSQL Buildfarm as
>>> a model.
>>>
>>> http://buildfarm.postgresql.org/
>> Yes, I was thinking about that too, but
>>
>> 1) A buildfarm used for regular building / unit testing IMHO may
>> not be the right place to do performance testing (not sure how
>> isolated the benchmarks can be etc.).
> I'm not saying that we should use the existing buildfarm, or expect
> current buildfarm machines to support this; just that the pattern of
> people volunteering hardware in a similar way would be good.

Good point. Actually I was not aware of how the buildfarm works; all I
knew was that something like that existed, because some of the hackers
occasionally mention a failed build on the mailing list.

So I guess this is a good opportunity to investigate it a bit ;-)

Anyway, I'm not sure this would give us the kind of environment we need
to do benchmarks, but it's worth thinking about.

>> 2) Not sure how open this might be for the developers (if they
>> could issue their own builds etc.).
> I haven't done it, but I understand that you can create a local
> buildfarm instance which isn't reporting its results.  Again,
> something similar might be good.

Well, yeah. So the developers would get a local 'copy' of all the
benchmarks / workloads and could run them?

>> 3) If this should be part of the current buildfarm, then I'm
>> afraid I can't do much about it.
> Not part of the current buildfarm; just using a similar overall
> pattern.  Others may have different ideas; I'm just speaking for
> myself here about what seems like a good idea to me.

OK, got it.

>>>> 2) How would you use it? What procedure would you expect?
>>> People who had suitable test environments could sign up to
>>> periodically build and performance test using the predetermined
>>> test suite, and report results back for a consolidated status
>>> display. That would spot regressions.
>> So it would be a 'distributed farm'? Not sure if that's a good
>> idea, as to get reliable benchmark results you need a proper
>> environment (not influenced by other jobs, changes of hw etc.).
> Yeah, accurate benchmarking is not easy.  We would have to make sure
> people understood that the machine should be dedicated to the
> benchmark while it is running, which is not a requirement for the
> buildfarm.  Maybe provide some way to annotate HW or OS changes?
> So if one machine goes to a new kernel and performance changes
> radically, but other machines which didn't change their kernel
> continue on a level graph, we'd know to suspect the kernel rather
> than some change in PostgreSQL code.

I guess we could run a script that collects all those important
parameters and then detects changes. Anyway, we still need some 'really
stable' machines that are not changed at all, to get a long-term baseline.

But I guess that could be done by running some dedicated machines ourselves.

regards
Tomas



Re: [HACKERS] performance-test farm

2011-05-11 Thread Andrew Dunstan



On 05/11/2011 06:21 PM, Kevin Grittner wrote:
> Tomas Vondra t...@fuzzy.cz wrote:
>> On 11.5.2011 23:41, Kevin Grittner wrote:
>>> Tomas Vondra t...@fuzzy.cz wrote:

First up, you guys should be aware that Greg Smith at least is working
on this. Let's not duplicate effort.

>>>> 1) Is there something that might serve as a model?
>>> I've been assuming that we would use the PostgreSQL Buildfarm as
>>> a model.
>>>
>>> http://buildfarm.postgresql.org/
>> Yes, I was thinking about that too, but
>>
>> 1) A buildfarm used for regular building / unit testing IMHO may
>> not be the right place to do performance testing (not sure how
>> isolated the benchmarks can be etc.).
> I'm not saying that we should use the existing buildfarm, or expect
> current buildfarm machines to support this; just that the pattern of
> people volunteering hardware in a similar way would be good.

Some buildfarm members might well be suitable for it.

I recently added support for running optional steps, and made the SCM 
module totally generic. Soon I'm hoping to provide for more radical 
extensibility by having addon modules, which will register themselves 
with the framework and then have their tests run. I'm currently working 
on an API for such modules. This was inspired by Mike Fowler's work on a 
module to test JDBC builds, which his buildfarm member is currently 
doing: see 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=piapiac&dt=2011-05-11%2000%3A00%3A02 
for example. Obvious candidate modules might be other client libraries 
(e.g. perl DBD::Pg), non-committed patches, non-standard tests, and 
performance testing.

>> 2) Not sure how open this might be for the developers (if they
>> could issue their own builds etc.).
> I haven't done it, but I understand that you can create a local
> buildfarm instance which isn't reporting its results.  Again,
> something similar might be good.

You can certainly create a client that doesn't report its results (just 
run it in --test mode). And you can create your own private server 
(that's been done by at least two organizations I know of).

But to test your own stuff, what we really need is a module to run 
non-committed patches, I think (see above).

The buildfarm client does have a mode (--from-source) that lets you 
test your own stuff and doesn't report on it if you do, but I don't see 
that it would be useful here.

>> 3) If this should be part of the current buildfarm, then I'm
>> afraid I can't do much about it.

Sure you can. Contribute to the efforts mentioned above.

> Not part of the current buildfarm; just using a similar overall
> pattern.  Others may have different ideas; I'm just speaking for
> myself here about what seems like a good idea to me.

The buildfarm server is a pretty generic reporting framework. Sure we 
can build another. But it seems a bit redundant.

>>>> 2) How would you use it? What procedure would you expect?
>>> People who had suitable test environments could sign up to
>>> periodically build and performance test using the predetermined
>>> test suite, and report results back for a consolidated status
>>> display. That would spot regressions.
>> So it would be a 'distributed farm'? Not sure if that's a good
>> idea, as to get reliable benchmark results you need a proper
>> environment (not influenced by other jobs, changes of hw etc.).

You are not going to get a useful performance farm except in a 
distributed way. We don't own any labs, nor have we any way of 
assembling the dozens or hundreds of machines to represent the spectrum 
of platforms that we want tested in one spot. Knowing that we have 
suddenly caused a performance regression on, say, FreeBSD 8.1 running on 
AMD64, is a critical requirement.

> Yeah, accurate benchmarking is not easy.  We would have to make sure
> people understood that the machine should be dedicated to the
> benchmark while it is running, which is not a requirement for the
> buildfarm.  Maybe provide some way to annotate HW or OS changes?
> So if one machine goes to a new kernel and performance changes
> radically, but other machines which didn't change their kernel
> continue on a level graph, we'd know to suspect the kernel rather
> than some change in PostgreSQL code.

Indeed, there are lots of moving pieces.

cheers

andrew



Re: [HACKERS] XML with invalid chars

2011-05-11 Thread Noah Misch
On Wed, May 11, 2011 at 06:17:07PM -0400, Andrew Dunstan wrote:
> On 05/09/2011 11:25 PM, Noah Misch wrote:
>> SELECT xmlcomment(E'\ufffe');
>
> That's a bit harder. Do we want to extend these checks to cover
> surrogates and end of plane characters, which are the remaining
> forbidden chars? It certainly seems likely to be a somewhat slower test
> since I think we'd need to process the input strings a Unicode char at a
> time, but we do that in other places and it seems acceptable. What do
> people think?

My thinking was that we should only make this flag day for xml-type users if
we're going to fix it all the way.



[HACKERS] Help: regarding patch development

2011-05-11 Thread nil nil
Hello,
  Sir, I want to develop a service for PostgreSQL related to query
processing, but I don't know how to develop it. Please guide me so that I
can take the first step.
I will be really thankful to you.
 
Regards
Emman

Re: [HACKERS] XML with invalid chars

2011-05-11 Thread Andrew Dunstan



On 05/11/2011 07:00 PM, Noah Misch wrote:
> On Wed, May 11, 2011 at 06:17:07PM -0400, Andrew Dunstan wrote:
>> On 05/09/2011 11:25 PM, Noah Misch wrote:
>>> SELECT xmlcomment(E'\ufffe');
>> That's a bit harder. Do we want to extend these checks to cover
>> surrogates and end of plane characters, which are the remaining
>> forbidden chars? It certainly seems likely to be a somewhat slower test
>> since I think we'd need to process the input strings a Unicode char at a
>> time, but we do that in other places and it seems acceptable. What do
>> people think?
> My thinking was that we should only make this flag day for xml-type users if
> we're going to fix it all the way.


Fair enough.

cheers

andrew



Re: [HACKERS] performance-test farm

2011-05-11 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
> First up, you guys should be aware that Greg Smith at least is
> working on this. Let's not duplicate effort.

Indeed.  I'm also interested in making this happen and have worked with
Greg in the past on it.  There's even some code out there that we
developed to add it on to the buildfarm, though that needs to be
reworked to fit with Andrew's latest changes (which are all good
changes).

We need a bit of hardware, but more, we need someone to clean up the
code, get it all integrated, and make it all work and report useful
information.  My feeling is "if you build it, they will come" with
regard to the hardware/performance machines.

Thanks,

Stephen



