Re: [GENERAL] Equivalent for AUTOINCREMENT?

2008-10-31 Thread D. Dante Lorenso

Michelle Konzack wrote:

I have a table with a serial number which should be increased by
each INSERT.  I know I can use max() to get the highest number, but how
can I use it in an INSERT statement?

There was a message on this list some months ago describing it,
but I cannot find that message anymore...


You want to use the 'SERIAL' or 'BIGSERIAL' data type.  Then, in your 
insert statement, use the 'RETURNING' clause like this:


CREATE TABLE mytable (
   mypkey SERIAL NOT NULL,
   somecol VARCHAR(200)
);

INSERT INTO mytable (somecol)
VALUES ('this is a test')
RETURNING mypkey;

You'll get your insert statement to return the integer value generated 
by the SERIAL sequence.
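
On servers older than 8.2 that lack RETURNING, the usual fallback is 
currval() on the implicit sequence -- a sketch, assuming the default 
<table>_<column>_seq naming:

INSERT INTO mytable (somecol)
VALUES ('this is a test');

SELECT currval('mytable_mypkey_seq');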


These features are way more powerful than AUTO_INCREMENT.

-- Dante




Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant





--
D. Dante Lorenso
[EMAIL PROTECTED]



Re: [GENERAL] Numbering rows

2008-10-15 Thread D. Dante Lorenso

Mark Morgan Lloyd wrote:
Is there an easy way to assign a sequential number, possibly based on an 
arbitrary minimum (typically 0 or 1) to each row of an ordered result 
set, or do I have to work with explicit sequences?


I need to do quite a lot of maths on successive rows, extracting numeric 
and timestamp differences hence rates of change. I've typically been 
doing it manually or in a spreadsheet but there has to be a better way 
e.g. by a join on offset row numbers.


PL/Perl can remember variables in your session.  Here's a function I wrote 
that sets a global variable in PL/Perl:


--
CREATE OR REPLACE FUNCTION public.global_var_set
  (in_key varchar, in_value bigint)
RETURNS bigint AS
$body$
my ($key, $value) = @_;
$_SHARED{$key} = $value;
return $value;
$body$
LANGUAGE 'plperl' VOLATILE RETURNS NULL ON NULL INPUT;
--

Then, later you can read that global variable with another function like 
this:


--
CREATE OR REPLACE FUNCTION public.global_var_get
(in_key varchar)
RETURNS bigint AS
$body$
my ($key) = @_;
return $_SHARED{$key} ? $_SHARED{$key} : 0;
$body$
LANGUAGE 'plperl' VOLATILE RETURNS NULL ON NULL INPUT;
--

Perhaps you can use PL/Perl and functions like these to maintain a global 
variable that you increment as you do your select.  Something like this, 
with global_var_inc sketched below:


   SELECT global_var_set('row_counter', 0);

   SELECT global_var_inc('row_counter') AS row_counter, *
   FROM datatable
   ORDER BY whatever;
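
A matching increment function in the same style might look like this 
(untested, same caveats as the two above):

--
CREATE OR REPLACE FUNCTION public.global_var_inc
  (in_key varchar)
RETURNS bigint AS
$body$
my ($key) = @_;
# start at zero if the key has never been set in this session
$_SHARED{$key} = ($_SHARED{$key} || 0) + 1;
return $_SHARED{$key};
$body$
LANGUAGE 'plperl' VOLATILE RETURNS NULL ON NULL INPUT;
--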

Just an idea.

-- Dante

--
D. Dante Lorenso
[EMAIL PROTECTED]



[GENERAL] Full outer join? Cross product? How to blend two queries into single row?

2008-09-04 Thread D. Dante Lorenso

All,

I want to do something simple and the terminology is slipping me.  I 
want to execute two separate queries that should return 0 or 1 rows and 
I want to join the results of those queries into a single row.


  SELECT a.col1, a.col2
  FROM mytable a
  WHERE a.uid = 'abc';

  SELECT b.col3, b.col4
  FROM mytable b
  WHERE b.uid = 'def';

But somehow get a result like this:

  SELECT a.col1, a.col2, b.col3, b.col4
  FROM mytable a, mytable b
  WHERE a.uid = 'abc'
  AND b.uid = 'def';

That query works when both a.uid and b.uid match, but I want to get 
results even when a.uid matches and b.uid does NOT match, and vice versa. 
Just make a.col1 and a.col2 NULL when a.uid does not match, or make 
b.col3 and b.col4 NULL when b.uid does not match.  All 4 can be NULL, or 
no rows returned, if there are no matches.


I came up with this query that works, but seems overly complicated:

  SELECT a.col1, a.col2, b.col3, b.col4
  FROM
 (SELECT col1, col2, TRUE AS join_column
  FROM mytable
  WHERE uid = 'abc') a
FULL OUTER JOIN
 (SELECT col3, col4, TRUE AS join_column
  FROM mytable
  WHERE uid = 'def') b
ON (a.join_column = b.join_column);

Is this how to do it, or is there a simpler syntax I'm missing?  What's 
the formal terminology for what I'm doing here?  Is there a way to 
specify a FULL OUTER JOIN without naming the join column since I don't 
need one?
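
One variant I have not actually tested would be to join the two 
subqueries on a constant instead of the dummy column:

  SELECT a.col1, a.col2, b.col3, b.col4
  FROM
 (SELECT col1, col2
  FROM mytable
  WHERE uid = 'abc') a
FULL OUTER JOIN
 (SELECT col3, col4
  FROM mytable
  WHERE uid = 'def') b
ON (TRUE);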


-- Dante

--
D. Dante Lorenso
[EMAIL PROTECTED]



Re: [GENERAL] Foreign Key normalization question

2008-09-02 Thread D. Dante Lorenso

Matthew Wilson wrote:

On Tue 02 Sep 2008 04:40:55 PM EDT, Scott Marlowe wrote:

On Tue, Sep 2, 2008 at 2:35 PM, Matthew Wilson [EMAIL PROTECTED] wrote:

On Tue 02 Sep 2008 04:19:41 PM EDT, Scott Marlowe wrote:

If the two subordinate tables ALWAYS have to point to the same place,
why two tables?  Can't a customer have > 1 location?  I'm pretty sure
IBM has more than one corporate office you could ship things to.

Yeah, so the idea is one customer might have many locations and many
products.  And at each location, some subset of all their products is
available.

You could have the product_locations table have custid1 and custid2 fields
that reference the two parent tables, and then a check constraint on
product_locations that custid1 = custid2


You inspired me to change my tables to this:

create table location (
id serial unique,
name text,
customer_id int references customer,
primary key (id, customer_id)
);

create table product (
id serial unique,
name text,
customer_id int references customer,
primary key (id, customer_id)
);

create table product_location (
product_id int references product (id),
product_customer_id int references customer (id),
location_id int references location (id),
location_customer_id int references customer (id),
check (product_customer_id = location_customer_id),
foreign key (product_id, product_customer_id) references product (id,
customer_id),
foreign key (location_id, location_customer_id) references location (id,
customer_id)
);

This seems to work based on my informal testing, but it seems really
byzantine.  I wish I didn't have to explicitly put the customer IDs in
the table.

Is there a better way?


You could add a trigger to your product_location table that just 
double-checks that the customers match and rejects the insert/update 
otherwise.  A PL/PGSQL function like this might help:


-- 8< -------- 8< --

CREATE OR REPLACE FUNCTION product_location_customer_check ()
RETURNS trigger AS
$body$
DECLARE
  is_ok BOOLEAN;
BEGIN
  SELECT p.customer_id = l.customer_id
  INTO is_ok
  FROM product p, location l
  WHERE p.id = NEW.product_id
  AND l.id = NEW.location_id;

  -- didn't find the product and location ... weird
  IF NOT FOUND THEN
  RETURN NULL;
  END IF;

  -- product customer matches the location customer
  IF is_ok = TRUE THEN
  RETURN NEW;
  END IF;

  -- product and location customers did NOT match, reject changes
  RETURN NULL;
END;
$body$
LANGUAGE 'plpgsql';
-- 8< -------- 8< --

Disclaimer: I have no idea if that code works.  I just whipped it up now 
without testing it.  But it might do your checks without having to add 
columns you don't want to your tables.
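
If it does work, attaching it would look roughly like this (the trigger 
and function names are just the ones I made up above):

CREATE TRIGGER product_location_customer_check
BEFORE INSERT OR UPDATE ON product_location
FOR EACH ROW
EXECUTE PROCEDURE product_location_customer_check();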


Good luck.

-- Dante



Re: [GENERAL] MySQL LAST_INSERT_ID() to Postgres

2008-08-28 Thread D. Dante Lorenso

Scott Marlowe wrote:

On Thu, Aug 28, 2008 at 3:38 PM, Bill [EMAIL PROTECTED] wrote:

I am new to PostgreSQL but it seems to me that lastval() will only work if
the insert does not produce side effects that call nextval(). Consider the
case where a row is inserted into a table that has an after insert trigger
and the after insert trigger inserts a row into another table which has a
serial primary key. In that case I assume that lastval() will  return the
value from the serial column in the second table.


I use returning almost exclusively now.


RETURNING is the best option.  It makes all your INSERT and UPDATE 
statements feel like SELECTs.  It avoids the round-trip back to the 
server just to ask for the unique id generated by the previous statement.


  INSERT INTO mytable (col1, col2)
  VALUES (value1, value2)
  RETURNING col_value_from_seq_that_we_dont_care_about_the_name;

I use RETURNING for all my INSERT and UPDATE statements now.  Usually 
I'll return the primary key for the table, but sometimes I return a 
column that is created by one of my triggers.  It's awesome to be able 
to do this in one query.
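
For example, a sketch with made-up names, where updated_ts is a 
hypothetical column maintained by a trigger:

  UPDATE mytable
  SET somecol = 'new value'
  WHERE mypkey = 42
  RETURNING mypkey, updated_ts;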


-- Dante




[GENERAL] Easy way to alias all columns in a table by adding a prefix or suffix?

2008-08-04 Thread D. Dante Lorenso

All,

I have 2 tables with a lot of columns with similar names.  I'd like to 
join both tables and include all columns from each without naming 
collisions.  I can use the 'AS' to rename a single column, but is there 
a way to do the rename in bulk by just appending a prefix or suffix to 
the column names from each respective table?


I want to do something like this:

  SELECT a.* AS prefix1_*, b.* AS prefix2_*
  FROM a, b
  WHERE a.id = b.id
  AND a.id = 123;

The result would be to select all columns from a but rename each to 
have prefix1_ prepended:


  a.id -> prefix1_id
  b.id -> prefix2_id
  a.xpos -> prefix1_xpos
  b.xpos -> prefix2_xpos
  a.ypos -> prefix1_ypos
  b.ypos -> prefix2_ypos

etc...

Does this request make sense?  Does something like this exist?  I don't 
really NEED to have this, I'm just trying to be lazy.
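
The closest workaround I know of is to generate the select list from the 
catalog and paste it into the real query -- a sketch, with the table name 
'a' assumed:

  SELECT array_to_string(ARRAY(
 SELECT 'a.' || column_name || ' AS prefix1_' || column_name
 FROM information_schema.columns
 WHERE table_name = 'a'
 ORDER BY ordinal_position
  ), ', ');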


-- Dante




[GENERAL] Need Tool to sync databases with 8.3.1

2008-06-15 Thread D. Dante Lorenso

All,

I used to rave about how great the EMS PostgreSQL tools were, but lately 
I've been disappointed by the lack of support for their Database 
Comparer 2007 for PostgreSQL.


I need a tool that will compare 2 database schemas and allow me to 
migrate changes from one database to another.  What tools do the rest of 
you use to accomplish this task?


As the name suggests, Database Comparer 2007 for PostgreSQL seems to 
have stopped development some time back in 2007 and it will not work 
with PostgreSQL 8.3+.  In addition, several of the bugs I encounter with 
8.2.4 databases make it a show stopper there too.


I don't know how else to best accomplish my goals other than dumping 2 
schemas, running a diff and manually syncing the 2 until the diffs go away.
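
For reference, the manual approach looks roughly like this (database 
names assumed):

  pg_dump --schema-only -U dante db_dev  > dev.sql
  pg_dump --schema-only -U dante db_prod > prod.sql
  diff -u dev.sql prod.sql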


Suggestions for alternatives?  I don't mind spending money if it'll get 
the job done.


-- Dante





Re: [GENERAL] PostgreSQL executing my function too many times during query

2008-05-16 Thread D. Dante Lorenso

Tom Lane wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:

So, that not being the problem, any ideas?  Is it an 8.2.4 thing?


I can't reproduce any such problem in 8.2 branch tip, and a desultory
scan of the CVS history back to 8.2.4 doesn't turn up any obviously
related patches.  Please provide a self-contained test case for what
you're seeing.


I think this is a problem with the BYTEA type.  I've created a new 
database and reproduced the problem rather easily.  I've run this test 
on both 8.2.4 and 8.3.1.  Here is my test:


-- 8< -------- 8< --
> createdb -U dante functest
> createlang -U dante -d functest plpgsql
> psql -U dante functest

##
## create simple table ... most important is the bytea column
##

CREATE TABLE public.demo (
  rec_num SERIAL,
  data_enc_col BYTEA NOT NULL,
  CONSTRAINT demo_pkey PRIMARY KEY(rec_num)
) WITHOUT OIDS;

##
## we need a simple function that will raise a notice on execution
##

CREATE OR REPLACE FUNCTION public.data_enc (in_text text) RETURNS 
bytea AS

$body$
DECLARE
  my_value BYTEA;
BEGIN
  -- decode text into BYTEA type
  SELECT DECODE(in_text, 'escape')
  INTO my_value;

  -- log that we are called
  RAISE NOTICE 'func data_enc called: %', in_text;

  -- done
  return my_value;
END;
$body$
LANGUAGE 'plpgsql' STABLE RETURNS NULL ON NULL INPUT SECURITY INVOKER;

##
## insert 5 sample values that are all the same
##

functest=# INSERT INTO demo(data_enc_col) VALUES (data_enc('dante'));
NOTICE:  func data_enc called: dante
INSERT 0 1
functest=# INSERT INTO demo(data_enc_col) VALUES (data_enc('dante'));
NOTICE:  func data_enc called: dante
INSERT 0 1
functest=# INSERT INTO demo(data_enc_col) VALUES (data_enc('dante'));
NOTICE:  func data_enc called: dante
INSERT 0 1
functest=# INSERT INTO demo(data_enc_col) VALUES (data_enc('dante'));
NOTICE:  func data_enc called: dante
INSERT 0 1
functest=# INSERT INTO demo(data_enc_col) VALUES (data_enc('dante'));
NOTICE:  func data_enc called: dante
INSERT 0 1

##
## now show that the function runs more than once despite being STABLE
##

functest=# SELECT * FROM demo WHERE data_enc_col = data_enc('dante');
NOTICE:  func data_enc called: dante
NOTICE:  func data_enc called: dante
NOTICE:  func data_enc called: dante
NOTICE:  func data_enc called: dante
NOTICE:  func data_enc called: dante
NOTICE:  func data_enc called: dante
 rec_num | data_enc_col
---------+--------------
   1 | dante
   2 | dante
   3 | dante
   4 | dante
   5 | dante
(5 rows)

##
## test the query again but this time, use a subselect as a fix
##

functest=# SELECT * FROM demo WHERE data_enc_col IN (SELECT 
data_enc('dante'));

NOTICE:  func data_enc called: dante
 rec_num | data_enc_col
---------+--------------
   1 | dante
   2 | dante
   3 | dante
   4 | dante
   5 | dante
(5 rows)

-- 8< -------- 8< --

What you want to see is that the NOTICE is generated 6 times in the first 
select but only 1 time in the second select (using the subselect 
syntax).  I have tried defining this function as both STABLE and 
IMMUTABLE, and neither seems to help.  I've tested this bug on 8.2.4 and 8.3.1.


Is this a bug, or do I need to improve my understanding of how this is 
supposed to work?


-- Dante






[GENERAL] PostgreSQL executing my function too many times during query

2008-05-15 Thread D. Dante Lorenso
I have several records in my database which have encrypted fields.  I 
want to find all the ones that match a certain format but do NOT match 
another.


My problem is that the 'cc_encrypt' function is being executed for every 
matching row in the table instead of just once.  The function was 
defined as STABLE and I tried IMMUTABLE as well.  That doesn't seem to 
be helping.


This format causes the function to execute too many times:

  SELECT COUNT(*) AS result
  FROM credit_card
  WHERE card_number_enc = cc_encrypt('4111---', 'pwd')
  AND card_number_enc != cc_encrypt('4111', 'pwd');

So, the second cc_encrypt is being executed for every row matching the 
first cc_encrypt condition.  My expectation is that both functions would 
be executed ONCE and the results used in the query like this:


  SELECT COUNT(*) AS result
  FROM credit_card
  WHERE card_number_enc = RESULT
  AND card_number_enc != RESULT;

To fix the bug, I can rewrite my query like this and the functions 
will only be executed once each as expected:


  SELECT COUNT(*) AS result
  FROM credit_card
  WHERE card_number_enc = cc_encrypt('4111---', 'pwd')
  AND card_number_enc NOT IN (
SELECT cc_encrypt('4111', 'pwd')
  );

I don't understand what's happening here.  Any help?  Maybe the EXPLAIN 
tells something?


# EXPLAIN SELECT COUNT(*) AS result
# FROM credit_card
# WHERE card_number_enc = cc_encrypt('4111---', 'pwd')
# AND card_number_enc != cc_encrypt('4111', 'pwd');
                               QUERY PLAN
--------------------------------------------------------------
 Aggregate  (cost=196.96..196.97 rows=1 width=0)
   ->  Bitmap Heap Scan on credit_card  (cost=4.87..196.76 rows=79 width=0)
         Recheck Cond: (card_number_enc =
credit_card_encrypt('4111---'::text, 'password'::text))
         Filter: (card_number_enc <>
credit_card_encrypt('4111'::text, 'password'::text))
         ->  Bitmap Index Scan on credit_card_idx_card_number_enc
(cost=0.00..4.85 rows=79 width=0)
               Index Cond: (card_number_enc =
credit_card_encrypt('4111---'::text, 'password'::text))
(6 rows)

Oddly, when I use 'EXPLAIN', I see my debug logging RAISE NOTICE 
statements showing that the function was only executed once each.  When 
I don't use EXPLAIN, it's back to showing that the second function was 
executed for each matching record of the first.


# SELECT version();
                               version
--------------------------------------------------------------
 PostgreSQL 8.2.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.1 20070105 (Red Hat 4.1.1-52)
(1 row)

-- Dante



Re: [GENERAL] PostgreSQL executing my function too many times during query

2008-05-15 Thread D. Dante Lorenso

Tom Lane wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:

This format causes the function to execute too many times:



   SELECT COUNT(*) AS result
   FROM credit_card
   WHERE card_number_enc = cc_encrypt('4111---', 'pwd')
   AND card_number_enc != cc_encrypt('4111', 'pwd');


If the function is marked immutable that query should certainly evaluate
it only twice.

I notice that the EXPLAINs show it as credit_card_encrypt() not
cc_encrypt() --- maybe you got confused about which function you
were adjusting the properties of?


Sorry, I was trying to shorten the function name to help with email 
wrapping.  It was credit_card_encrypt.  I didn't do the replacement in 
all places.


So, that not being the problem, any ideas?  Is it an 8.2.4 thing?

-- Dante



[GENERAL] Need timestamp function that will change within a transaction

2008-05-14 Thread D. Dante Lorenso

I'd like to find a timestamp function that WILL change within a transaction.

This function will return to me a 15-digit BIGINT number in base10:

  SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) * 10)::bigint;

The problem is that NOW() does not change within a transaction and so I 
keep getting the same value.  Is there a function that will return a 
timestamp that is NOT consistent within a transaction?
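
One candidate, assuming 8.2 or later, is clock_timestamp(), which is 
evaluated at each call rather than once per transaction:

  SELECT FLOOR(EXTRACT(EPOCH FROM CLOCK_TIMESTAMP()) * 10)::bigint;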


-- Dante



[GENERAL] convert very large unsigned numbers to base62?

2008-05-14 Thread D. Dante Lorenso
I'd like to convert very large unsigned numbers (ala bigint) to a text 
string using base62.  I created this PL/PERL function to do the trick:



CREATE OR REPLACE FUNCTION public.ls_crypt_convert_base (in_value 
text, in_base integer) RETURNS text AS

$body$
  my ($value, $base) = @_;
  $base = ($base > 62) ? 62 : (($base < 2) ? 2 : $base);
  my @nums = (0..9,'a'..'z','A'..'Z')[0..$base-1];
  my $index = 0;
  my %nums = map {$_, $index++} @nums;

  # short circuit if no value
  $value =~ s/\D//g;
  return if ($value == 0);

  # this will be the end value.
  my $rep = '';
  while ($value > 0) {
  $rep = $nums[$value % $base] . $rep;
  $value = int($value / $base);
  }
  return $rep;
$body$
LANGUAGE 'plperl' IMMUTABLE RETURNS NULL ON NULL INPUT SECURITY INVOKER;


  # SELECT ls_crypt_convert_base(99::text, 62);
   ls_crypt_convert_base
  ---
   1bS0EMtBbK8
  (1 row)

  # SELECT ls_crypt_convert_base(88::text, 62);
   ls_crypt_convert_base
  ---
   13F7tmqjhmu
  (1 row)

  # SELECT ls_crypt_convert_base(77::text, 62);
   ls_crypt_convert_base
  ---
   8umLiF
  (1 row)

  # SELECT ls_crypt_convert_base(123456789::text, 62);
   ls_crypt_convert_base
  ---
   8m0Kx
  (1 row)


Did I just reinvent the wheel?  It seems like something like this should 
already be built into PostgreSQL and I just don't know where to look.
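
The closest built-in I can find is to_hex(), which only covers base 16:

  SELECT to_hex(123456789::bigint);  -- returns '75bcd15'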


-- Dante



Re: [GENERAL] convert very large unsigned numbers to base62?

2008-05-14 Thread D. Dante Lorenso

Stuart Cooper wrote:

I'd like to convert very large unsigned numbers (ala bigint) to a text
string using base62.  I created this PL/PERL function to do the trick:


base 62 is cruel and unusual punishment.  Introduce two more printing
characters to your set a..z, A..Z, 0..9 such as _ and ! and do it in base 64
instead.  There are probably plenty of standard tools and converters to do
things in base 64.


I thought about adding 2 more characters, but I didn't like anything 
that was on my keyboard ;-)  In English, we use 0-9a-zA-Z commonly, but 
to me, '-' and '_' look odd at the beginning or end of a string or when 
repeated more than once.


Ugly code:

  AR---_

OK code:

  ARzd1A0b3P

In some cases, I may even want to eliminate characters that look similar, 
like '1' and 'l' or 'O' and '0'.  Worse yet, if the code that comes out 
of the conversion contains vowels, it's possible for it to look like 
profanity:


  PzbigAss22

So, perhaps a better character set would not include vowels either:

  0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

is reduced to:

  23456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ

  # SELECT LENGTH('23456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ');
   length
  --------
   49
  (1 row)

This looks like a job for Base49 conversion ;-)

The code I attached can do anywhere from base2 through base62 without 
problems but uses a hard-coded replacement set of characters.  This 
could be modified to accept another parameter of replacement chars.


My original question is more along the lines of trying to see if there 
were built-in functions in PostgreSQL that already do this type of base 
conversion.  As a built-in, my expectation would be that it would likely 
be faster and supported ... without me having to introduce a 
PL/Perl-dependent custom function.  Worst case, I could do the base 
conversion using the default character mappings and just remap the 
output to the alternate characters.


  0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLM
  23456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ

  0 => 2, 1 => 3, ... r => z, ... M => Z
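
PostgreSQL's built-in translate() could handle that remapping step 
directly; a quick sketch using the two strings above:

  SELECT translate('8m0Kx',
 '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLM',
 '23456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ');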

Base conversion seems like a common task for most programming languages. 
 I didn't know where to look and it wasn't coming up in my searches.


-- Dante





[GENERAL] Returning RECORD from PGSQL without custom type?

2008-05-12 Thread D. Dante Lorenso

Instead of doing this:

  CREATE OR REPLACE FUNCTION my_custom_func (in_value bigint)
  RETURNS SETOF record AS
  $body$
  ...
  $body$
  LANGUAGE 'plpgsql' VOLATILE;

I'd like to be able to do this:

  CREATE OR REPLACE FUNCTION my_custom_func (in_value bigint)
  RETURNS SETOF (col1name BIGINT, col2name TEXT, ...) AS
  $body$
  ...
  $body$
  LANGUAGE 'plpgsql' VOLATILE;

Because this is the only function that will be returning that TYPE and I 
don't want to have to create a separate type definition just for the 
return results of this function.


Maybe even more cool would be if the OUT record was already defined so 
that I could simply select into that record to send out new rows:


   RETURN NEXT OUT;

   OUT.col1name := 12345;
   RETURN NEXT OUT;

   SELECT 12345, 'sample'
   INTO OUT.col1name, OUT.col2name;
   RETURN NEXT OUT;

Just as you've allowed me to define the IN variable names without 
needing the legacy 'ALIAS $1 ...' format, I'd like to name the returned 
record column names and types in a simple declaration like I show above.


Does this feature request make sense to everyone?  It would make 
programming set returning record functions a lot easier.


-- Dante




Re: [GENERAL] Returning RECORD from PGSQL without custom type?

2008-05-12 Thread D. Dante Lorenso

Pavel Stehule wrote:

Hello
2008/5/10 D. Dante Lorenso [EMAIL PROTECTED]:

Instead of doing this:

 CREATE OR REPLACE FUNCTION my_custom_func (in_value bigint)
 RETURNS SETOF record AS
 $body$
 ...
 $body$
 LANGUAGE 'plpgsql' VOLATILE;

I'd like to be able to do this:

 CREATE OR REPLACE FUNCTION my_custom_func (in_value bigint)
 RETURNS SETOF (col1name BIGINT, col2name TEXT, ...) AS
 $body$
 ...
 $body$
 LANGUAGE 'plpgsql' VOLATILE;



Standard syntax via ANSI SQL is
CREATE FUNCTION foo(params)
RETURNS TABLE(fields of output table) AS
$$ ...
$$


Ah, this sounds almost exactly like what I'm wanting!  So ... you are 
saying that developers are working on something like this?  I'm running 
8.3 ... would I find this feature in 8.4, or is it still not included in 
any release?



Because this is the only function that will be returning that TYPE and I
don't want to have to create a separate type definition just for the return
results of this function.

Maybe even more cool would be if the OUT record was already defined so that
I could simply select into that record to send our new rows:
  RETURN NEXT OUT;
  OUT.col1name := 12345;
  RETURN NEXT OUT;
  SELECT 12345, 'sample'
  INTO OUT.col1name, OUT.col2name;
  RETURN NEXT OUT;


It's a good idea - it was probably the main problem with the last plpgsql
patch.  In this syntax it is clear what the output is, so the RETURN NEXT
statement can be used without params.  I am only not sure about the name
of the default variable - maybe "result" is better.


Yeah, RESULT works too.  I'm not particular about what it has to be ... 
just that something like that might exist.


Where can I go to follow development of this or test it out?  I see some 
old threads now that I know what to look for:


http://archives.postgresql.org/pgsql-hackers/2007-02/msg00318.php
http://archives.postgresql.org/pgsql-patches/2007-02/msg00216.php
http://archives.postgresql.org/pgsql-patches/2007-02/msg00341.php
http://archives.postgresql.org/pgsql-hackers/2007-09/msg01079.php

I want to make sure this patch/proposal covers my needs and 
expectations.  Specifically, I want to return records that are not simply 
the results of a straight query:


  CREATE OR REPLACE FUNCTION foo(f integer)
  RETURNS TABLE(a int, b int) AS
  $$
  DECLARE
my_a INT;
my_b INT;
  BEGIN
-- 1) perhaps like this
SELECT 1, 2
INTO RESULT.a, RESULT.b;
RETURN NEXT RESULT;

-- 2) maybe like this
RETURN NEXT 3, 4;  -- a=3, b=4

-- 3) how about like this
my_a := 5;
my_b := 6;
RETURN NEXT my_a, my_b;

-- 4) maybe like this
RETURN NEXT QUERY SELECT a, b FROM sometable x WHERE x.f = f;

-- done
RETURN;
  END;
  $$ LANGUAGE plpgsql;

Usage:

  SELECT a, b
  FROM foo(20);

Results:

  a  |  b
  ---+------
   1 |  2   -- 1)
   3 |  4   -- 2)
   5 |  6   -- 3)
  ...   -- 4) results from sometable WHERE x.f = 20

What do you think, will I be able to do all of this?

-- Dante


Regards
Pavel Stehule


Just as you've allowed me to define the IN variable names without needing
the legacy 'ALIAS $1 ...' format, I'd like to name the returned record
column names and types in a simple declaration like I show above.

Does this feature request make sense to everyone?  It would make programming
set returning record functions a lot easier.

-- Dante










Re: [GENERAL] Returning RECORD from PGSQL without custom type?

2008-05-12 Thread D. Dante Lorenso

Tom Lane wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:

I'd like to be able to do this:

CREATE OR REPLACE FUNCTION my_custom_func (in_value bigint)
RETURNS SETOF (col1name BIGINT, col2name TEXT, ...) AS


You realize of course that you can do this *today* if you use OUT
parameters?


No, I didn't realize.  I always assumed OUT parameters were like return 
values from a function ... like:


  (out1, out2, out3) = somefunction (in1, in2, in3);

I never realized you could return a SETOF those OUT parameters.  I guess 
it wasn't intuitive, but I'm learning this now.


I think all the functionality I want DOES already exist.  Let me go work 
with it.  Thanks.
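
For the archives, here's a minimal untested sketch of what I understand 
the OUT-parameter form to look like, reusing the names from my earlier 
post:

  CREATE OR REPLACE FUNCTION my_custom_func (in_value bigint,
OUT col1name BIGINT, OUT col2name TEXT)
  RETURNS SETOF record AS
  $body$
  BEGIN
-- with OUT parameters, RETURN NEXT takes no expression
col1name := in_value;
col2name := 'sample';
RETURN NEXT;

col1name := in_value + 1;
col2name := 'another row';
RETURN NEXT;

RETURN;
  END;
  $body$
  LANGUAGE 'plpgsql' VOLATILE;

  SELECT * FROM my_custom_func(12345);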


-- Dante






CREATE OR REPLACE FUNCTION my_custom_func (in_value bigint,
  OUT col1name BIGINT, OUT col2name TEXT, ...)
  RETURNS SETOF RECORD AS

The TABLE syntax is a bit more standards-compliant maybe, but it's not
offering any actual new functionality.

regards, tom lane








Sync some database tables, but not others ... WAS Re: [GENERAL] How to modify ENUM datatypes?

2008-04-25 Thread D. Dante Lorenso

Tino Wildenhain wrote:

D. Dante Lorenso wrote:

Alvaro Herrera wrote:

D. Dante Lorenso wrote:
Or, here's another way to look at it ... make it easier to modify 
ENUM  datatypes because we all know that you will eventually need 
that feature  whether you males, females, and unknowns think so or not.

Agreed.  Let's keep in mind that the current ENUM implementation is
completely new.


Here's a use case that I doubt too many are aware of ... what's unique 
about ENUM is that the data for the ENUM becomes part of the schema of 
a database.


This is actually something I like very much.  I have systems where the 
DEV and PROD databases are separate and must be synchronized when 
pushing out new features.  I currently use non-free tools to 
synchronize just the schemas for both databases.


Often if a new row is added to a lookup table, that row doesn't make 
it to the QA or PROD databases because it's part of the data of a 
table and is not part of the schema.  For data (like ENUM data) that 
should be consistent across databases, it helps if it gets deployed 
with the schema so that lookups will succeed properly.


Well, since it's configuration and not payload, there's nothing wrong 
with just keeping the data in your repository as well and loading it 
every time you roll out a new release.


I have a convenient 3rd party tool that will sync this database schema 
with that database schema.  I just run the tool, accept the discovered 
changes and voila, I've deployed the database changes to the next 
environment.


I haven't written any custom scripts to import content into specific 
tables.  As I see it, that would be a little complex also since you'd 
have to find the difference between dev and prod and only push the 
changes across (additions or deletes).  For potentially hundreds of 
small ENUM-like lookup tables, this seems quite tedious ... like the 
kind of thing a DB sync tool should handle for you ;-)


Perhaps there is a 3rd party tool that would not only sync the DB 
schema, but could add a list of tables which must also have their data 
synced?  Something that could remember that these 50 tables are 
constant/deployable and must be mirrored as-is to the other database 
while these other tables store environment-specific data and should not 
be synced.  Anyone know of such a tool?


-- Dante




Cheers
Tino






Re: [GENERAL] How to modify ENUM datatypes?

2008-04-24 Thread D. Dante Lorenso

Andrew Sullivan wrote:

On Wed, Apr 23, 2008 at 07:46:07PM -0400, brian wrote:
Absolutely true. Which is odd, because this example is trotted out 
whenever there's a thread about ENUMs.

I don't think it's odd at all.  In my view, the people who think enums are a
good datatype for databases are exactly the sorts who'd think that their
data is as static as this poor understanding of the vagaries of individuals'
sex (gender is a different problem, given its association with social roles)
would suggest.


Or, here's another way to look at it ... make it easier to modify ENUM 
datatypes because we all know that you will eventually need that feature 
whether you males, females, and unknowns think so or not.


-- Dante



Re: [GENERAL] How to modify ENUM datatypes?

2008-04-24 Thread D. Dante Lorenso

Alvaro Herrera wrote:

D. Dante Lorenso wrote:
Or, here's another way to look at it ... make it easier to modify ENUM  
datatypes because we all know that you will eventually need that feature  
whether you males, females, and unknowns think so or not.

Agreed.  Let's keep in mind that the current ENUM implementation is
completely new.


Here's a use case that I doubt too many are aware of ... what's unique 
about ENUM is that the data for the ENUM becomes part of the schema of a 
database.


This is actually something I like very much.  I have systems where the 
DEV and PROD databases are separate and must be synchronized when 
pushing out new features.  I currently use non-free tools to synchronize 
just the schemas for both databases.


Often if a new row is added to a lookup table, that row doesn't make it 
to the QA or PROD databases because it's part of the data of a table and 
is not part of the schema.  For data (like ENUM data) that should be 
consistent across databases, it helps if it gets deployed with the 
schema so that lookups will succeed properly.


-- Dante





[GENERAL] How to modify ENUM datatypes?

2008-04-22 Thread D. Dante Lorenso

All,

In the past I have used foreign keys to lookup tables for small lists of 
values that I now think ENUM could do the job of.  I was hoping that by 
using ENUM, I would avoid having to do joins in my queries, and that I'd 
be making the data more robust and faster.


I used to have a table for account_status:

  A | Active
  B | Billing Failed
  C | Closed
  D | Deactivated

  account.acct_type CHAR references account_type.acct_type CHAR

But, now I've converted that to an ENUM:

  ACTIVE
  BILLING_FAILED
  CLOSED
  DEACTIVATED

  account.acct_type ENUM account_type

The problem is that once I create a column in my account table that uses 
this 'account_type' datatype, I can't seem to change or add to it any 
more.  I want to add a new value or edit/delete an existing one.


How do you make changes to an ENUM datatype that is already in use?
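
The only workaround I've been able to come up with is building a 
replacement type and converting the column -- a sketch, with a 
hypothetical new SUSPENDED value:

  CREATE TYPE account_type2 AS ENUM
('ACTIVE', 'BILLING_FAILED', 'CLOSED', 'DEACTIVATED', 'SUSPENDED');

  ALTER TABLE account
ALTER COLUMN acct_type TYPE account_type2
USING acct_type::text::account_type2;

  -- only once no other column depends on the old type:
  DROP TYPE account_type;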

-- Dante






Re: [GENERAL] How to modify ENUM datatypes?

2008-04-22 Thread D. Dante Lorenso

Andreas 'ads' Scherbaum wrote:

On Tue, 22 Apr 2008 20:58:30 +0800 Craig Ringer wrote:

D. Dante Lorenso wrote:
The problem is that once I create a column in my account table that 
uses this 'account_type' datatype, I can't seem to change or add to it 
any more.  I want to add a new value or edit/delete an existing one.

How do you make changes to an ENUM datatype that is already in use?
As far as I know ENUM is not well suited to uses where new enumeration 
members may be added later. A lookup table and a foreign key is probably 
better for this sort of use.

I remember the discussions before PG implemented ENUMs at all - some
people voted against this feature because they knew that questions
about modifying the enum values would pop up sooner or later.
You *can* add elements to an enum type - sort of - by creating a new 
type and converting columns. It's ugly, though, and will be hard to get 
right when the column of interest is referenced by foreign keys and such.

If there's one request to modify a specific ENUM column, other requests
will follow, because the enum set is not as static as it seems in the
first place.  So I'd argue that the only advice should be: change the
column to a foreign key construct.  Converting columns to new data types
is much more overhead anyway.


So, the advice here is don't use ENUM?

I was really hoping that it would be more efficient to not have to do 
all the foreign keys and joins for tables that may have 4-5 enum types.


Just being able to:

  SELECT *
  FROM tablename

would be nice if my columns contained enums instead of doing:

  SELECT *
  FROM tablename, lookuptable
  WHERE tablename.some_id = lookuptable.some_id

Isn't the join more expensive?

-- Dante



Re: [GENERAL] How to modify ENUM datatypes?

2008-04-22 Thread D. Dante Lorenso

Scott Marlowe wrote:

On Tue, Apr 22, 2008 at 2:45 PM, D. Dante Lorenso [EMAIL PROTECTED] wrote:

 So, the advice here is don't use ENUM?
 I was really hoping that it would be more efficient to not have to do all
the foreign keys and joins for tables that may have 4-5 enum types.
 Just being able to:
  SELECT *
  FROM tablename

If you use a lookup table methodology you still get that.  Try this:
smarlowe=# create table choices (color text primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"choices_pkey" for table "choices"
CREATE TABLE
smarlowe=# insert into choices values ('red'),('yellow'),('green'),('blue');
INSERT 0 4
smarlowe=# create table mystuff (id serial primary key, usenam text,
mycolor text references choices(color));
NOTICE:  CREATE TABLE will create implicit sequence "mystuff_id_seq"
for serial column "mystuff.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"mystuff_pkey" for table "mystuff"
CREATE TABLE
smarlowe=# insert into mystuff (usenam, mycolor) values ('scott','red');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('darren','blue');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('dan','green');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('steve','green');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('mike','black');
ERROR:  insert or update on table "mystuff" violates foreign key
constraint "mystuff_mycolor_fkey"
DETAIL:  Key (mycolor)=(black) is not present in table "choices".
smarlowe=# select * from mystuff;
 id | usenam | mycolor
----+--------+---------
  1 | scott  | red
  2 | darren | blue
  3 | dan| green
  4 | steve  | green
(4 rows)
tada!  No enum, and no join.  But you can't insert illegal values in mycolor...


This approach is so old-school, I seem to have overlooked the obvious.

Here you've elected to use the foreign key to just control the possible 
values inserted but not really to look up the value.


Seems you are storing the values in text form which goes against all the 
normalization techniques I've learned in school.  I see this might be a 
problem with storage since you will need to store the TEXT value for 
every row in the 'mystuff' table instead of just storing the reference 
to the lookup table as an INTEGER.  Over millions of rows, perhaps this 
would become a concern?


What is the general consensus by the community about this approach?  Is 
this de-normalization frowned upon, or is there a performance advantage 
here that warrants the usage?


-- Dante













[GENERAL] Cascading Trigger - changing row on delete does not delete row

2008-02-25 Thread D. Dante Lorenso

All,

I have 2 tables which both have triggers on them.  When I delete a row 
on table A, a cascading trigger ends up modifying rows in table B.  The 
modified rows in table B trigger an update on rows in table A which 
happens to be the same row that I am trying to delete.


I don't get any errors from the delete, yet PostgreSQL tells me 0 rows 
were affected by the delete, and sure enough the row I just tried to 
delete is still there.  Running the delete a 2nd time works because the 
trigger does not cascade and affect the deleted row.


Is there a way to know that a row I am deleting is being deleted so I 
don't update it?


I thought about adding a boolean column 'is_being_deleted' but I can't 
set that to true without updating the row (which I'm trying to avoid).


I've thought about using PL/Perl to access transaction-level global 
variables where I could store the ID of the row I'm deleting and fetch 
that value in order to avoid it in my updates ... but I don't want to 
invoke the PL/Perl interpreter and slow down what I'm already doing in 
PL/PGSQL.  Are there transaction-level variables in PL/PGSQL (globals)?
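
The closest thing I know of without PL/Perl is (ab)using a custom GUC -- 
a sketch, assuming custom_variable_classes = 'app' is set in 
postgresql.conf:

  -- third argument TRUE makes the setting local to the transaction
  SELECT set_config('app.deleting_pkey', '123', TRUE);

  -- read it back inside the trigger
  SELECT current_setting('app.deleting_pkey');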


Suggestions?

-- Dante






[GENERAL] WHERE vs ORDER BY vs LIMIT why not using the correct index?

2008-01-08 Thread D. Dante Lorenso

All,

I have a simple query:

  SELECT tcq_id
  FROM queue q
  WHERE q.status = 'D'
  ORDER BY tcq_id ASC
  LIMIT 1;

What I'm trying to do is find all the items in the queue which have a 
status of 'D' and then select the oldest item first.  My data is very 
skewed such that there are not many records matching the WHERE clause.


  status  my_count
  D  4
  Q  6
  S 20
  P 74
  F   3294
  C 291206

However, when I explain that query, it's doing an index scan on the 
'tcq_id' primary key column instead of using the index on 'status':


--
EXPLAIN SELECT tcq_id
FROM queue q
WHERE q.status = 'D'
ORDER BY tcq_id ASC
LIMIT 1;

 Limit  (cost=0.00..40.09 rows=1 width=8)
   ->  Index Scan using queue_pkey on queue q  (cost=0.00..59059.80
rows=1473 width=8)
         Filter: (status = 'D'::bpchar)
--

But then when I take out the ORDER BY clause, PostgreSQL will use a 
different (and proper) index:


--
EXPLAIN SELECT tcq_id
FROM queue q
WHERE q.status = 'D'
LIMIT 1;

 Limit  (cost=0.00..3.81 rows=1 width=8)
   ->  Index Scan using queue_idx_status on queue q  (cost=0.00..5618.07
rows=1473 width=8)
         Index Cond: (status = 'D'::bpchar)
--

I don't understand why the ORDER BY condition would be affecting my 
WHERE criteria.  Shouldn't the ordering be done after the filter is 
first applied?


I'm wanting: "find the 4 rows where status = 'D', then order those by 
tcq_id and return the first one."  But PostgreSQL seems to be choosing: 
"order all records by tcq_id, then scan them sequentially and find the 
first one matching status = 'D'."


How can I influence the planner's decision while keeping my ORDER BY clause?
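
One thing that might help, assuming the planner will use it, is a 
two-column index that matches both the filter and the sort, so a single 
index scan can satisfy the whole query:

  CREATE INDEX queue_idx_status_tcq_id ON queue (status, tcq_id);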

After further testing, maybe it's not the ORDER BY but the LIMIT that 
is causing the poor planner choice?  I tried to do this:


--
SELECT tcq_id
FROM (
  SELECT tcq_id
  FROM queue q
  WHERE q.status = 'D'
) x
ORDER BY x.tcq_id ASC
LIMIT 1;

 Limit  (cost=0.00..40.09 rows=1 width=8)
   ->  Index Scan using queue_pkey on queue q  (cost=0.00..59059.80
rows=1473 width=8)
         Filter: (status = 'D'::bpchar)
--

But this results in another wrong index choice.  So, I removed the LIMIT 
clause and now it does use the right index:


--
SELECT tcq_id
FROM (
  SELECT tcq_id
  FROM queue q
  WHERE q.status = 'D'
) x
ORDER BY x.tcq_id ASC;

 Sort  (cost=4314.36..4318.05 rows=1473 width=8)
   Sort Key: q.tcq_id
   ->  Bitmap Heap Scan on queue q  (cost=35.71..4236.85 rows=1473 width=8)
         Recheck Cond: (status = 'D'::bpchar)
         ->  Bitmap Index Scan on queue_idx_status  (cost=0.00..35.34
rows=1473 width=0)
               Index Cond: (status = 'D'::bpchar)
--

Can someone shed some insight here and help me understand what's going on?

-- Dante




[GENERAL] Need efficient way to do comparison with NULL as an option

2008-01-04 Thread D. Dante Lorenso

All,

I'm looking for an operator that will compare NULL with NULL and 
evaluate as TRUE.


I have a BIGINT column which might contain NULL values.  I want to pass 
a value to compare with that column in my WHERE clause.  If the value 
I'm comparing is 0, I want it to match the NULL values.  Here is a 
sample query that I currently use:


  SELECT *
  FROM mytable
  WHERE (col IS NULL AND NULLIF(?, 0) IS NULL) OR col = ?;

The '?' placeholders used in the query will receive the same value which 
might be any one of the following: NULL, 0, 1, 2, 3, etc.


What I'd really like is an operator that will compare NULL with NULL and 
evaluate as TRUE.  Does that exist?


I tried solving this myself, so I have a stored proc like this:


CREATE OR REPLACE FUNCTION public.is_equal_bigint (in_val1 bigint, 
in_val2 bigint) RETURNS boolean AS

$body$
BEGIN
-- both values are null
IF in_val1 IS NULL AND in_val2 IS NULL THEN
   RETURN TRUE;
END IF;

-- values are the same
IF in_val1 = in_val2 THEN
   RETURN TRUE;
END IF;

-- values are different
RETURN FALSE;
END;
$body$
LANGUAGE 'plpgsql' IMMUTABLE CALLED ON NULL INPUT SECURITY INVOKER;


I can use this function as follows:

  SELECT *
  FROM mytable
  WHERE is_equal_bigint(col, NULLIF(?, 0)) IS TRUE;

But I worry that this will not allow my query to use any indexes on 
'col' and will make all my queries slower.


Any recommendations for making my first query above more optimized and 
still efficient?  Does the operator I'm looking for exist?


-- Dante




Re: [GENERAL] Need efficient way to do comparison with NULL as an option

2008-01-04 Thread D. Dante Lorenso

Tom Lane wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:
I'm looking for an operator that will compare NULL with NULL and 
evaluate as TRUE.

If the value I'm comparing is 0, I want it to match the NULL values.

[ raised eyebrow... ]  Sir, you need to rethink your data
representation.


Tom,

Here's what I'm doing, tell me if I'm crazy:

The column I'm comparing to is 'folder_id'.  The folder_id column is a 
foreign key to a folder table.  If folder_id is NULL, the row is not in 
a folder.


If I want to find all items in a specific folder, I want:

  SELECT *
  FROM mytable
  WHERE folder_id = 123;

But if I want to find all the items which are not in any folder, I want:

  SELECT *
  FROM mytable
  WHERE folder_id IS NULL;

I don't have any folder_id 0, so on a URL I might do this:

  http://xyz/page.php?fid=123
  http://xyz/page.php?fid=0

If folder_id is 0, I do the NULL comparison.

  SELECT *
  FROM mytable
  WHERE folder_id IS NOT DISTINCT FROM NULLIF(?, 0);

That seems to do what I want.  Is it bad design?  Something I'm missing 
about indexing a NULL or something like that?


-- Dante




Re: [GENERAL] Need efficient way to do comparison with NULL as an option

2008-01-04 Thread D. Dante Lorenso

Tom Lane wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:
I'm looking for an operator that will compare NULL with NULL and 
evaluate as TRUE.

regression=# select null IS NOT DISTINCT FROM 42;
 ?column?
----------
 f
(1 row)

regression=# select null IS NOT DISTINCT FROM null;
 ?column?
----------
 t
(1 row)
However, if you're expecting this to be real efficient (like, use an
index), you're out of luck ...

If the value I'm comparing is 0, I want it to match the NULL values.

[ raised eyebrow... ]  Sir, you need to rethink your data
representation.


Tom,

I don't understand why my index is not being used (other than you said so):

--
  SELECT COUNT(*)
  FROM audio
  WHERE (folder_id = ? AND ? IS NOT NULL)
  OR (folder_id IS NULL AND ? IS NULL);

uses index when ? = 100 (as expected)
does NOT use index when ? = NULL (as expected)

--
  SELECT COUNT(*)
  FROM audio
  WHERE folder_id IS NOT DISTINCT FROM ?;

does NOT use index when ? = NULL (as expected)
does NOT use index when ? = 100 (NOT expected!) <-- !!!

--

So, although 'IS NOT DISTINCT FROM' is a lot more readable than my other 
form, it's apparently not efficient.  How can I get the efficiency and 
still have the clarity?


-- Dante



[GENERAL] Problem with index when using function

2007-12-31 Thread D. Dante Lorenso

All,

I have a weird situation where my index IS used when I use a query that 
hard-codes a value but it does NOT use the index when the value is 
returned from a PGSQL function:


==
DOES NOT WORK
==

svcprod=# EXPLAIN SELECT
COALESCE(SUM(start_count), 0) AS start_count,
COALESCE(SUM(end_count), 0) AS end_count,
COALESCE(SUM(total_playtime), 0) AS total_playtime
  FROM audio_file_stats
  WHERE afile_id = num2id(1173, 1075, 'audio_file', 'audio_id', 
'afile_id')::bigint;


                               QUERY PLAN
--------------------------------------------------------------
 Aggregate  (cost=118677.35..118677.36 rows=1 width=19)
   ->  Seq Scan on audio_file_stats  (cost=0.00..118675.33 rows=268
width=19)
         Filter: (afile_id = num2id(1173::bigint, 1075::bigint,
'audio_file'::character varying, 'audio_id'::character varying,
'afile_id'::character varying))
(3 rows)

==
WORKS
==

byoaudio=# EXPLAIN SELECT
COALESCE(SUM(start_count), 0) AS start_count,
COALESCE(SUM(end_count), 0) AS end_count,
COALESCE(SUM(total_playtime), 0) AS total_playtime
  FROM audio_file_stats
  WHERE afile_id = 48702;

                               QUERY PLAN
--------------------------------------------------------------
 Aggregate  (cost=672.69..672.70 rows=1 width=19)
   ->  Index Scan using audio_file_stats_idx_afile_id on
audio_file_stats  (cost=0.00..670.73 rows=261 width=19)
         Index Cond: (afile_id = 48702)
(3 rows)

==

The function I use is defined as returning a BIGINT, which is the same 
datatype as is used by the index:


FUNCTION public.num2id (in_acct_id bigint, in_value bigint,
  in_table_name varchar, in_input_column varchar,
  in_output_column varchar) RETURNS bigint

Can someone help explain what is being done wrong here?  I'm using 8.2.4.

-- Dante




Re: [GENERAL] Problem with index when using function

2007-12-31 Thread D. Dante Lorenso
Thanks if you started to look into this, but never mind.  I figured it 
out.  It turns out I only needed to mark the function as STABLE:


http://www.postgresql.org/docs/8.0/static/xfunc-volatility.html

A STABLE function cannot modify the database and is guaranteed to 
return the same results given the same arguments for all calls within a 
single surrounding query. This category allows the optimizer to optimize 
away multiple calls of the function within a single query. In 
particular, it is safe to use an expression containing such a function 
in an index scan condition. (Since an index scan will evaluate the 
comparison value only once, not once at each row, it is not valid to use 
a VOLATILE function in an index scan condition.)
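
So, against the signature shown in the quoted message below, the whole 
fix was one statement:

  ALTER FUNCTION public.num2id(bigint, bigint, varchar, varchar, varchar)
STABLE;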


-- Dante



D. Dante Lorenso wrote:

[snip]




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Need LIMIT and ORDER BY for UPDATE

2007-12-13 Thread D. Dante Lorenso

John D. Burger wrote:

D. Dante Lorenso wrote:

I'd really like to have ORDER BY and LIMIT for UPDATE and DELETE 
commands.  Is this possible?


  UPDATE invoice i
  SET reserve_ts = NOW() + '1 hour'::interval
  FROM account a
  WHERE a.acct_id = i.acct_id
  AND i.reserve_ts < NOW()
  AND a.status = 'A'
  AND i.is_paid IS FALSE
  ORDER BY i.create_ts ASC
  LIMIT 1
  RETURNING invoice_id;

This query would find JUST ONE invoice record which is not paid and 
reserve the right to operate on the row using the 'reserve_ts' column 
for all active accounts.  The one row would be the oldest invoice 
matching the criteria.  Only that one row would be updated and the 
invoice_id of the updated row (if any) would be returned.


Can something like what I want be added in a future version?  Ideas or 
alternatives?  I don't see how I can rewrite this query as a single 
statement any other way and get the same expectations.


Doesn't this do it, assuming invoice_id is unique?

  UPDATE invoice
  SET reserve_ts = NOW() + '1 hour'::interval
  where invoice_id =
(select invoice_id from invoice i,
account a
WHERE a.acct_id = i.acct_id
AND i.reserve_ts < NOW()
AND a.status = 'A'
AND i.is_paid IS FALSE
ORDER BY i.create_ts ASC
LIMIT 1)
  RETURNING invoice_id;


Doesn't this create a race condition in the query where multiple processes 
might find the same invoice_id while executing the inner select?  The 
update would then update the same record more than once during the 
update step and 2 processes might get the same invoice_id returned.  In 
other words, moving the select criteria into a sub-query breaks the 
atomic nature of the update.  Right?


I have been trying to doing something like this, though:

  UPDATE invoice
  SET reserve_ts = NOW() + '1 hour'::interval
  WHERE reserve_ts < NOW()
  AND invoice_id = (
SELECT invoice_id
FROM invoice i, account a
WHERE a.acct_id = i.acct_id
AND i.reserve_ts < NOW()
AND a.status = 'A'
AND i.is_paid IS FALSE
ORDER BY i.create_ts ASC
LIMIT 1
  )
  RETURNING invoice_id;

By checking the reserve_ts inside the SELECT and again inside the UPDATE 
this should catch the race condition and only allow one process to 
perform the update on a given match.  If the other process has updated 
the reserve_ts already, the reserve_ts would not pass the second check. 
 However, the new side-effect is that one process would receive a NULL 
return result when the race condition occurs rather than just picking up 
the next queued invoice_id.


Unless I can get what I really want, this will have to do, I suppose.

-- Dante

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] For the SQL gurus out there

2007-12-13 Thread D. Dante Lorenso

Uwe C. Schroeder wrote:

Ok, something I've been toying around with for a while.
Here's the scenario:
Imagine a blogging app.
I have a table for blogs with a blog_id (primary key)
and a table blog_comments also with a comment_id as primary key and a foreign 
key holding the blog_id the post belongs to.
The comments table also has a field that holds a self-reference to comment id 
for comments on comments (on comments) of a blog.


What I would like to do is to create a view that sucks the comments for a 
given blog_id in the order they should be displayed (very commonly seen in 
pretty much all blogging apps), i.e.


Blog
comment 1
  comment on comment 1
comment on comment on comment 1
comment 2
etc.


Question is, is there a smart way I'm not able to figure out to create a 
single query on the blog comment table that will return the comments in the 
right order? Sure I could write a recursive method that assembles the data in 
correct order, but I'd prefer to leave that to the database to handle in a 
view.


What you are looking for is the 'connectby' function found in contrib as 
part of the tablefunc package.  On my CentOS 5.1 box, it's part of this RPM:



 rpm -qi postgresql-contrib
Name        : postgresql-contrib            Relocations: (not relocatable)
Version     : 8.2.4                         Vendor: (none)
Release     : 1PGDG                         Build Date: Fri 20 Apr 2007 01:58:54 PM CDT
Install Date: Sun 16 Sep 2007 12:27:55 AM CDT   Build Host: rhel5x8664.gunduz.org
Group       : Applications/Databases        Source RPM: postgresql-8.2.4-1PGDG.src.rpm
Size        : 1724563                       License: BSD
Signature   : DSA/SHA1, Fri 20 Apr 2007 02:14:40 PM CDT, Key ID a667b5d820579f11

URL : http://www.postgresql.org/
Summary : Contributed source and binaries distributed with PostgreSQL
Description :
The postgresql-contrib package contains contributed packages that are
included in the PostgreSQL distribution.


And the files you want to look at is here:

  /usr/share/doc/postgresql-contrib-8.2.4/README.tablefunc

It talks about connectby here:


...
Name

connectby(text, text, text[, text], text, text, int[, text]) - returns a 
set representing a hierarchy (tree structure)


Synopsis

connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld], text start_with, int max_depth
[, text branch_delim])
...


Do some searching for 'connectby' and PostgreSQL in google.
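
A minimal call would look something like this (a sketch: the table and 
column names are assumed from your description, '1' is the key value of 
the comment subtree to start from, and max_depth 0 means no limit):

  SELECT *
  FROM connectby('blog_comments', 'comment_id', 'parent_comment_id', '1', 0)
  AS t(comment_id int, parent_comment_id int, level int);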

-- Dante

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Need LIMIT and ORDER BY for UPDATE

2007-12-13 Thread D. Dante Lorenso

Bill Moran wrote:

D. Dante Lorenso [EMAIL PROTECTED] wrote:

All,

I'd really like to have ORDER BY and LIMIT for UPDATE and DELETE 
commands.  Is this possible?


   UPDATE invoice i
   SET reserve_ts = NOW() + '1 hour'::interval
   FROM account a
   WHERE a.acct_id = i.acct_id
   AND i.reserve_ts < NOW()
   AND a.status = 'A'
   AND i.is_paid IS FALSE
   ORDER BY i.create_ts ASC
   LIMIT 1
   RETURNING invoice_id;

This query would find JUST ONE invoice record which is not paid and 
reserve the right to operate on the row using the 'reserve_ts' column 
for all active accounts.  The one row would be the oldest invoice 
matching the criteria.  Only that one row would be updated and the 
invoice_id of the updated row (if any) would be returned.


Running a query like this over and over would pop just one record off 
the queue and would guarantee an atomic reservation.


While I'm not going to argue as to whether your suggestion would be
a good idea or not, I will suggest you look at SELECT FOR UPDATE, which
will allow you to do what you desire.


  UPDATE invoice
  SET reserve_ts = NOW() + '1 hour'::interval
  WHERE invoice_id = (
SELECT invoice_id
FROM invoice i, account a
WHERE a.acct_id = i.acct_id
AND i.reserve_ts < NOW()
AND a.status = 'A'
AND i.is_paid IS FALSE
ORDER BY i.create_ts ASC
LIMIT 1
FOR UPDATE
  )
  RETURNING invoice_id;

Does this do the same thing while still remaining a single atomic query 
that will guarantee no race conditions during the inner select/update?


  ERROR:  SELECT FOR UPDATE/SHARE is not allowed in subqueries

Guess not.
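
A two-step version inside PL/pgSQL does work, though (a sketch reusing the 
invoice/account schema from above; the function name is made up):

  CREATE OR REPLACE FUNCTION reserve_next_invoice() RETURNS bigint AS $$
  DECLARE
      my_invoice_id BIGINT;
  BEGIN
      /* lock one candidate row; under READ COMMITTED, if another
         session reserved it first, the re-check after the lock wait
         simply returns no row */
      SELECT i.invoice_id
      INTO my_invoice_id
      FROM invoice i, account a
      WHERE a.acct_id = i.acct_id
      AND i.reserve_ts < NOW()
      AND a.status = 'A'
      AND i.is_paid IS FALSE
      ORDER BY i.create_ts ASC
      LIMIT 1
      FOR UPDATE OF i;

      IF NOT FOUND THEN
          RETURN 0;  /* nothing available right now; caller may retry */
      END IF;

      UPDATE invoice
      SET reserve_ts = NOW() + '1 hour'::interval
      WHERE invoice_id = my_invoice_id;

      RETURN my_invoice_id;
  END;
  $$ LANGUAGE 'plpgsql' VOLATILE;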

-- Dante


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Function to convert from TEXT to BYTEA?

2007-12-12 Thread D. Dante Lorenso

Richard Huxton wrote:

D. Dante Lorenso wrote:
I want to use the ENCRYPT and DECRYPT functions from contrib, but they 
require inputs of BYTEA.


My data is in VARCHAR and TEXT fields and when I try to use the 
contrib functions, they complain about wrong datatypes.  Is there a 
string function or something that will take a VARCHAR or TEXT input 
and output a BYTEA so that I can use that as input for the 
ENCRYPT/DECRYPT functions?


I know about creating a CAST from VARCHAR to BYTEA, but the problem 
with a CAST is that it doesn't port to other database servers when I 
do a dump and restore. 


Doesn't it?
Hmm... seems to dump for me in 8.2


My CAST was defined as follows:

CREATE CAST (VARCHAR AS BYTEA) WITHOUT FUNCTION;

Tom explains why that does NOT dump and restore with my database here:

http://archives.postgresql.org/pgsql-general/2007-11/msg00922.php
http://archives.postgresql.org/pgsql-general/2007-11/msg00931.php

Likely my problem is that I don't use a function to do the cast.


  That forces me to manually recreate the cast
each time a new database is set up and usually that's the step that 
gets forgotten.


Surely you have a script that creates your databases for you?


Is this enough script?:

DUMP:
/usr/bin/pg_dump -U [user] -Ft [dbname] > [tar_file]

RESTORE:
/usr/bin/pg_restore -c -Ft [tar_file] | /usr/bin/psql -U [user] [dbname]

Is there a function that will do what I want to convert the datatype 
without having to create a CAST that PostgreSQL doesn't have natively? 
How else are you supposed to use the ENCRYPT and DECRYPT functions?

With actual bytea types?


Sure, bytea works, but I want this to work:

  SELECT DECRYPT(ENCRYPT('cheese', 'secret', 'bf'), 'secret', 'bf');

I don't see any BYTEA in there ...

Anyway this will convert for you - PG can get from an unknown quoted 
literal to bytea just fine.

CREATE FUNCTION text2bytea(text) RETURNS bytea AS $_$
DECLARE
  b bytea;
BEGIN
  EXECUTE 'SELECT  ' || quote_literal($1) || '::bytea' INTO b;
  RETURN b;
END
$_$
LANGUAGE plpgsql;


Awesome!  That's just what I was looking for!


And here's the cast definition that goes with it
CREATE CAST (text AS bytea) WITH FUNCTION public.text2bytea(text);


Perfect.  And now that this CAST depends on a function which is in my 
database, it should dump and restore without a problem.


Let me go test all this ... YEP THAT WORKS!
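
A quick round-trip check with the cast in place might look like this 
(mytable/mycol are made-up names for a table with a varchar column):

  SELECT DECRYPT(ENCRYPT(mycol::bytea, 'secret', 'bf'), 'secret', 'bf')
  FROM mytable;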

Thanks again!

-- Dante

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Extract last 4 characters from string?

2007-12-12 Thread D. Dante Lorenso

All,

Getting the first 4 characters from the begining of a string is easy enough:

  SELECT SUBSTR('ABCDEFGHIJKLMNOP', 1, 4);

Returns 'ABCD'.  But getting the last 4 characters appears to be a 
little more work and is ugly:


  SELECT SUBSTR('ABCDEFGHIJKLMNOP', LENGTH('ABCDEFGHIJKLMNOP') - 3, 4);

Returns 'MNOP'.  I hate having to provide my input string more than once 
like this.  So ... uglier:


  REGEXP_REPLACE('ABCDEFGHIJKLMNOP', '^.*(....)$', '\\1');

Returns 'MNOP'.  Many languages have a version of substr that takes 
negative arguments to begin offset from the end of the string like this:


  SELECT SUBSTR('ABCDEFGHIJKLMNOP', -4);

That doesn't seem to work in PostgreSQL.  In fact, it doesn't even error 
out ... it just returns the whole string.  Is there an easy (preferred) 
 method that I'm missing?


-- Dante

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Extract last 4 characters from string?

2007-12-12 Thread D. Dante Lorenso

Rodrigo De León wrote:

On Dec 12, 2007 4:11 PM, D. Dante Lorenso [EMAIL PROTECTED] wrote:

Is there an easy (preferred) method that I'm missing?

select substring('ABCDEFGHIJKLMNOP' from '....$');


Thanks!  Now, since I'd like to see a number in there, I'll do this:

  SELECT SUBSTRING('ABCDEFGHIJKLMNOP' FROM '.{4}$');

That does look a lot better than what I was cooking up ... appreciate 
the help!


-- Dante

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Need LIMIT and ORDER BY for UPDATE

2007-12-12 Thread D. Dante Lorenso

All,

I'd really like to have ORDER BY and LIMIT for UPDATE and DELETE 
commands.  Is this possible?


  UPDATE invoice i
  SET reserve_ts = NOW() + '1 hour'::interval
  FROM account a
  WHERE a.acct_id = i.acct_id
  AND i.reserve_ts < NOW()
  AND a.status = 'A'
  AND i.is_paid IS FALSE
  ORDER BY i.create_ts ASC
  LIMIT 1
  RETURNING invoice_id;

This query would find JUST ONE invoice record which is not paid and 
reserve the right to operate on the row using the 'reserve_ts' column 
for all active accounts.  The one row would be the oldest invoice 
matching the criteria.  Only that one row would be updated and the 
invoice_id of the updated row (if any) would be returned.


Running a query like this over and over would pop just one record off 
the queue and would guarantee an atomic reservation.


Similar syntax would be very useful for DELETE operations.  The idea is 
that doing an UPDATE with RETURNING allows a get/set operation in a 
single query.  Without the LIMIT and ORDER BY, I'm forced to reserve all 
rows at once which my application doesn't want to handle like that.


Can something like what I want be added in a future version?  Ideas or 
alternatives?  I don't see how I can rewrite this query as a single 
statement any other way and get the same expectations.


-- Dante

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


[GENERAL] Function to convert from TEXT to BYTEA?

2007-12-11 Thread D. Dante Lorenso

All,

I want to use the ENCRYPT and DECRYPT functions from contrib, but they 
require inputs of BYTEA.


My data is in VARCHAR and TEXT fields and when I try to use the contrib 
functions, they complain about wrong datatypes.  Is there a string 
function or something that will take a VARCHAR or TEXT input and output 
a BYTEA so that I can use that as input for the ENCRYPT/DECRYPT functions?


I know about creating a CAST from VARCHAR to BYTEA, but the problem with 
a CAST is that it doesn't port to other database servers when I do a 
dump and restore.  That forces me to manually recreate the cast 
each time a new database is set up and usually that's the step that gets 
forgotten.


Is there a function that will do what I want to convert the datatype 
without having to create a CAST that PostgreSQL doesn't have natively? 
How else are you supposed to use the ENCRYPT and DECRYPT functions?


-- Dante

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] pg_dump not including custom CAST?

2007-11-16 Thread D. Dante Lorenso

All,

I did this in my database:

  CREATE CAST (VARCHAR AS BYTEA) WITHOUT FUNCTION;

But when I use pg_dump to dump the database and use pg_restore to bring 
it back on a freshly created database, the CAST is the only part of the 
restore which is missing.


I'm using PostgreSQL 8.2.4 for both the dump and restore database.  Why 
doesn't the CAST dump and restore?


-- Dante

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Selecting tree data

2007-10-25 Thread D. Dante Lorenso

Pat Maddox wrote:

I'd like to store some tree data in my database.  I want to be able to
sort the data but maintain a tree structure
Is it possible to pull all the data like that with one query?  How do
I need to structure the table, and what query do I have to run in
order to make it happen?


You need to look at the connectby function which is part of contrib.

-- Dante

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Problem with BYTEA, CAST, and pg_dump

2007-10-22 Thread D. Dante Lorenso

All,

I'm trying to use the crypt and decrypt functions from contrib and have 
installed them into my database.  The definition for crypt seems to 
require that I use BYTEA datatype to input the data I need to encrypt. 
All of my data is either TEXT or VARCHAR, though and not BYTEA.


I was trying to run a query like this from a trigger:

SELECT encrypt(NEW.card_number::bytea, 'password', 'bf')
INTO NEW.card_number_enc;

Problem is that a cast from VARCHAR to BYTEA didn't exist.  I saw an 
email thread that suggested I create the cast myself so I did:


CREATE CAST (VARCHAR AS BYTEA) WITHOUT FUNCTION;

And that worked.  Now I can run the encrypt function above without a 
problem.


However, when I use pg_dump and restore the dumped database, the CREATE 
CAST is lost.  Is this a bug with pg_dump or should I be doing something 
different?


-- Dante



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] Using RETURNING with INTO inside pgsql

2007-09-20 Thread D. Dante Lorenso
Not really a question here, I just wanted to share my joy with the 
group.  I'm loving the new RETURNING clause in PostgreSQL.  This is 
really cool stuff ...


---- 8< ----
CREATE OR REPLACE FUNCTION public.testfunc () RETURNS bigint AS
$body$
DECLARE
my_var BIGINT;
BEGIN
INSERT INTO tryit (col1, col2)
VALUES ('a', 'b')
RETURNING prikey
INTO my_var;

RETURN my_var;
END;
$body$
LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;
---- 8< ----

I never do anything this trivial inside a stored proc, but the point is 
not what I'm doing but rather how it's done.  That RETURNING clause 
really helps reduce the amount of SQL I have to write.  PostgreSQL 8.2.4 
rocks!


-- Dante

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] how to know the current size of a database

2007-09-19 Thread D. Dante Lorenso

[EMAIL PROTECTED] wrote:

I want to know about the size of my database. For example, I want to know
how many Mb of data for current myDatabase database in a postgres server.


SELECT pg_size_pretty(pg_database_size(current_database()));
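
Related helpers report per-table sizes the same way (available in 8.1 and 
later; 'mytable' is just a placeholder):

SELECT pg_size_pretty(pg_relation_size('mytable'));
SELECT pg_size_pretty(pg_total_relation_size('mytable'));  -- includes indexes and TOAST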

-- Dante

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Event-driven programming?

2007-09-12 Thread D. Dante Lorenso

Pavel Stehule wrote:

2007/9/12, Jay Dickon Glanville [EMAIL PROTECTED]:

- I write a function (it doesn't matter what language it's in:
PL/pgSQL, PL/Java, etc)
- I register that function as a post-commit callback function
- when a client commits a transaction, the function gets called, and
the database passes the function some general information as to the
content of the transaction

Note how similar this process is to writing triggers.  The only
problem I have with triggers is that events get generated per-table.
I'd like to get notifications based on transactions, not table
changes.

What I'd like to be able to do with this event is to notify any
applications of this change, so they can update their cached view of
the database.


Although I'm happy to use triggers as-is (not per transaction, etc) I've 
also wondered about firing events from the database.  I'm curious to 
know if anyone has attempted to write a trigger that will open a socket 
and send an event packet to an application server on the network.


I've considered using a message queue like JMS to manage events on my 
network and have PostgreSQL fire off UDP messages to a socket server 
that would insert jobs into the message queue as triggers get fired in 
the database.  Doing this would be an alternative to storing the queue 
as a database table and having to use polling to constantly check the 
database for events in the queue.
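
One built-in alternative to polling worth mentioning is LISTEN/NOTIFY; a 
sketch (table, function, and channel names are made up):

CREATE OR REPLACE FUNCTION queue_notify() RETURNS trigger AS $$
BEGIN
    NOTIFY queue_event;   -- wakes every session that ran LISTEN queue_event
    RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER queue_event_trg
    AFTER INSERT ON queue
    FOR EACH ROW EXECUTE PROCEDURE queue_notify();

-- each worker then runs:
--   LISTEN queue_event;
-- and waits on its connection instead of polling the table.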


I am interested what anybody might contribute to this thread.  Let us 
know what you tried whether it worked or not, it might be useful.


-- Dante

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Is there a better way to do this?

2007-08-28 Thread D. Dante Lorenso

Wei Weng wrote:

I want to implement something like the following:
CREATE OR REPLACE FUNCTION AddDays


You don't know how many seconds are in a day, so just add the days using 
SQL.


RETURN time + (days || ' days')::INTERVAL;

You don't even need to make that a function; just do that in your SQL 
directly.
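
For example (nothing here but built-ins):

  SELECT NOW() + (30 || ' days')::INTERVAL;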


-- Dante

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Is there a better way to do this?

2007-08-28 Thread D. Dante Lorenso

Michael Glaesemann wrote:


On Aug 28, 2007, at 16:55 , D. Dante Lorenso wrote:

RETURN time + (days || ' days')::INTERVAL;
It's bad practice to concatenate like this. Use time + days * interval 
'1 day' and be done with it.


Why?  Is this functionality expected to break in the future or has 
unexpected side effects?  Is it less clear or less efficient?  Who 
declared it bad practice and where can I read that documentation?


-- Dante

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Is there a better way to do this?

2007-08-28 Thread D. Dante Lorenso

Michael Glaesemann wrote:

On Aug 28, 2007, at 17:22 , D. Dante Lorenso wrote:

Michael Glaesemann wrote:

On Aug 28, 2007, at 16:55 , D. Dante Lorenso wrote:

RETURN time + (days || ' days')::INTERVAL;
It's bad practice to concatenate like this. Use time + days * 
interval '1 day' and be done with it.


Why?  Is this functionality expected to break in the future or has 
unexpected side effects?  Is it less clear or less efficient?  Who 
declared it bad practice and where can I read that documentation?


It's generally bad practice to interpolate unnecessarily. You're right, 
in this case you're probably safe from this particular case ever 
changing. I personally find it less clear (though clarity is often in 
the eye of the beholder). time + days * interval '1 day' is to me a clearer 
expression of what you're doing: add this multiple of days to the time.


Something in me just doesn't want to trust that:

30 * interval '1 day' == interval '30 days'

Intervals are magical things unlike int and text.  Doing multiplication 
on a magical thing is scary, but waiting until the end before applying 
the magic just *feels* safer.


I do like your syntax, though.  There are fewer parentheses.  Maybe I can 
warm up to it ;-)


-- Dante

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Using oid as pkey

2007-08-20 Thread D. Dante Lorenso

Michael Glaesemann wrote:


On Aug 20, 2007, at 16:58 , Ed L. wrote:
You'd have  to specify your table WITH OIDS anyway as they're no longer used by 
default for table rows, so there's really nothing to be gained by using 
oids.


How exactly can you get rid of OIDs when using a language like PHP?  The 
magic of SERIAL and BIGSERIAL is that they are supposed to be like 
MySQL's AUTO INCREMENT feature and they create their own SEQUENCE for 
you automatically to handle the serialization.  Yet, I can't get the 
value of the serial column without knowing the name of the serial sequence.


Using a brain-dead sample table that looks like this:

CREATE table some_table (
col0 SERIAL,
col1 VARCHAR,
col2 VARCHAR
);

I want to do something like this:

INSERT INTO some_table (col1, col2)
VALUES ('val1', 'val2');

I want the value of col0 returned to the application and I don't want to 
know the name of the sequence involved in the SERIAL column.  I just 
want the value inserted into the column by using just its column name.


In PHP with PDO, I've only been able to get this by first finding the 
OID value from 'lastInsertId' and then using that OID to run this select:


SELECT $column AS last_inserted_id
FROM $table
WHERE oid = ?

How else could this be done without the round-trip back the db server or 
knowing too much about the SERIAL internals that I shouldn't really need 
to know?


-- Dante




Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Using oid as pkey

2007-08-20 Thread D. Dante Lorenso

Rainer Bauer wrote:

D. Dante Lorenso wrote:


Using a brain-dead sample table that looks like this:

CREATE table some_table (
col0 SERIAL,
col1 VARCHAR,
col2 VARCHAR
);

I want to do something like this:

INSERT INTO some_table (col1, col2)
VALUES ('val1', 'val2');

I want the value of col0 returned to the application and I don't want to 
know the name of the sequence involved in the SERIAL column.  I just 
want the value inserted into the column by using just it's column name.


Using 8.2 or above:
INSERT INTO some_table (col1, col2) VALUES ('val1', 'val2') RETURNING col0;


Oh ... VERY NICE ;-)  This is even BETTER than just returning the value 
of the SERIAL column since you can return any column even if there are 
more than one SERIAL column in a table!  No need for OID, no need for 
LASTVAL() ...


I see this from the documentation of 8.2:

-- 8< -------- 8< --
The optional RETURNING clause causes INSERT to compute and return 
value(s) based on each row actually inserted. This is primarily useful 
for obtaining values that were supplied by defaults, such as a serial 
sequence number. However, any expression using the table's columns is 
allowed. The syntax of the RETURNING list is identical to that of the 
output list of SELECT.

-- 8< -------- 8< --

Exactly what I was looking for.  Looks like I need to make moves to get 
from 8.1 onto 8.2 ;-)


Thanks, Rainer!

-- Dante

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-16 Thread D. Dante Lorenso

Douglas McNaught wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:
How quickly after you update the row status are you committing (and
releasing locks)?


I am calling a stored proc from PHP.  Since I do not begin a 
transaction, I assume that my call is automatically committed 
immediately after invocation.


SELECT reserve_next_tcqueue(?, ?, ?) AS result


SELECT id FROM job_table WHERE status = 'New' FOR UPDATE;
UPDATE job_table SET status = 'Processing' WHERE id IN (set of IDs);
COMMIT; -- releases all locks
process each job in the list we got and update its status
This has worked very well for me.


Yes, this does work well for me also most of the time.  It is only when 
the database server begins to suffer from severe load (like 3+) that 
PostgreSQL begins to log the reserve_next_tcqueue(...) queries as taking 
a long time to complete.  Here are some examples:


...

Aug 13 16:00:42 shed03 postgres[20264]: [5-2]  reserve_next_tcqueue($1, 
$2, $3) AS tcq_id]
Aug 13 16:00:53 shed03 postgres[17338]: [5-1] 17338 dbxxx 10.10.20.163 
LOG:  duration: 3159.208 ms  statement: EXECUTE <unnamed>  [PREPARE:  SELECT


Aug 13 16:00:54 shed03 postgres[20447]: [5-2]  reserve_next_tcqueue($1, 
$2, $3) AS tcq_id]
Aug 13 16:00:54 shed03 postgres[20470]: [5-1] 20470 dbxxx 10.10.20.51 
LOG:  duration: 4162.031 ms  statement: EXECUTE <unnamed>  [PREPARE:  SELECT


Aug 13 16:00:54 shed03 postgres[20470]: [5-2]  reserve_next_tcqueue($1, 
$2, $3) AS tcq_id]
Aug 13 16:00:59 shed03 postgres[20530]: [5-1] 20530 dbxxx 10.10.20.51 
LOG:  duration: 3672.077 ms  statement: EXECUTE <unnamed>  [PREPARE:  SELECT


...

-- Dante

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-16 Thread D. Dante Lorenso

Tom Lane wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:

Douglas McNaught wrote:

How quickly after you update the row status are you committing (and
releasing locks)?


I am calling a stored proc from PHP.  Since I do not begin a 
transaction, I assume that my call is automatically committed 
immediately after invocation.


Have you actually verified that, or are you just assuming it?


Just assuming.  I'm not really sure HOW to verify it, though.


I believe that PHP has some sort of autocommit on/off option,
which might possibly be doing things behind your back.


I am using PHP / PDO and all my statements are prepared.  Sometimes I 
will begin a transaction using PDO and do either commit or rollback, but 
I don't explicitly use transactions if I don't intend to exec more than 
one statement.  Here is what PHP says about auto-commit in PDO:


---- 8< ---- 8< ---- 8< ----
http://www.php.net/manual/en/ref.pdo.php
Unfortunately, not every database supports transactions, so PDO needs to 
run in what is known as auto-commit mode when you first open the 
connection. Auto-commit mode means that every query that you run has its 
own implicit transaction, if the database supports it, or no transaction 
if the database doesn't support transactions.


If you need a transaction, you must use the PDO->beginTransaction() 
method to initiate one. If the underlying driver does not support 
transactions, a PDOException will be thrown (regardless of your error 
handling settings: this is always a serious error condition). Once you 
are in a transaction, you may use PDO->commit() or PDO->rollBack() to 
finish it, depending on the success of the code you run during the 
transaction.

---- 8< ---- 8< ---- 8< ----

So, I feel safe enough with my assumption.  I'm not entirely sure about 
the stored procedure, though.  I've recently rewritten the procedure as 
separate queries, but don't know if that will help until I hit a high 
peak load again.


-- Dante


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread D. Dante Lorenso

All,

I have a stored procedure that I use to manage a queue.  I want to pop 
an item off the queue to ensure that only one server is processing the 
queue item, so inside PGSQL, use SELECT ... FOR UPDATE to lock the row. 
 Here's how I pop the queue item:


--
CREATE OR REPLACE FUNCTION public.reserve_next_tcqueue (in_hostname 
varchar, in_status char, in_new_status char) RETURNS bigint AS

$body$
DECLARE
my_reserved_id BIGINT;
BEGIN
 /* find and lock a row with the indicated status */
 SELECT tcq_id
 INTO my_reserved_id
 FROM queue q
 WHERE q.status = in_status
 ORDER BY tcq_id ASC
 LIMIT 1
 FOR UPDATE;

/* we didn't find anything matching */
IF NOT FOUND THEN
RETURN 0;
END IF;

/* change the status to the new status */
UPDATE queue SET
  status = in_new_status,
  ts_start = NOW(),
  ts_end = NULL,
  hostname = COALESCE(in_hostname, hostname)
WHERE tcq_id = my_reserved_id;

/* send back our reserved ID */
RETURN my_reserved_id;
END;
$body$
LANGUAGE 'plpgsql' VOLATILE RETURNS NULL ON NULL INPUT SECURITY INVOKER;
--

When my server is under severe load, however, this function begins to 
take a long time to execute and I begin to suspect that the FOR UPDATE 
lock might be locking the whole table and not just the row.


How do I know if this is the case, how can I optimize this procedure, 
and how should I be doing this differently?  I'm guessing stacks and 
queues would be common patterns handled in the PostgreSQL community.


Thoughts?

-- Dante

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread D. Dante Lorenso

Erik Jones wrote:


On Aug 15, 2007, at 2:39 PM, [EMAIL PROTECTED] wrote:


Erik Jones wrote:

On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:

...to ensure that only one server is processing the queue item, so 
inside PGSQL, use SELECT ... FOR UPDATE to lock the row...
When my server is under severe load, however, this function begins 
to take a long time to execute and I begin to suspect that the FOR 
UPDATE lock might be locking the whole table and not just the row.  
How do I know if this is the case, how can I optimize this 
procedure, and how should I be doing this differently?  ...


Thoughts?


SELECT ... FOR UPDATE should only be locking the rows returned by 
the select statement, in this case the one row.  You can check 
what locks exist on a table (and their type) with the pg_locks system 
view.




Is that correct? Documentation section 12.3.1. Table-Level Locks 
states 'The list below shows the available lock modes ...Remember that 
all of these lock modes are table-level locks, even if the name 
contains the word row'.


You will notice that SELECT ... FOR UPDATE is not in that list.  It's 
covered in the next section on row level locks.


I wonder why bother with the SELECT statement at all. Why not just go 
straight to the UPDATE statement with something like


   UPDATE queue SET
 status = in_new_status,
 ts_start = NOW(),
 ts_end = NULL,
 hostname = COALESCE(in_hostname, hostname)
   WHERE tcq_id = (SELECT tcq_id  FROM queue q WHERE q.status = 
in_status ORDER BY tcq_id ASC LIMIT 1);


He may need to trap an exception for the not found case, but what's 
the big deal with that?


UPDATE statements acquire a ROW EXCLUSIVE lock on the table (which does 
not conflict with other ROW EXCLUSIVE locks) plus row-level locks on the 
rows they touch, so they will block UPDATE statements initiated by other 
transactions only when those target the same rows.


That won't work because the update won't lock the row until the select 
returns.  So, if two processes execute that at the same time they will 
both execute the subquery and return the same result, the first will 
update it and the second will then (redundantly) update it.


It also won't work because I need to change AND read the row.  If I only 
do the update, I don't know what was updated.  I still need to return 
the tcq_id to my application.


Maybe the update could look like this:

UPDATE queue SET
  status = in_new_status,
  ts_start = NOW(),
  ts_end = NULL,
  hostname = COALESCE(in_hostname, hostname)
WHERE status = in_status;

But there I don't have the LIMIT 1, and I also don't know which rows got 
updated.  I suppose there might be some magic to find the OIDs of the 
affected rows, but I don't know how that would be done.


I need logic like atomic test and set or pop 1 item off the queue 
atomically and tell me what that item was.


In my situation, there are a dozen or so machines polling this queue 
periodically looking for work to do.  As more polling is occurring, the 
locks seem to be taking longer so I was worried table-level locks might 
be occurring.


-- Dante


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] How do I aggregate data from multiple rows into a delimited list?

2007-07-04 Thread D. Dante Lorenso

Berend Tober wrote:

D. Dante Lorenso wrote:
I want to select several rows of data and have them returned in a 
single record with the rows joined by a delimiter. 


Review the User Comments at
http://www.postgresql.org/docs/8.0/interactive/sql-createaggregate.html;
for some ideas.


I found a better solution using ARRAY_TO_STRING and ARRAY.  My question 
email was originally blocked because I sent it to the list from the 
wrong email address.  Unfortunately it was later unfrozen and sent on to 
the list (sorry) because I did more searching and had solved the problem 
on my own:


http://archives.postgresql.org/pgsql-general/2007-07/msg00075.php

Thanks for all your help, though!

-- Dante


---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] How do I aggregate data from multiple rows into a delimited list?

2007-07-03 Thread D. Dante Lorenso
I want to select several rows of data and have them returned in a single 
record with the rows joined by a delimiter.  It would be great if this 
could be done in a generic way possibly using the GROUP BY like this:


   SELECT a.id, a.name, STR_SUM(b.name, ',') AS b_names
   FROM a, b
   WHERE a.id = b.id
   GROUP BY a.id, a.name;

Sample data would look like this:

[table a]
 id | name
----+-------
  1 | one
  2 | two
  3 | three
  4 | four

[table b]
 id | name
----+---------
  1 | pizza
  1 | hot dog
  2 | gorilla
  2 | monkey
  3 | apple
  4 | cheese
  4 | milk
  4 | eggs

And the result would look like this:

 id | name  | b_names
----+-------+------------------
  1 | one   | pizza,hot dog
  2 | two   | gorilla,monkey
  3 | three | apple
  4 | four  | cheese,milk,eggs

The STR_SUM function above would be some aggregate that just joins 
records together using concatenation.  If the function can't be used as 
an aggregate, I suppose I could just use a sub-select:


   SELECT a.id, a.name, (
 SELECT STR_SUM(b.name, ',')
 FROM b
 WHERE b.id = a.id) AS b_names
   FROM a;

Does my made-up function STR_SUM exist in PostgreSQL already?  Has 
anyone written one they could share?  I'm fairly capable with PL/PGSQL 
and could write a function to loop through records and concatenate onto a 
string, but before I brute-force this one, I was hoping to find 
something more elegant preferred by the community.


Thanks,

-- Dante



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


[GENERAL] How-To: Aggregate data from multiple rows into a delimited list.

2007-07-02 Thread D. Dante Lorenso
This is not a question, but a solution.  I just wanted to share this 
with others on the list in case it saves you a few hours of searching...


I wanted to select several rows of data and have them returned in a 
single record with the rows joined by a delimiter.  Turns out this is 
very easy to do in PostgreSQL:


  SELECT a.id, a.name,
ARRAY_TO_STRING(ARRAY(
  SELECT b.name
  FROM b
  WHERE b.id = a.id
  ORDER BY b.name ASC
), ',') AS b_names
  FROM a
  ORDER BY a.id ASC;

Sample data would look like this:

[table a]
 id | name
----+-------
  1 | one
  2 | two
  3 | three
  4 | four

[table b]
 id | name
----+---------
  1 | pizza
  1 | hot dog
  2 | gorilla
  2 | monkey
  3 | apple
  4 | cheese
  4 | milk
  4 | eggs

And the result would look like this:

 id | name  | b_names
----+-------+------------------
  1 | one   | pizza,hot dog
  2 | two   | gorilla,monkey
  3 | three | apple
  4 | four  | cheese,milk,eggs

This is an easy way to return attributes of a record from another table 
without having to issue multiple queries or deal with multiple result 
records.


Enjoy!

-- Dante


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] Most efficient report of number of records in all tables?

2007-02-26 Thread D. Dante Lorenso

All,

I can find the names of all tables in the database with this query:

   SELECT table_name
   FROM information_schema.tables
   WHERE table_type = 'BASE TABLE'
   AND table_schema NOT IN ('pg_catalog', 'information_schema')
   ORDER BY table_name ASC;


Then, in code, I can loop through all the table names and run the 
following query:


   SELECT COUNT(*) AS result
   FROM $table;


But, this can be slow when I have a large number of tables or some 
tables have several million rows.


Is there a faster way to get this data using table statistics or 
something like that?  Perhaps something in a single query?


-- Dante




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Most efficient report of number of records in all tables?

2007-02-26 Thread D. Dante Lorenso

Dann Corbit wrote:

If you only need a cardinality estimate, then pg_class.reltuples may be
of help (it will be accurate to when the last vacuum was performed).
  


Last vacuum ... how does that work with autovacuum?


If you need exact counts then there are a couple of problems:
1.  An MVCC database cannot store an exact count, because it can differ
by user.  Hence, to collect the exact number, a table scan is necessary.
  
A table scan ... ouch?  I just assumed that SELECT COUNT(*) FROM table_name 
would be a fast query internally.  I see what you mean about MVCC, though.



2.  The number can be invalid immediately after the query and might be
different for different users anyway.
  
The numbers don't really need to be 100% accurate (it's just a ballpark 
stat).
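
Something like this single catalog query pulls the estimate for every user 
table (reltuples is only as fresh as the last VACUUM/ANALYZE, so it is 
ballpark by design):

    SELECT c.relname, c.reltuples::bigint AS approx_rows
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
    AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY c.relname;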



What are you doing with those numbers?
  


It's just an administrative report showing patterns of growth in our 
database storage.  We are trying to keep statistics for users and our 
stats tables are generating about 50,000 records daily.  We only know 
this is true because we have these reports which show table record 
counts daily.


-- Dante

  

-----Original Message-----
[snip]


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Drawbacks of using BYTEA for PK?

2004-01-12 Thread D. Dante Lorenso

Tom Lane wrote:

Adding an MD5 hash contributes *absolutely zero*, except waste of space,
to any attempt to make a GUID.  The hash will add no uniqueness that was
not there before.

The cool thing about a 'GUID' (or in my example a hashed sequence number 
[sure, toss in some entropy if you want it]) is that if you happen to 
reference that value as a primary key on a table, the URL that passes the 
argument cannot be guessed at easily.  For example, using a sequence:

   http://domain.com/application/load_record.html?customer_id=12345

Then, users of the web will assume that you have at most 12345 
customers.  And
they can try to look up information on other customers by doing:

   http://domain.com/application/load_record.html?customer_id=12346
   http://domain.com/application/load_record.html?customer_id=12344
...basically walking the sequence.  Sure, you will protect against this with
access rights, BUT...seeing the sequence is a risk and not something you 
want
to happen.  NOW, if you use a GUID:

   
http://domain.com/application/load_record.html?customer_id=f46d6296-5362-2526-42e3-1b8ce9d1

Right, so now try to guess the next value in this sequence.  It's a little
more protective and obfuscated (an advantage in using GUIDs).
Dante

 





---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[GENERAL] Dump/Restore ordering problem?

2004-01-12 Thread D. Dante Lorenso
First I created a function that selected the next available pin
code from a table of pre-defined pin codes:
   CREATE FUNCTION public.get_next_pin_code () RETURNS varchar AS '
   DECLARE
       my_pin_code VARCHAR;
   BEGIN
       ...
       /* this is the pincode we just fetched */
       RETURN (my_pin_code);
   END;
   ' LANGUAGE 'plpgsql';
Then I created a table that used that function to set a default value:

   CREATE TABLE public.account (
   acct_id BIGSERIAL,
   ...,
  acct_pin_code VARCHAR(16) NOT NULL DEFAULT get_next_pin_code()
   ) WITH OIDS;
But, now when I pg_dump and pg_restore this database to another server,
there seems to be a problem with the ordering of the dump in that the
account table is not recreated because the function get_next_pin_code()
is not yet defined.  It seems like the function is not being created
until AFTER the table is created and this causes an ordering problem.
To dump and restore I've been doing this:

   pg_dump -U dbuser -Ft dbname | pg_restore -c -Ft | psql -U dbuser -h 
db.otherdbhost.com dbname

I've been able to work around this by creating a TRIGGER that sets the
default value instead of defining it in the table definition, but that
just seems like a hack.  Is there something I need to do to make the
dependency ordering work smarter during a dump/restore?  Or is this the
right way to do it?
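
For reference, the trigger workaround looks roughly like this (a sketch; 
the trigger and function names are made up):

   CREATE OR REPLACE FUNCTION set_pin_code() RETURNS trigger AS '
   BEGIN
       IF NEW.acct_pin_code IS NULL THEN
           NEW.acct_pin_code := get_next_pin_code();
       END IF;
       RETURN NEW;
   END;
   ' LANGUAGE 'plpgsql';

   CREATE TRIGGER account_set_pin_code
       BEFORE INSERT ON account
       FOR EACH ROW EXECUTE PROCEDURE set_pin_code();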
Dante



 

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Drawbacks of using BYTEA for PK?

2004-01-12 Thread D. Dante Lorenso

they can try to look up information on other customers by doing:

   http://domain.com/application/load_record.html?customer_id=12346
   http://domain.com/application/load_record.html?customer_id=12344
...basically walking the sequence.  Sure, you will protect against this with
access rights, BUT...seeing the sequence is a risk and not something you want
to happen.  NOW, if you use a GUID:

Security != obscurity.

While using GUIDs may make it harder to get hacked, it in no way actually 
increases security.  Real security comes from secure code, period.
 

Well, uh, you're both wrong.
On the one hand if your GUIDs are just an MD5 of a sequence then they're
just as guessable as the sequence.
   

It's not a question of right or wrong.  It's the method.  One thing I see 
here is a failure to use several security methods at different layers ... 
why are you not using HTTPS (i.e. authentication)?  What about using 
cryptographic cookies to identify your session and link that to your 
userid (after authorization)?
 

Ok, my point is not one of security as much as the obscurity.  I have the
security aspect already covered whereby I only select the customer 
record from
the database where the logged in account has access to the record.  So, if
you are not the admin or the actual customer, the select will return a code
indicating that you do not have permission to view the given record.

Maybe a better example of my problem is with records throughout the system
like invoices, customer data, etc...  If any of these items use a sequence
and that sequence is global to the table in the database and the number is
exposed externally, then it is possible to infer the success of the company
underneath, is it not?
For instance, if I generate sequential numbers for invoice ids and the 
customer
sees #123 as an invoice number one month and sees #128 the next month, 
it might
imply that there are only 4 customers getting invoiced each month.

Another example ... let's say customers can create 'Widgets' in their 
account.
There might be a page that lists all their 'widgets'.  If you click on the
widget, you can edit it.  A link to do this might look as follows:

   http://.../account/widget_list.html
   http://.../account/widget_edit.html?widget_id=12345
Well, if the widget_id is a sequence (global to the widget table), then by 
creating one widget the customer would get one widget id (WIDG_1) and with 
another widget a second (WIDG_2).  The customer could see that the 
widget_id increased by only an amount of

   N = WIDG_2 - WIDG_1

and could therefore assume that the number of customers creating widgets 
in total does not exceed N.  I don't see this as much of a problem about 
'security' in the respect of who can access the data as much as who can 
make conclusions about the company behind the data.

See what I mean?  What do you propose as the best solution for this?  Not 
expose the sequences to the user and use user-enumerated ids?  Then a 
trigger on the table would assign ids like:

   SELECT (MAX(widget_id)+1) INTO NEW.widget_id
   FROM widget
   WHERE cust_id = NEW.cust_id;

But I think after several hundred customer records, this trigger would 
start getting slow.  I don't know really, any ideas?
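
(One standard pattern, as a sketch with made-up names: keep a per-customer 
counter row, created along with each customer, so the trigger serializes 
on a row lock instead of racing on MAX()+1:)

   CREATE TABLE widget_counter (
       cust_id BIGINT PRIMARY KEY,
       last_id BIGINT NOT NULL DEFAULT 0
   );

   CREATE OR REPLACE FUNCTION next_widget_id() RETURNS trigger AS '
   BEGIN
       -- the UPDATE takes a row lock, so concurrent inserts for the
       -- same customer wait here instead of racing on MAX()+1
       UPDATE widget_counter
       SET last_id = last_id + 1
       WHERE cust_id = NEW.cust_id;

       SELECT last_id INTO NEW.widget_id
       FROM widget_counter
       WHERE cust_id = NEW.cust_id;

       RETURN NEW;
   END;
   ' LANGUAGE 'plpgsql';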
Dante















---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[GENERAL] Bug with rename bigserial column

2004-01-10 Thread D. Dante Lorenso
I just ran into a dump/restore problem with a bigserial column
on a renamed table.
BIGSERIAL columns when created will automagically create
the sequence also.  The sequence name is derived from
the table name and column name. 

I had a table named 'audio_usage' and defined a column like this:

   ausage_id BIGSERIAL

whose default value was

   NEXTVAL('public.audio_usage_ausage_id_seq')

I then renamed my table to 'audio_file_usage'.  But the column default
in the table remained unchanged.  (Is that OK?)
Well, then I did a pg_dump and a pg_restore to migrate my dev
environment to qa.  Well, sure enough, the dump was fine,
but when the restore happened, the newly renamed table combined
with the same bigserial column automagically created it's
sequence, but the sequence created is different from the
default NEXTVAL which still refers to the original table_column_seq
name.
I've corrected the problem manually, but it does seem like a bug
somewhere.
Dante



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Using indices with long unique IDs.

2004-01-09 Thread D. Dante Lorenso
Sergey Olefir wrote:

So the logical choice would be int8, right? Unfortunately quite wrong.
Statement of the form: SELECT * FROM table WHERE id=1
will never use index for id (assumming id is int8) since '1' is of type
int4. This is confirmed both by documentation and SQL EXPLAIN (after set
enable_seqscan TO 'off').
 

I'm using BIGSERIAL as the primary key for all my tables.  Please tell
me that what is described above will not be true for me as well!
When I say:
   SELECT x, y, z
   FROM mytable
   WHERE pk_mybigint = 1;
That had better be using an index, or in a few months, OMG!  Let me check:

   leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id = 555300;
                                 QUERY PLAN
   ---------------------------------------------------------------------------
    Seq Scan on leads  (cost=0.00..334.66 rows=1 width=263)
                       (actual time=21.35..21.46 rows=1 loops=1)
      Filter: (lead_id = 555300)
    Total runtime: 21.53 msec
   (3 rows)

   leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id = 555300::bigint;
                                 QUERY PLAN
   ---------------------------------------------------------------------------
    Index Scan using pk_leads on leads  (cost=0.00..5.36 rows=1 width=263)
                                        (actual time=0.18..0.18 rows=1 loops=1)
      Index Cond: (lead_id = 555300::bigint)
    Total runtime: 0.24 msec
   (3 rows)

Well, that just plain sucks.  That means I've gotta go back and
add casts to all my queries?
Tell me it isn't so!

Dante





---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [GENERAL] website doc search is extremely SLOW

2003-12-30 Thread D. Dante Lorenso
Marc G. Fournier wrote:

On Mon, 29 Dec 2003, D. Dante Lorenso wrote:

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW.  Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.
   

What is the full URL for the page you are looking at?  Just the 'search
link' at the top of the page?
 

Perhaps this should be worked on?
   

Looking into it right now ...
 

   http://www.postgresql.org/        *click Docs on top of page*
   http://www.postgresql.org/docs/   *click PostgreSQL static documentation*

   Search this document set: [ SECURITY INVOKER ] Search!

   http://www.postgresql.org/search.cgi?ul=http://www.postgresql.org/docs/7.4/static/&q=SECURITY+INVOKER

I loaded that URL on IE and I wait like 2 minutes or more for a response.
Then, it usually returns with 1 result.  I click the Search! button again
to refresh and it came back a little faster with 0 results?
  
Searched again from the top and it's a little faster now:

   * click search *
date
   Wed Dec 31 22:52:01 CST 2003
   * results come back *
date
   Wed Dec 31 22:52:27 CST 2003
Still one result:

   PostgreSQL 7.4 Documentation (SQL Key Words) 
http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html 
[*0.087%*]
   http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html
   Size: 65401 bytes, modified: Tue, 25 Nov 2003, 15:02:33 AST

However, the page that I SHOULD have found was this one:

   http://www.postgresql.org/docs/current/static/sql-createfunction.html

That page has SECURITY INVOKER in a whole section:
 
   [EXTERNAL] SECURITY INVOKER
   [EXTERNAL] SECURITY DEFINER

   SECURITY INVOKER indicates that the function is to be executed with
   the privileges of the user that calls it. That is the default.
   SECURITY DEFINER specifies that the function is to be executed with
   the privileges of the user that created it.
Dante

--
D. Dante Lorenso
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [GENERAL] How to navigate tree without CONNECT BY?

2003-12-19 Thread D. Dante Lorenso
Joe Conway wrote:

D. Dante Lorenso wrote:

I have a simple table that I'd like to query to pull
out a heirarchy from a tree relationship.  What is the
best way to do this without a 'CONNECT BY' clause like
Oracle has?


See connectby() in contrib/tablefunc.


Yep.  That's what I was looking for.  Had to upgrade to 7.4 and
then install the contrib stuff and import those functions into my
database.
But, what a pain in the butt.  I'd think this functionality
would be so important that it'd make it into the main source.
Kinda like MD5 did.
Thanks again.

Dante



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] Delete all records NOT referenced by Foreign Keys

2003-12-14 Thread D. Dante Lorenso
Martijn van Oosterhout wrote:

On Sat, Dec 13, 2003 at 09:48:16PM -0600, D. Dante Lorenso wrote:
 

This is something very ugly indeed and is what I'll have to resort to unless
I can find something cleaner.  Ideally, I would be able to run this cleanup
on a subset of the table data after an insert into the table.  I would like
the query to be fast, though. 
   

What about just:

delete from a where a.id not in (select id from b);

or the equivalent exists query.
 

You missed the previous part of the thread.  I have N tables that
have a foreign key to the table in question.  Tomorrow there may be
more or fewer foreign key references.  Without having to know which
tables have foreign keys on my table, I want to delete all rows
that are not used by any other table.
PG already can block a delete when it knows that foreign key exists, so
why can't I perform a query that says...
   DELETE FROM tablename
   WHERE FOREIGN_KEY_EXISTS(oid) IS FALSE;
You see?  Something like what I seek never requires ME the developer or
DBA to know about foreign key relationships because I know that PostgreSQL
already does.
To NOT have this functionality does not cause problems, but it does cause
me to waste disk space on rows that are no longer in use.  I just want to
do some automated cleanup on tables and just leave that process running
in a crontab nightly or something.  I don't want to have to re-write the
cleanup process every time a new dependency is introduced or removed.
I think Bruno had a good idea about using the system tables to determine
relationships, but how to do that is beyond my PostgreSQL expertise at
the moment.  I just think there's gotta be an easier way, though...something
like what I describe above.
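
For anyone who wants to chase Bruno's catalog idea, a rough sketch 
('tablename' is a placeholder for the table being cleaned up):

   SELECT c.conrelid::regclass AS referencing_table,
          a.attname            AS referencing_column
   FROM pg_constraint c
   JOIN pg_attribute a ON a.attrelid = c.conrelid
                      AND a.attnum = ANY (c.conkey)
   WHERE c.contype = 'f'
   AND c.confrelid = 'tablename'::regclass;

   -- each (table, column) pair then becomes a
   -- WHERE NOT EXISTS (SELECT 1 FROM referencing_table WHERE ...) clause
   -- in the generated DELETE.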
Dante

--
D. Dante Lorenso
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] Delete all records NOT referenced by Foreign Keys

2003-12-13 Thread D. Dante Lorenso
Bruno Wolff III wrote:

On Sat, Dec 13, 2003 at 02:20:15 -0600,
 D. Dante Lorenso [EMAIL PROTECTED] wrote:
 

I'd like to run a clean up command on my tables to
eliminate rows that I'm no longer using in the database.
I want to do something like this:

  DELETE FROM tablename
  WHERE IS_REFERENCED_BY_FOREIGN_KEY IS FALSE;
Does anyone know how something like this could be done
in PostgreSQL?  I know I can search all the tables that
I know refer to this table and see if my primary key
exists, but I want a solution that does not require me to
rewrite my code every time a new foreign key constraint
is added to the database.
There must be a way to ask PostgreSQL for a reference count
on a given row or something.
   

If you are more concerned about flexibility than speed you can do something
like the following:
Set all of your foreign key references to the desired table to use an
on delete restrict clause.
Have your application read all of the key values from the desired table
and for each key issue a delete of that key in its own transaction.
This will fail for keys that are referenced (because of the restrict clause).
 

This is something very ugly indeed and is what I'll have to resort to unless
I can find something cleaner.  Ideally, I would be able to run this cleanup
on a subset of the table data after an insert into the table.  I would like
the query to be fast, though. 

Does anyone know if there is any way to say something like:

   DELETE FROM tablename
   IGNORE ERRORS;
Where a delete that is possible is performed but ones that throw referential
integrity violations would silently fail without abandoning the entire
transaction?
I have the 'on delete restrict' clause on my foreign keys already.

A more complicated, less future proof, but more efficient approach would
be to have your application find out which tables have references to the
table of interest by looking at the system catalog and then write a
delete query using appropriate where not exists clauses.
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
 



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster