Re: [GENERAL] md5 of table

2011-09-03 Thread Sim Zacks



I thought about using dblink and the EXCEPT query, but then I need to
know the field list of each query result, which is a pain in the butt.

That is not correct.  As long as the table definitions are precisely
the same, you can move records across dblink without specifying
fields.  You do this by using record type for the composite which
dblink sends across as text.

merlin

Do you have a quick example? This is what I have tried:

select * from tbla
except
select * from dblink('host=dbhost dbname=otherdb user=myuser
password=mypwd'::text, 'select * from tbla')

The error I get back is:
ERROR:  a column definition list is required for functions returning
record

sure:
select tbla from tbla
except
select t::tbla from dblink('host=dbhost dbname=otherdb user=myuser
password=mypwd'::text, 'select tbla::text from tbla') R(t text);
We tried something like that. Unfortunately, in 8.2 you can't cast a row 
type as text.



there's a bunch of ways to do that -- you can also do the md5 on the
remote side so you can just send the digests.
select * from tbla
except
select (t::tbla).* from dblink('host=dbhost dbname=otherdb user=myuser
password=mypwd'::text, 'select tbla::text from tbla') R(t text);

should also work.

This *might* work -- I didn't try.  It's been a while since I've used
stock dblink.
select * from tbla
except
select (t).* from dblink('host=dbhost dbname=otherdb user=myuser
password=mypwd'::text, 'select tbla from tbla') R(t tbla);

merlin


This looks like it might work for us. At least I would only need the 
table name for the field list instead of the entire column list.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-02 Thread Vincent de Phily
On Thursday 01 September 2011 11:47:24 Sim Zacks wrote:
 Is there a way to get an md5 or other hash of an entire table?
 
 I want to be able to easily compare 2 tables in different databases.
 
 I thought about using dblink and the EXCEPT query, but then I need to
 know the field list of each query result, which is a pain in the butt.
 
 If I could return an md5 of the entire table, then I could check if the
 tables have the same hash and be confident enough that the tables were
 identical.
 
 
 Thanks
 Sim

You might also want to take a look at 

http://pgfoundry.org/projects/pg-comparator/

which can give a more nuanced view of db differences and tries to be smart 
about performance. It looks a bit stale; I haven't used it in ages, but it 
used to be a trusty part of our test suite.
-- 
Vincent de Phily

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] md5 of table

2011-09-01 Thread Sim Zacks

Is there a way to get an md5 or other hash of an entire table?

I want to be able to easily compare 2 tables in different databases.

I thought about using dblink and the EXCEPT query, but then I need to 
know the field list of each query result, which is a pain in the butt.


If I could return an md5 of the entire table, then I could check if the 
tables have the same hash and be confident enough that the tables were 
identical.



Thanks
Sim


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Karsten Hilbert
On Thu, Sep 01, 2011 at 11:47:24AM +0300, Sim Zacks wrote:

 Is there a way to get an md5 or other hash of an entire table?
 
 I want to be able to easily compare 2 tables in different databases.
 
 I thought about using dblink and the EXCEPT query, but then I need to
 know the field list of each query result, which is a pain in the
 butt.
 
 If I could return an md5 of the entire table, then I could check if
 the tables have the same hash and be confident enough that the tables
 were identical.

One option might be to pg_dump in an appropriate format and
md5-compare the output ?

Another option might be to

- cross-check columns/column types
- query from information_schema
- compare row counts
- may need a lock
- compare table sizes
- may need vaccum ?

If all three match that may be good enough ?

Karsten
-- 
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Achilleas Mantzios
md5 has size limitations, the second approach seems more practical.

Στις Thursday 01 September 2011 12:30:45 ο/η Karsten Hilbert έγραψε:
 On Thu, Sep 01, 2011 at 11:47:24AM +0300, Sim Zacks wrote:
 
  Is there a way to get an md5 or other hash of an entire table?
  
  I want to be able to easily compare 2 tables in different databases.
  
  I thought about using dblink and the EXCEPT query, but then I need to
  know the field list of each query result, which is a pain in the
  butt.
  
  If I could return an md5 of the entire table, then I could check if
  the tables have the same hash and be confident enough that the tables
  were identical.
 
 One option might be to pg_dump in an appropriate format and
 md5-compare the output ?
 
 Another option might be to
 
 - cross-check columns/column types
   - query from information_schema
 - compare row counts
   - may need a lock
 - compare table sizes
   - may need vaccum ?
 
 If all three match that may be good enough ?
 
 Karsten
 -- 
 GPG key ID E4071346 @ gpg-keyserver.de
 E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
 



-- 
Achilleas Mantzios

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Sim Zacks

On 09/01/2011 12:26 PM, Pavel Stehule wrote:

Hello

postgres=# create table tt(a int, b varchar);
CREATE TABLE
postgres=# insert into tt values(10,'hello');
INSERT 0 1

postgres=# select md5(array_to_string(array_agg(md5(tt::text)),'')) from tt;
md5
--
  20a92a676f52699e613da1bb114bd6f0
(1 row)

Regards

Pavel Stehule

Would be perfect, but 8.2 can't cast a UDT to text.
ERROR:  cannot cast type tt to text
LINE 1: select tt::text from tt

Thanks
Sim

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Grzegorz Jaśkiewicz
On Thu, Sep 1, 2011 at 11:14 AM, Sim Zacks s...@compulab.co.il wrote:
 On 09/01/2011 12:26 PM, Pavel Stehule wrote:

 Hello

 postgres=# create table tt(a int, b varchar);
 CREATE TABLE
 postgres=# insert into tt values(10,'hello');
 INSERT 0 1

 postgres=# select md5(array_to_string(array_agg(md5(tt::text)),'')) from
I do that as well, but it might have questionable performance when
your table has 16M rows, and is 50GB +


-- 
GJ

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Sim Zacks


  
  


  
I am not sure if this will work, but you can try it

http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks#Cast_to_varchar

Pavel



I appreciate your help, but UDTs don't have input/ouput functions
unless you define them manually and I need this for all of my
tables. 

Thanks
Sim
  



Re: [GENERAL] md5 of table

2011-09-01 Thread Sim Zacks


  
  
On 09/01/2011 01:35 PM, Grzegorz Jaśkiewicz wrote:

  On Thu, Sep 1, 2011 at 11:14 AM, Sim Zacks s...@compulab.co.il wrote:

  
On 09/01/2011 12:26 PM, Pavel Stehule wrote:


  
Hello

postgres=# create table tt(a int, b varchar);
CREATE TABLE
postgres=# insert into tt values(10,'hello');
INSERT 0 1

postgres=# select md5(array_to_string(array_agg(md5(tt::text)),'')) from


  
  I do that as well, but it might have questionable performance when
your table has 16M rows, and is 50GB +


I don't need performance. This is for regression testing for a
  new database version. I want to run my functions in the old db and
  the new db and when it modifies a table, I want to be able to
  check that the tables are the same.
Sim

  



Re: [GENERAL] md5 of table

2011-09-01 Thread Scott Marlowe
On Thu, Sep 1, 2011 at 3:48 AM, Achilleas Mantzios
ach...@matrix.gatewaynet.com wrote:
 md5 has size limitations, the second approach seems more practical.

Really?  I was not aware of size limits of md5, what are they?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Achilleas Mantzios
Στις Thursday 01 September 2011 15:50:21 ο/η Scott Marlowe έγραψε:
 On Thu, Sep 1, 2011 at 3:48 AM, Achilleas Mantzios
 ach...@matrix.gatewaynet.com wrote:
  md5 has size limitations, the second approach seems more practical.
 
 Really?  I was not aware of size limits of md5, what are they?
 

sorry, i was wrong. i dont know why i had this impression,
just checked with a 43GB table on a freebsd machine and went fine.

-- 
Achilleas Mantzios

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Merlin Moncure
2011/9/1 Grzegorz Jaśkiewicz gryz...@gmail.com:
 On Thu, Sep 1, 2011 at 11:14 AM, Sim Zacks s...@compulab.co.il wrote:
 On 09/01/2011 12:26 PM, Pavel Stehule wrote:

 Hello

 postgres=# create table tt(a int, b varchar);
 CREATE TABLE
 postgres=# insert into tt values(10,'hello');
 INSERT 0 1

 postgres=# select md5(array_to_string(array_agg(md5(tt::text)),'')) from
 I do that as well, but it might have questionable performance when
 your table has 16M rows, and is 50GB +

you need order by for that to work.  I would do it like this:
select md5(array(select foo from foo order by foo_pkey)::text);

it's great quick'n'dirty, but not much scalable beyond millions.

OP:
 I thought about using dblink and the EXCEPT query, but then I need to know 
 the field list of each query result, which is a pain in the butt.

That is not correct.  As long as the table definitions are precisely
the same, you can move records across dblink without specifying
fields.  You do this by using record type for the composite which
dblink sends across as text.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Sim Zacks



OP:

I thought about using dblink and the EXCEPT query, but then I need to know the 
field list of each query result, which is a pain in the butt.

That is not correct.  As long as the table definitions are precisely
the same, you can move records across dblink without specifying
fields.  You do this by using record type for the composite which
dblink sends across as text.

merlin


Do you have a quick example? This is what I have tried:

select * from tbla
except
select * from dblink('host=dbhost dbname=otherdb user=myuser 
password=mypwd'::text, 'select * from tbla')


The error I get back is:
ERROR:  a column definition list is required for functions returning 
record


Sim


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Robert Treat
2011/9/1 Merlin Moncure mmonc...@gmail.com:
 2011/9/1 Grzegorz Jaśkiewicz gryz...@gmail.com:
 On Thu, Sep 1, 2011 at 11:14 AM, Sim Zacks s...@compulab.co.il wrote:
 On 09/01/2011 12:26 PM, Pavel Stehule wrote:

 Hello

 postgres=# create table tt(a int, b varchar);
 CREATE TABLE
 postgres=# insert into tt values(10,'hello');
 INSERT 0 1

 postgres=# select md5(array_to_string(array_agg(md5(tt::text)),'')) from
 I do that as well, but it might have questionable performance when
 your table has 16M rows, and is 50GB +

 you need order by for that to work.  I would do it like this:
 select md5(array(select foo from foo order by foo_pkey)::text);

 it's great quick'n'dirty, but not much scalable beyond millions.

I've always liked doing this with my pager:

 [robert@client-168] export PAGER=md5
-=[11:40:25 Thu Sep 01]=---=[ pagila-0.10.1 ]=-
 [robert@client-168] psql -hlocalhost -dpagila
psql (9.0.4, server 9.1beta3)
WARNING: psql version 9.0, server version 9.1.
 Some psql features might not work.
Type help for help.

pagila=# select * from actor order by actor_id;
f381ebdefe0aada9c0bc14e657962c1f


Robert Treat
conjecture: xzilla.net
consulting: omniti.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Merlin Moncure
2011/9/1 Sim Zacks s...@compulab.co.il:

 OP:

 I thought about using dblink and the EXCEPT query, but then I need to
 know the field list of each query result, which is a pain in the butt.

 That is not correct.  As long as the table definitions are precisely
 the same, you can move records across dblink without specifying
 fields.  You do this by using record type for the composite which
 dblink sends across as text.

 merlin

 Do you have a quick example? This is what I have tried:

 select * from tbla
 except
 select * from dblink('host=dbhost dbname=otherdb user=myuser
 password=mypwd'::text, 'select * from tbla')

 The error I get back is:
 ERROR:  a column definition list is required for functions returning
 record

sure:
select tbla from tbla
except
select t::tbla from dblink('host=dbhost dbname=otherdb user=myuser
password=mypwd'::text, 'select tbla::text from tbla') R(t text);

there's a bunch of ways to do that -- you can also do the md5 on the
remote side so you can just send the digests.
select * from tbla
except
select (t::tbla).* from dblink('host=dbhost dbname=otherdb user=myuser
password=mypwd'::text, 'select tbla::text from tbla') R(t text);

should also work.

This *might* work -- I didn't try.  It's been a while since I've used
stock dblink.
select * from tbla
except
select (t).* from dblink('host=dbhost dbname=otherdb user=myuser
password=mypwd'::text, 'select tbla from tbla') R(t tbla);

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] md5 of table

2011-09-01 Thread Scott Marlowe
On Thu, Sep 1, 2011 at 7:56 AM, Achilleas Mantzios
ach...@matrix.gatewaynet.com wrote:
 Στις Thursday 01 September 2011 15:50:21 ο/η Scott Marlowe έγραψε:
 Really?  I was not aware of size limits of md5, what are they?


 sorry, i was wrong. i dont know why i had this impression,
 just checked with a 43GB table on a freebsd machine and went fine.

Well, it might have been an older version or a 32 bit version or
something you were thinking of.  Sometimes a long memory helps us get
the wrong answer. :)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general