[GENERAL] Calculating a hash / fingerprint for a row / tuple

2006-04-10 Thread Randall Lucas
I'm looking for a way to calculate a hashcode or fingerprint for a row /
tuple of arbitrary width.  The goal is to be able to store, in an audit
table, a fingerprint which could be compared against a later fingerprint
to detect changes (this application does not call for cryptographic
level security, so a simple checksum would be adequate for now). 
Something like:

SELECT id, row_hashcode(*) FROM my_table;

id   |   row_hashcode
 1a3843587d893589ef321d31
 2934983497f9083298cb0843

etc.

I've considered implementing it in a procedural language, but the
inability of passing in ROW datatypes seems to make it impossible for
dynamic tuples (I suppose for static tables it could be done by a
process of iteration over the system catalog column lists for a given
table and PK).  Also, for static tables, it seems like for a 1 .. 32
column table, creating a multicolumn index would have this effect, but I
can't seem to find a function to get at the calculated value of the index.

It seems like implementing this in C would be feasible, but I don't want
to reimplement it if it exists.  Merely concatenating the values would
fail for nulls, so it seems to me the best thing to do would be to take
the effectively-serialized COPY output and run MD5 (or your
less-collision-prone hash of choice) on that.

I've done a fairly thorough search of the docs, contrib, and mailing
list archives (although the frequency in other contexts of the terms
hash, fingerprint, and hashcode are lowering the S/N).  Does
anybody have any pointers or ideas either on existing implementations,
or writing a new C function (such as generating COPY output from an
arbitrary tuple)?

Best,

Randall

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Calculating a hash / fingerprint for a row / tuple

2006-04-10 Thread Tom Lane
Randall Lucas [EMAIL PROTECTED] writes:
 I'm looking for a way to calculate a hashcode or fingerprint for a row /
 tuple of arbitrary width.  The goal is to be able to store, in an audit
 table, a fingerprint which could be compared against a later fingerprint
 to detect changes (this application does not call for cryptographic
 level security, so a simple checksum would be adequate for now). 
 Something like:

 SELECT id, row_hashcode(*) FROM my_table;

 id   |   row_hashcode
  1a3843587d893589ef321d31
  2934983497f9083298cb0843

Try select md5(textin(record_out(mytbl.*))) from mytbl; ... a bit
klugy, but it works.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend