Hello community, here is the log from the commit of package pg_comparator for openSUSE:Factory checked in at 2014-08-20 10:51:16 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/pg_comparator (Old) and /work/SRC/openSUSE:Factory/.pg_comparator.new (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "pg_comparator" Changes: -------- --- /work/SRC/openSUSE:Factory/pg_comparator/pg_comparator.changes 2014-04-30 15:10:14.000000000 +0200 +++ /work/SRC/openSUSE:Factory/.pg_comparator.new/pg_comparator.changes 2014-08-20 10:51:48.000000000 +0200 @@ -1,0 +2,6 @@ +Tue Aug 5 07:17:07 UTC 2014 - [email protected] + +- New Version 2.2.5 +- Minor Update & Bug fixes + +------------------------------------------------------------------- Old: ---- pg_comparator-2.2.3.tgz New: ---- pg_comparator-2.2.5.tgz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ pg_comparator.spec ++++++ --- /var/tmp/diff_new_pack.SsfvsH/_old 2014-08-20 10:51:50.000000000 +0200 +++ /var/tmp/diff_new_pack.SsfvsH/_new 2014-08-20 10:51:50.000000000 +0200 @@ -22,14 +22,14 @@ %define pgdocdir %{expand:%%(/usr/bin/pg_config --docdir)} Name: pg_comparator -Version: 2.2.3 +Version: 2.2.5 Release: 1%{?dist} Summary: A tool to compare and sync tables in different locations License: BSD-3-Clause Group: Development/Libraries Url: http://pgfoundry.org/projects/pg-comparator/ -Source: http://pgfoundry.org/frs/download.php/3643/pg_comparator-%{version}.tgz +Source: http://pgfoundry.org/frs/download.php/3661/pg_comparator-%{version}.tgz BuildRoot: %{_tmppath}/%{name}-%{version}-build ++++++ pg_comparator-2.2.3.tgz -> pg_comparator-2.2.5.tgz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pg_comparator-2.2.3/pg_comparator new/pg_comparator-2.2.5/pg_comparator --- old/pg_comparator-2.2.3/pg_comparator 2014-04-19 17:11:43.000000000 +0200 +++ new/pg_comparator-2.2.5/pg_comparator 2014-07-24 10:21:30.000000000 +0200 @@ -1,6 +1,6 @@ #!/usr/bin/perl # -# $Id: pg_comparator.pl 1494 2014-04-19 15:11:43Z coelho $ +# $Id: pg_comparator.pl 1512 2014-07-24 08:21:31Z coelho $ # # HELP 1: pg_comparator --man # HELP 2: pod2text pg_comparator @@ -191,7 +191,7 @@ Maximum relative search effort. The search is stopped if the number of results is above this threshold expressed relatively to the table size. -Use 2.0 for no limit (all tuples were deleted and new one are inserted). +Use 2.0 for no limit (all tuples were deleted and new ones are inserted). Default is B<0.1>, i.e. an overall 10% difference is allowed before giving up. @@ -203,7 +203,7 @@ the table size is known. Default is to compute the maximum number of reported differences based on -the C<--max-ratio> option. +the C<--max-ratio> option, with a mimimum of 100 differences allowed. =item C<--max-levels=0> @@ -227,6 +227,11 @@ Show option summary. +=item C<--pg-copy=128> + +Experimental option to use PostgreSQL's COPY instead of INSERT/UPDATE +when synchronizing, by chunks of the specified size. + =item C<--prefix='pgc_cmp'> Name prefix, possibly schema qualified, used for generated comparison tables @@ -282,8 +287,8 @@ =item C<--synchronize> or C<-S> Actually perform operations to synchronize the second table wrt the first. -Well, not really. It is only done if you add C<--do-it> or C<-D>. -Save your data before attempting anything like that! +Well, not really, it is only a dry run. It is actually done if you add +C<--do-it> or C<-D>. Save your data before attempting anything like that! Default is not to synchronize. @@ -394,6 +399,7 @@ Note that some default value used by DBI drivers may be changed with driver-specific environment variables, and that DBI also provides its own defaults and overrides, so what actually happens may not always be clear. +Default values for the second URL are mostly taken from the first URL. =over 4 @@ -1024,11 +1030,12 @@ key columns, number of value columns, aggregate function, checksum function, null handling, folding factor, table locking or not...). -=item I<feature> - about 5 minutes & 168 or 474 runs +=item I<feature> - about 5 minutes & 171 or 477 runs Test various features: I<cc> for checksum computation strategies, I<auto> for trigger-maintained checksums on PostgreSQL, +I<pgcopy> for PostgreSQL copy test, I<empty> for corner cases with empty tables, I<quote> for table quoting, I<engine> for InnoDB vs MyISAM MySQL backends, @@ -1038,7 +1045,7 @@ I<mylite> for SQLite/MySQL mixed mode with some restrictions, I<pglite> for SQLite/PostgreSQL mixed mode with some restrictions. -=item I<release> - about 20 minutes & 938 runs +=item I<release> - about 20 minutes & 944 runs This is I<feature> with two table sizes, I<fast>, and I<collisions> to test possible hash collisions. @@ -1091,6 +1098,11 @@ Mixed SQLite vs PostgreSQL or MySQL table comparison may not work properly in all cases, because of SQLite dynamic type handling and reduced capabilities. +The script creates (temporary) tables on both sides for comparing the target +tables: this imply that you must be allowed to do that for the comparison... +However, read-only replicas do not allow creating objects, which mean that you +cannot use pg_comparator to compare table contents on a synchronized replica. + =head1 VERSIONS See L<PG Foundry|http://pgfoundry.org/projects/pg-comparator/> for the latest @@ -1098,6 +1110,27 @@ =over 4 +=item B<version 2.2.5> (r1512 on 2014-07-24) + +Fix broken URL defaults to use UNIX sockets with an empty host name, +per report by I<Ivan Mincik>. +Fix C<--where> condition handling with C<--pg-copy> in corner cases. +Do not take execution timestamps when not required. +Allow a larger number of differences by default for small table comparisons. +Add more sanity checks. +Improve some error messages. +The I<release> validation was run successfully +on PostgreSQL 9.4b1 and MySQL 5.5.38. + +=item B<version 2.2.4> (r1506 on 2014-07-13) + +Add experimental support for using COPY instead of INSERT/UPDATE for PostgreSQL, +in chunks of size specified with option C<--pg-copy>, +as suggested by I<Graeme Bell>. +Minor fix when computing the maximum number of differences to report. +The I<release> validation was run successfully +on PostgreSQL 9.4b1 and MySQL 5.5.37. + =item B<version 2.2.3> (r1494 on 2014-04-19) Improved documentation. @@ -1360,8 +1393,8 @@ =cut -my $script_version = '2.2.3 (r1494)'; -my $revision = '$Revision: 1494 $'; +my $script_version = '2.2.5 (r1512)'; +my $revision = '$Revision: 1512 $'; $revision =~ tr/0-9//cd; ################################################################# SOME DEFAULTS @@ -1375,7 +1408,7 @@ my ($factor, $expect_warn) = (7, 0); # condition, tests, max size of blobs, data sources... my ($expect, $longreadlen, $source1, $source2, $key_cs, $tup_cs, $do_lock, - $env_pass, $max_report, $stats); + $env_pass, $max_report, $stats, $pg_copy); # algorithm defaults # hmmm... could rely on base64 to handle binary keys? @@ -1894,16 +1927,14 @@ my ($db, $user, $pass, $host, $port, $base, $tabl, $keys, $cols); # get driver name - if ($c =~ /^(pg|my)(sql)?:\/\//) { - $db = $1 . 'sql'; - } - elsif ($c =~ /^(sqlite|firebird):\/\//) { + if ($c =~ /^(\w+):\/\//) { $db = $1; + $db .= 'sql' if $db eq 'pg' or $db eq 'my'; # allow pg & my for pgsql & mysql + $c =~ s/^\w+:\/\///; # remove driver part } else { - verb 2, "no driver found in URL: $c" if $debug; + verb 2, "no driver in URL: $c" if $debug; } - $c =~ s/^\w+:\/\///; # split authority and path on first '/' die "invalid connection string '$c', must contain '\/'\n" @@ -1911,27 +1942,25 @@ my ($auth, $path) = ($1, $2); - if ("$auth") - { + if ("$auth") { # parse authority if non empty. ??? url-translation? die "invalid authority string '$auth'\n" unless $auth =~ /^((\w+) # login (:([^.]*) # :password - )?\@)? # @ + )?\@)? # @ => auth string is before ([^\@:\/]*) # host (:(\d+))?$ # :port /x; $user=$2 if defined $1; $pass=$4 if defined $3; - $host=$5; # may be empty, but must be defined! + $host=$5; # may be empty, but is always defined *if* there is a non empty auth $port=$7 if defined $6; verb 3, "user=$user pass=$pass host=$host port=" . defined $port? $port: '?' if $debug; } - if ("$path") - { + if ("$path") { my $kc_str; if (defined $db and ($db eq 'sqlite' or $db eq 'firebird')) { @@ -1939,7 +1968,7 @@ # if so, the last "/" is mandatory to mark the table name die "invalid path string '$path'\n" unless $path =~ / - ^((.*) # base file path + ^((.*) # base file (longest) path \/(\w+|\"[^\"+]\")?)? # table (\?(.+))? # key,part:column,list... /x; @@ -2844,7 +2873,8 @@ "stats-name=s" => \$name, # name of test # misc "long-read-len|lrl|L=i" => \$longreadlen, - "version|V" => sub { print "$0 version is $script_version\n"; exit 0; } + "version|V" => sub { print "$0 version is $script_version\n"; exit 0; }, + "pg-copy:i" => \$pg_copy ) or die "$! (try $0 --help)"; # propagate expect specification @@ -2878,6 +2908,9 @@ $factor = 1 if $factor<1; $factor = 30 if $factor>30; +# use pg_copy if possible, currently for inserts +$pg_copy = 128 if defined $pg_copy and ($pg_copy eq '' or $pg_copy eq '0'); + # intermediate table names # what about putting the table name as well? my ($name1, $name2) = ("${prefix}_1_", "${prefix}_2_"); @@ -2895,10 +2928,11 @@ # set defaults and check minimum definitions. $db1 = 'pgsql' unless defined $db1; $u1 = $ENV{USER} unless defined $u1; -$h1 = 'localhost' unless defined $h1; +$h1 = '' unless defined $h1; # defaults to Unix socket $p1 = $M{$db1}{port} if not defined $p1 and exists $M{$db1}{port}; +# k/c defaults set later -# these are necessary +# these are obviously necessary:-) die "no base on first connection" unless defined $b1 or defined $source1; die "no table on first connection" unless defined $t1 or defined $source1; @@ -2909,13 +2943,16 @@ if defined $db2 and $db2 eq 'sqlite' and (defined $u2 or defined $h2 or defined $p2); -# fix some default values for connection 2 +# default values for connection 2 is mostly to reuse from connection 1 $db2 = $db1 unless defined $db2; $u2 = $u1 unless defined $u2; -$h2 = 'localhost' unless defined $h2; -$p2 = $M{$db2}{port} if not defined $p2 and exists $M{$db2}{port}; +$h2 = $h1 unless defined $h2; +# same as fist iff same driver, or driver default +$p2 = ($db2 eq $db1)? $p1: $M{$db2}{port} + unless defined $p2 and exists $M{$db2}{port}; $b2 = $b1 unless defined $b2; $t2 = $t1 unless defined $t2; +# k/c defaults set later die "null should be 'text' or 'hash', got $null" unless $null =~ /^(text|hash)$/i; @@ -2930,13 +2967,11 @@ unless $agg =~ /^(xor|sum)$/i; # database connection... -if (defined $env_pass and not defined $w1) -{ +if (defined $env_pass and not defined $w1) { $w1 = $ENV{"${env_pass}1"}; $w1 = $ENV{$env_pass} unless defined $w1; } -if ($ask_pass and not defined $w1) -{ +if ($ask_pass and not defined $w1) { require Term::ReadPassword; $w1 = Term::ReadPassword::read_password('connection 1 password> '); } @@ -2947,21 +2982,29 @@ $w2 = $w1 unless $w2 or not $w1 or $u1 ne $u2 or $h1 ne $h2 or $p1 ne $p2; -if (defined $env_pass and not defined $w2) -{ +if (defined $env_pass and not defined $w2) { $w2 = $ENV{"${env_pass}2"}; $w2 = $ENV{$env_pass} unless defined $w2; } -if ($ask_pass and not defined $w2) -{ +if ($ask_pass and not defined $w2) { require Term::ReadPassword; $w2 = Term::ReadPassword::read_password('connection 2 password> '); } -# some sanity checks, that are skipped under debugging so as to test +# some sanity checks +die "sorry, --pg-copy option requires connections to postgresql" + if defined $pg_copy and ($db1 ne 'pgsql' or $db2 ne 'pgsql'); + +die "--pg_copy must be strictly positive, got '$pg_copy'" + if defined $pg_copy and $pg_copy <= 0; + +# sanity check skipped under debugging so as to test die "sorry, threading does not seem to work with PostgreSQL driver" if not $debug and $threads and ($db1 eq 'pgsql' or $db2 eq 'pgsql'); +die "sorry, --pg-copy currently requires --no-async" + if not $debug and defined $pg_copy and $async; + # fix some settings for SQLite if (not $debug and ($db1 eq 'sqlite' or $db2 eq 'sqlite')) { @@ -3149,25 +3192,27 @@ !$synchronize); } -# set defaults... -if (not defined $k1) -{ +# get/set k/c defaults once connected +if (not defined $k1) { $k1 = [get_table_pkey($dbh1, $db1, $b1, $t1)]; warn "default key & attribute on first connection but not on second..." if defined $k2; + die "no primary key found on first connection table $t1" unless @$k1; } -if (not defined $c1) -{ +if (not defined $c1) { $c1 = [get_table_attributes($dbh1, $db1, $b1, $t1, @$k1)]; # warn, as this may lead to unexpected results... warn "default attributes on first connection but not on second..." if defined $c2; } +# fix second connection default $k2 = $k1 unless defined $k2; $c2 = $c1 unless defined $c2; # some sanity checks +die "empty key on first connection, must specify one" unless @$k1; +die "empty key on second connection, must specify one" unless @$k2; die "key number of attributes does not match" unless @$k1 == @$k2; die "column number of attributes does not match" unless @$c1 == @$c2; @@ -3282,11 +3327,16 @@ $size = $count1>$count2? $count1: $count2; # MAX size of both tables # stop at this number of differences -$max_report = $max_ratio * $size unless defined $max_report; +if (not (defined $max_report or $expect_warn and defined $expect)) { + $max_report = int($max_ratio * $size); + # bee cool with small stuff... + $max_report = 100 if $max_report < 100; +} # can we already stop now? my $min_diff = abs($count2-$count1); -die "too many differences, at least $min_diff > $max_report" +die "too many differences, at least $min_diff > $max_report, " . + "consider raising --max-ratio or --max-report" if defined $max_report and $min_diff>$max_report; # compute initial "full" masks which must be larger than size @@ -3461,82 +3511,109 @@ my $where_k2 = is_equal($dbh2, $dhpbt2, $db2, $k2); my $set_c2 = (join '=?, ', @$c2) . '=?'; - # delete rows - if (@$del or @$delb) + # DELETE rows, including updates with copy + if (@$del or @$delb or ($pg_copy and @$upt)) { my $del_sql = "DELETE FROM $t2 WHERE " . ($where? "($where) AND ": '') . $where_k2; verb 2, $del_sql; my $del_sth = $dbh2->prepare($del_sql) if $do_it; - for my $d (@$del, @$delb) { + for my $d (@$del, @$delb, $pg_copy? @$upt: ()) { sth_param_exec($do_it, "DELETE $t2", $del_sth, $d); } # undef $del_sth; } - # get values for insert or update - my ($val_sql, $val_sth); - if ($c1 and @$c1) - { - $val_sql = "SELECT " . join(',', @$c1) . " FROM $t1 WHERE " . - ($where? "($where) AND ": '') . $where_k1; - verb 2, $val_sql; - $val_sth = $dbh1->prepare($val_sql) - if @$ins or @$insb or @$upt; + # insert/update rows + # note: I could skip fetching if there is no data column + if ($pg_copy and (@$ins or @$upt or defined $insb)) { # use COPY + sql_do($dbh2, $db2, "COPY $t2(" . join(',', @$k2, @$c2) . ") FROM STDIN"); + #async_wait($dbh2, $db2, 'copy from 2') if $async; + my $select = "SELECT " . join(',', @$k1, @$c1) . " FROM $t1 WHERE "; + $select .= "($where) AND " if $where; + $select .= "(" . join(',', @$k1) . ") IN ("; + # we COPY both inserts and updates + my @allins = (@$ins, @$insb, @$upt); + while (@allins) { + my $bulk = ''; + for my $k (splice(@allins, 0, $pg_copy)) { # chunked + $bulk .= ',' if $bulk; + #$copy_bulk .= $dbh1->quote($k); + $bulk .= "(@$k)"; + $query_data++; + } + sql_do($dbh1, $db1, "COPY ($select$bulk)) TO STDOUT"); + #async_wait($dbh1, $db1, 'copy to 1') if $async; + my $row = ''; + while (($dbh1->pg_getcopydata($row)) != -1) { + $dbh2->pg_putcopydata($row) if $do_it; + } + } + $dbh2->pg_putcopyend(); } + else { # use generic INSERT/UPDATE - # insert rows - if (@$ins or @$insb) - { - my $ins_sql = "INSERT INTO $t2(" . - (@$c2? join(',', @$c2) . ',': '') . join(',', @$k2) . ') ' . - 'VALUES(?' . ',?' x (@$k2+@$c2-1) . ')'; - verb 2, $ins_sql; - my $ins_sth = $dbh2->prepare($ins_sql) if $do_it; - for my $i (@$ins, @$insb) + # get values for insert or update + my ($val_sql, $val_sth); + if ($c1 and @$c1) { - $query_data++; - my @c1values = (); - # query the other column values for key $i - if ($c1 and @$c1) - { - sth_param_exec(1, "SELECT $t1", $val_sth, $i); - @c1values = $val_sth->fetchrow_array(); - # hmmm... may be raised on blobs? - die "unexpected values fetched for insert" - unless @c1values and @c1values == @$c1; + $val_sql = "SELECT " . join(',', @$c1) . " FROM $t1 WHERE " . + ($where? "($where) AND ": '') . $where_k1; + verb 2, $val_sql; + $val_sth = $dbh1->prepare($val_sql) + if @$ins or @$insb or @$upt; + } - &{$M{$db1}{close_cursor}}($val_sth) if exists $M{$db1}{close_cursor}; + # handle inserts + if (@$ins or @$insb) + { + my $ins_sql = "INSERT INTO $t2(" . join(',', @$c2, @$k2) . ") " . + 'VALUES(?' . ',?' x (@$k2+@$c2-1) . ')'; + verb 2, $ins_sql; + my $ins_sth = $dbh2->prepare($ins_sql) if $do_it; + for my $i (@$ins, @$insb) { + $query_data++; + my @c1values = (); + # query the other column values for key $i + if ($c1 and @$c1) { + sth_param_exec(1, "SELECT $t1", $val_sth, $i); + @c1values = $val_sth->fetchrow_array(); + # hmmm... may be raised on blobs? + die "unexpected values fetched for insert" + unless @c1values and @c1values == @$c1; + + &{$M{$db1}{close_cursor}}($val_sth) if exists $M{$db1}{close_cursor}; + } + # then insert the missing tuple + sth_param_exec($do_it, "INSERT $t2", $ins_sth, $i, @c1values); } - # then insert the missing tuple - sth_param_exec($do_it, "INSERT $t2", $ins_sth, $i, @c1values); + # $ins_sth } - # $ins_sth - } - # update rows - if (@$upt) - { - die "there must be some columns to update" unless $c1; - my $upt_sql = "UPDATE $t2 SET $set_c2 WHERE " . - ($where? "($where) AND ": '') . $where_k2; - verb 2, $upt_sql; - my $upt_sth = $dbh2->prepare($upt_sql) if $do_it; - for my $u (@$upt) + # handle updates + if (@$upt) { - $query_data++; - # get value for key $u - sth_param_exec(1, "SELECT $t1", $val_sth, $u); - my @c1values = $val_sth->fetchrow_array(); - # hmmm... may be raised on blobs? - die "unexpected values fetched for update" + die "there must be some columns to update" unless $c1; + my $upt_sql = "UPDATE $t2 SET $set_c2 WHERE " . + ($where? "($where) AND ": '') . $where_k2; + verb 2, $upt_sql; + my $upt_sth = $dbh2->prepare($upt_sql) if $do_it; + for my $u (@$upt) + { + $query_data++; + # get value for key $u + sth_param_exec(1, "SELECT $t1", $val_sth, $u); + my @c1values = $val_sth->fetchrow_array(); + # hmmm... may be raised on blobs? + die "unexpected values fetched for update" unless @c1values and @c1values == @$c1; - # use it to update the other table - sth_param_exec($do_it, "UPDATE $t2", $upt_sth, $u, @c1values); + # use it to update the other table + sth_param_exec($do_it, "UPDATE $t2", $upt_sth, $u, @c1values); - &{$M{$db1}{close_cursor}}($val_sth) if exists $M{$db1}{close_cursor}; + &{$M{$db1}{close_cursor}}($val_sth) if exists $M{$db1}{close_cursor}; + } + # $upt_sth } - # $upt_sth } # close synchronization transaction if any @@ -3557,7 +3634,7 @@ unless $do_it; } -$tsyn = [gettimeofday]; +$tsyn = [gettimeofday] if $stats; if ($clear) { @@ -3580,7 +3657,7 @@ verb 4, "clearing done." } -$tclr = [gettimeofday]; +$tclr = [gettimeofday] if $stats; # recreate database handler for the end... dbh_materialize($dbh1, $db1); @@ -3607,7 +3684,7 @@ } # final timestamp -$tend = [gettimeofday]; +$tend = [gettimeofday] if $stats; # some stats are collected out of time measures if ($stats) @@ -3644,6 +3721,7 @@ # build options as a bit vector my $options = + (($pg_copy?1:0) << 11) | # --pg-copy=... (($tup_cs?1:0) << 10) | # --tuple-checksum=... (($key_cs?1:0) << 9) | # --key-checksum=... ($do_lock << 8) | # --lock -- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
