Reinier Post wrote:
Dear fellow DBI users,
Recently I discovered (by asking on #perl, as usual) that SQL::Statement
supports joins. I like this, because it allows me to compute the
relational join of two CSV tables and store the result as a new CSV table.
Actually, I'd like to apply *all* relational algebra operations.
So the next thing I tried was the anti-join (see
http://en.wikipedia.org/wiki/Relational_algebra#Antijoin
) and here I failed: it does not appear to be supported.
SQL offers various ways to express it, but none of them I can think
of are supported in SQL::Statement:
- no EXCEPT (or UNION, for that matter)
- no NOT IN (or nested selects in general)
- no functions as column specifiers (to be exact: only in the parser)
Although I developed a workaround (and #perl told me it was probably
the best I could do) this left me with two questions:
+ Have I overlooked a way of expressing the anti-join in SQL::Statement?
+ Is there any interest (besides my own) in adding such functionality
to SQL::Statement if it isn't there?
I have an alternative solution to suggest:
You can use my Set::Relation module on CPAN.
http://search.cpan.org/dist/Set-Relation/
That module will directly give you all of the relational algebra directly and in
an easy to use form.
It just provides in memory objects so you'd use it together with your CSV module
like this:
1. Populate 2 lexical variables $rs1, $rs2 to contain your rowsets as DBI would
return them, as an array of hashes say; each one the contents of your CSV files.
2. Here's some example code that assumes the columns you match on have the same
names and all other columns have unique names:
use Set::Relation::V2;
sub relation { return Set::Relation::V2->new( @_ ); }
my $r1 = relation( $rs1 );
my $r2 = relation( $rs2 );
my $r3 = $r1->antijoin( $r2 );
my $rs3 = $r3->members();
3. Then $rs3 has the result rowsets in array-of-hashes form you can then use
your CSV module to write a file with.
If your source files don't have appropriately named columns for antijoin() to do
the right thing then use a rename() on either or both arguments first.
This module should work and has been tested to some degree, but it is officially
alpha quality so if something doesn't work then let me know and I'll fix it.
Limitations are that all your rowsets have to fit in memory at once and it would
probably be slower than a native SQL::Statement solution. But for batch tasks
or quick-get-it-done tasks that shouldn't be a problem. Requires Perl 5.8+.
-- Darren Duncan