Reinier Post wrote:
  Dear fellow DBI users,

Recently I discovered (by asking on #perl, as usual) that SQL::Statement
supports joins.  I like this, because it allows me to compute the
relational join of two CSV tables and store the result as a new CSV table.

Actually, I'd like to apply *all* relational algebra operations.
So the next thing I tried was the anti-join (see

  http://en.wikipedia.org/wiki/Relational_algebra#Antijoin

) and here I failed: it does not appear to be supported.
SQL offers various ways to express it, but none of them I can think
of are supported in SQL::Statement:

 - no EXCEPT (or UNION, for that matter)
 - no NOT IN (or nested selects in general)
 - no functions as column specifiers (to be exact: only in the parser)

Although I developed a workaround (and #perl told me it was probably
the best I could do) this left me with two questions:

 + Have I overlooked a way of expressing the anti-join in SQL::Statement?

 + Is there any interest (besides my own) in adding such functionality
   to SQL::Statement if it isn't there?

I have an alternative solution to suggest:

You can use my Set::Relation module on CPAN.

  http://search.cpan.org/dist/Set-Relation/

That module will directly give you all of the relational algebra directly and in an easy to use form.

It just provides in memory objects so you'd use it together with your CSV module like this:

1. Populate 2 lexical variables $rs1, $rs2 to contain your rowsets as DBI would return them, as an array of hashes say; each one the contents of your CSV files.

2. Here's some example code that assumes the columns you match on have the same names and all other columns have unique names:

    use Set::Relation::V2;
    sub relation { return Set::Relation::V2->new( @_ ); }

    my $r1 = relation( $rs1 );
    my $r2 = relation( $rs2 );

    my $r3 = $r1->antijoin( $r2 );

    my $rs3 = $r3->members();

3. Then $rs3 has the result rowsets in array-of-hashes form you can then use your CSV module to write a file with.

If your source files don't have appropriately named columns for antijoin() to do the right thing then use a rename() on either or both arguments first.

This module should work and has been tested to some degree, but it is officially alpha quality so if something doesn't work then let me know and I'll fix it.

Limitations are that all your rowsets have to fit in memory at once and it would probably be slower than a native SQL::Statement solution. But for batch tasks or quick-get-it-done tasks that shouldn't be a problem. Requires Perl 5.8+.

-- Darren Duncan

Reply via email to