[Dbix-class] Finding missing rows -- JOIN with multiple conditions

Bill Moseley Sat, 01 Jun 2013 16:29:41 -0700

After stumbling around for a while I realize I need some help with building
a query with DBIC.


After writing what follows I found that I want to use a JOIN with multiple
join conditions.

What I'm after is how to write a query in DBIC with a JOIN like what is
bold below.  I can't find anything in the docs that supports this usage
unless I'm not understanding the sub {} syntax for add_relationship()

 (the schema that follows will help this make sense)

    SELECT
         musician_band.musician, band.id as band, album.id as album

    FROM
         musician_band
         LEFT JOIN band ON  musician.band = band.id
         LEFT JOIN album ON album.band = band.id
         LEFT JOIN track ON track.album = album.id* AND  track.songwriter =
musician_band.musician*
         LEFT JOIN track_detail ON track_detail.id = track.id

    WHERE
         band.id in ( @bands )
             -- NULL thus means there's no track written by the musician on
that album.
            -- or the track was not included on the final album.
         AND (  track.id IS NULL  OR track_detail.is_on_album IS FALSE )




I have the typical music database.  I need to find musicians that played on
albums but where they were NOT the song writer on ANY track on that album.

For example, if a drummer (musician.id = 123) never writes songs and played
in two bands, and each band had three albums I'd want:

musician  |   band  |  album
----------+---------+---------
     123  |     22  |    45
     123  |     22  |    46
     123  |     22  |    47
     123  |     54  |    78
     123  |     54  |    92
     123  |     54  |   103

That musician played on 6 albums yet never was the songwriter for a track
on any of the albums.

My question is what is the best query to use, and how to represent that in
DBIC?



The tables are as you might expect:

A band has many albums, and albums have many tracks.   Each track has just
one songwriter which is a musician.

table musician ( id integer );

table band ( id integer );

table album (
    id integer,
    band integer REFERENCES band
);

table track (
    id integer,
    album integer REFERENCES album,
    songwriter integer REFERENCES musician
);

This can't be too easy so there's a table that manages additional details
about a track -- like if it is included on the final album.  Sorry, it's
just that way.

table track_detail (
    id integer REFERENCES track (id),
    is_on_album boolean default false
);


Of course, musicians are in many bands.  So, there's this many-to-many
table to associate musicians with bands:

table musician_band(
    musician integer REFERENCES musician,
    band integer REFERENCE band
);

Is it possible with just joins?    I thought I read DBIC could do custom
joins now.    This turns out to be *much faster* than the correlated
sub-query below.

That is, LEFT JOIN with tracks *specific to the songwriter* and check for
NULL

    SELECT
         musician_band.musician, musician.band, album.id as album

    FROM
         musician_band
         LEFT JOIN band ON  musician.band = band.id
         LEFT JOIN album ON album.band = band.id
         LEFT JOIN track ON track.album = album.id* AND  track.songwriter =
musician_band.musician*
         LEFT JOIN track_detail ON track_detail.id = track.id
    WHERE
         band in ( @bands )
         AND (
             track.id IS NULL  -- NULL thus means there's no track written
by the musician on that album.
             OR track_detail.is_on_album IS FALSE
         )




Here, I think, is 'NOT EXISTS ($sub_query)'  correlated sub-query.  But, it
seems to be much slower due to the join on track_detail.

# List of all tracks authored by the musician
my $sub_query = $schema->resultset( 'Track' )->search(
    {
        songwriter => { -ident => 'me.musician' },
        album      => { -ident => 'album.id' },
        'track_detail.is_on_album' => 1,
    },
    {
        columns => ['id'],
        alias => 'tracks_authored',
        join => 'track_detail',
    },
);


# List all musicians where they do not have an associated track

my $rs = $schema->resultset( 'MusicianBand' )->search(
    {
        'me.band' => { -in => \@bands },
        'NOT EXISTS => $sub_query->as_query,
    },
    {
        select => [qw/ me.musician me.band album.id /],
        as => [qw/ musician band album /],
        join => {
            band => 'album',
        },
    },
);

Wow,   Using the above correlated sub-query I see "Total runtime: 2396.274
ms".  If I remove the join track_detail JOIN in the subquery I get:  "Total
runtime: 0.229 ms"

If instead of the correlated sub-query approach I instead use the complex
JOIN it's then: "Total runtime: 0.357 ms".   Time to look at the query plan.



-- 
Bill Moseley
mose...@hank.org

_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/dbix-class@lists.scsys.co.uk

[Dbix-class] Finding missing rows -- JOIN with multiple conditions

Reply via email to