Re: Querying on subsets of one-to-many and many-to-many

SGreen Tue, 26 Oct 2004 06:46:13 -0700

Under most circumstances, even with data tables as large as yours, I have 
found that direct JOIN statements have provided acceptable performance. On 
certain highly normalized databases, where the requirements of the query 
needed more than 4 or 5 tables joined at once, I have also used your 
"cache" table idea to great success.


However I would like to believe that queries on the relationships between 
just these three tables could be coded by using JOINS to provide some very 
fast results. Since you chose not to post your real table structures here, 
I must create these sample queries using your sample tables.

select * from REC_A where (no links exist at all) or ((links to REC_B 
exist
for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56 
or
12)

CREATE TEMPORARY TABLE tmpUnion (primary key(PK_A))
SELECT a.*
FROM REC_A a
LEFT JOIN LNK 
        on LNK.PK_A = a.PK_A
WHERE LNK.PK_A is null;

INSERT IGNORE tmpUnion 
SELECT a.*
FROM REC_A a
INNER JOIN LNK
        on LNK.PK_A = a.PK_A
WHERE LNK.PK_A in (22,34)
        AND LNK.PK_A NOT IN (12,56);

SELECT * from tmpUnion;
DROP TEMPORARY TABLE tmpUnion;

Whenever you have an OR like that in your WHERE clause, you are in essence 
running two or more queries and combining the results. Since you are on a 
pre-4.0 server, you do not have the UNION operator available to help you 
so I had to simulate one by using a temporary table. Basically I had to 
hand-code what the UNION operator automates for you. I defined the Primary 
Key on the temp table and used the IGNORE option on the INSERT statement 
to prevent duplicate rows (logically you shouldn't have had any but I was 
being thorough)

Here is an equivalent query without the temp table

SELECT DISTINCT a.*
FROM REC_A a
LEFT JOIN LNK  m1
        ON m1.PK_A = a.PK_A
INNER JOIN LNK m2 
        ON m2.PK_A = a.PK_A
WHERE m1.PK_A is null
        OR (
                m2.PK_A in (22,34)
                AND m2.PK_A NOT IN (12,56)
        );


"Daniel BODEA" <[EMAIL PROTECTED]> wrote on 10/26/2004 09:02:56 AM:

> Greetings,
> 
> I have several solutions to a problem which requires temporary or cache
> tables but the different solutions are time consuming to test for
> performance and I was wondering if anyone had more information on this 
than
> myself.
> 
> The server is 3.23.40 without possibility for upgrade to 4.x because of 
the
> amount of extra work required for testing, deployment, etc.
> 
> The simplified context is that I have several tables for individual 
records
> (REC_A, REC_B, etc.) with primary keys and one table for links between
> different types of records based on their primary keys (LNK). LNK is N 
to N
> so there is no constraint on the number of links a record can have to 
other
> records.
> 
> I have to query these relationships in complex ways, one example being
> "select * from REC_A where (no links exist at all) or ((links to REC_B 
exist
> for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56 
or
> 12))". The number of records is in the hundreds of thousands and the 
links
> can get past a million.
> 
> Now since this version does not support GROUP_CONCAT and having given up
> after a couple of days of trying to torture recursive INNER and LEFT 
joins
> to filter out records based on totally incomprehensible ON clauses, I 
figure
> I need temporary or cache intermediary tables.
> 
> The solutions that I see would be to have (1) a generic "flag" table 
that a
> complex query would use to first flag the records that should get 
through
> and then join and filter with that table to return the list of records 
but
> that would mean locking and updating the table for each complex query, 
(2)
> generate an "alternate view" of the LNK table by simulating the 
GROUP_CONCAT
> function to aggregate the records in unique rows that contain the full 
set
> of links. The question for (2) is how to store the full link set in one 
row
> so that queries be able to make use of indexes. (3) would be a variation 
of
> (1) by creating a temporary table per complex query that would only 
serve
> the purposes of this query and which would contain only those record 
indexes
> that should go through so that the complex query can then perform a 
simple
> INNER JOIN to filter out the results.
> 
> I've also thought about retrieving the full set of indexes for all 
matched
> records programmatically and then reinjecting this in the main query as 
a PK
> IN (...) but that would give really huge index sets that would go back 
and
> forth between the server and the client so I ruled this one out.
> 
> Now unless there is another simpler and miraculous solution that I 
haven't
> thought of, I have to find out which one of these alternatives is the 
best
> one speed-wise so any help on this is sincerely appreciated.
> 
> Daniel
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>

Re: Querying on subsets of one-to-many and many-to-many

Reply via email to