Under most circumstances, even with data tables as large as yours, I have
found that direct JOIN statements have provided acceptable performance. On
certain highly normalized databases, where the requirements of the query
needed more than 4 or 5 tables joined at once, I have also used your
"cache" table idea to great success.
However I would like to believe that queries on the relationships between
just these three tables could be coded by using JOINS to provide some very
fast results. Since you chose not to post your real table structures here,
I must create these sample queries using your sample tables.
select * from REC_A where (no links exist at all) or ((links to REC_B
exist
for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56
or
12)
CREATE TEMPORARY TABLE tmpUnion (primary key(PK_A))
SELECT a.*
FROM REC_A a
LEFT JOIN LNK
on LNK.PK_A = a.PK_A
WHERE LNK.PK_A is null;
INSERT IGNORE tmpUnion
SELECT a.*
FROM REC_A a
INNER JOIN LNK
on LNK.PK_A = a.PK_A
WHERE LNK.PK_A in (22,34)
AND LNK.PK_A NOT IN (12,56);
SELECT * from tmpUnion;
DROP TEMPORARY TABLE tmpUnion;
Whenever you have an OR like that in your WHERE clause, you are in essence
running two or more queries and combining the results. Since you are on a
pre-4.0 server, you do not have the UNION operator available to help you
so I had to simulate one by using a temporary table. Basically I had to
hand-code what the UNION operator automates for you. I defined the Primary
Key on the temp table and used the IGNORE option on the INSERT statement
to prevent duplicate rows (logically you shouldn't have had any but I was
being thorough)
Here is an equivalent query without the temp table
SELECT DISTINCT a.*
FROM REC_A a
LEFT JOIN LNK m1
ON m1.PK_A = a.PK_A
INNER JOIN LNK m2
ON m2.PK_A = a.PK_A
WHERE m1.PK_A is null
OR (
m2.PK_A in (22,34)
AND m2.PK_A NOT IN (12,56)
);
"Daniel BODEA" <[EMAIL PROTECTED]> wrote on 10/26/2004 09:02:56 AM:
> Greetings,
>
> I have several solutions to a problem which requires temporary or cache
> tables but the different solutions are time consuming to test for
> performance and I was wondering if anyone had more information on this
than
> myself.
>
> The server is 3.23.40 without possibility for upgrade to 4.x because of
the
> amount of extra work required for testing, deployment, etc.
>
> The simplified context is that I have several tables for individual
records
> (REC_A, REC_B, etc.) with primary keys and one table for links between
> different types of records based on their primary keys (LNK). LNK is N
to N
> so there is no constraint on the number of links a record can have to
other
> records.
>
> I have to query these relationships in complex ways, one example being
> "select * from REC_A where (no links exist at all) or ((links to REC_B
exist
> for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56
or
> 12))". The number of records is in the hundreds of thousands and the
links
> can get past a million.
>
> Now since this version does not support GROUP_CONCAT and having given up
> after a couple of days of trying to torture recursive INNER and LEFT
joins
> to filter out records based on totally incomprehensible ON clauses, I
figure
> I need temporary or cache intermediary tables.
>
> The solutions that I see would be to have (1) a generic "flag" table
that a
> complex query would use to first flag the records that should get
through
> and then join and filter with that table to return the list of records
but
> that would mean locking and updating the table for each complex query,
(2)
> generate an "alternate view" of the LNK table by simulating the
GROUP_CONCAT
> function to aggregate the records in unique rows that contain the full
set
> of links. The question for (2) is how to store the full link set in one
row
> so that queries be able to make use of indexes. (3) would be a variation
of
> (1) by creating a temporary table per complex query that would only
serve
> the purposes of this query and which would contain only those record
indexes
> that should go through so that the complex query can then perform a
simple
> INNER JOIN to filter out the results.
>
> I've also thought about retrieving the full set of indexes for all
matched
> records programmatically and then reinjecting this in the main query as
a PK
> IN (...) but that would give really huge index sets that would go back
and
> forth between the server and the client so I ruled this one out.
>
> Now unless there is another simpler and miraculous solution that I
haven't
> thought of, I have to find out which one of these alternatives is the
best
> one speed-wise so any help on this is sincerely appreciated.
>
> Daniel
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
>