Michael, Thanks for your suggestion. Here's the query that seems to have fixed the problem, without a DISTINCT clause:
select Documents.ID,
Name,
max(DownloadLog.AddedOn) as DownloadedOn
from Documents,
DocumentFiles,
DownloadLog
where Documents.ID = DocumentFiles.Document and
DocumentFiles.ID = DownloadLog.DocumentFile
group by Documents.ID
order by DownloadedOn desc
limit 10
It seems to work perfectly. Thanks again!
Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/
-----Original Message-----
From: Michael Stassen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 26, 2005 12:08 AM
To: Frederic Wenzel
Cc: Gustafson, Tim; [email protected]
Subject: Re: Problems with "select distinct"
Frederic Wenzel wrote:
> On Tue, 25 Jan 2005 08:44:45 -0500, Michael Stassen
> <[EMAIL PROTECTED]> wrote:
>
>>I suspect mysql is doing the DISTINCT before the ORDER BY. One of the
>>reasons I avoid DISTINCT when possible.
>
> Isn't this supposed to be correct? Ordering has to take place as the
> very last operation, after any selection and projection, doesn't it?
>
> Regards
> Fred
In theory, or in practice? In practice, ordering is sometimes done
ahead of
time using an index, and DISTINCT may be optimized as a GROUP BY. In
theory, ORDER BY sorts the rows and DISTINCT throws out duplicates. So
long
as we sort by columns included in the SELECT output, it won't make any
difference which you do first and which last. The problem comes when we
sort on columns not in the selected output. Now that I've thought about
it
some more, I don't believe order of operations matters at all in that
case,
because which duplicate rows are thrown away by DISTINCT is not
defined.
You see? Suppose, prior to DISTINCT or ORDER BY, you have these rows:
ID Name
+--+------+
10 Test 1
10 Test 1
11 Test 2
10 Test 1
11 Test 2
Which two rows should DISTINCT keep? Even if you sort first, DISTINCT
is
under no obligation to choose the first rows it finds.
That DISTINCT may be optimized as a GROUP BY is instructive. MySQL lets
you
do things like
SELECT id, name, updated FROM mytable GROUP BY id, name;
but the 'updated' column is chosen more at less randomly for each group.
That is, for each group, you simply get one value of updated from an
undetermined row which belongs to the group. I think that's what's
happening here. The lesson is that you cannot use DISTINCT and then
ORDER
BY an unselected column and get meaningful results.
I'm not a big fan of DISTINCT. We get a lot of questions on the list
that
amount to "Here's my query, but it gives me more rows than I want. I
tried
to fix it by adding DISTINCT, but now I don't get the right result."
More
often than not, there is a better query which explicitly retrieves
precisely
the desired rows, with no need for DISTINCT.
Michael
smime.p7s
Description: S/MIME cryptographic signature
