On Jun 20, 2005, at 10:54 AM, [EMAIL PROTECTED] wrote:
Monday, June 20, 2005, 5:48:30 PM, Erik Hatcher wrote:
Now you've just said the same conflicting thing a different way. You
want to cluster but only return one. :)
I think I misunderstood the term "cluster" here.
So yes, I just want one image returned.
Maybe my interpretation of "cluster" is clouded by the search
domain. In the search domain, cluster means grouping multiple things.
If you only want one image returned, then it seems that only indexing
the same image once is the way to go. When you find a duplicate MD5,
don't index that as a second document. You will, instead, update the
document by adding additional ALT text and perhaps the additional
URL.
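Below is a minimal Python sketch of the duplicate key. It assumes the MD5 is taken over the raw image bytes, which the thread does not spell out. Byte-identical images fetched from different URLs get the same digest, so the digest can serve as the lookup key at indexing time. `image_md5` is a hypothetical helper name.

```python
import hashlib


def image_md5(image_bytes: bytes) -> str:
    """Hypothetical helper: hex MD5 of the raw image bytes, used as the dedup key."""
    return hashlib.md5(image_bytes).hexdigest()


# Identical bytes give identical keys, wherever the image was found.
print(image_md5(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
```

This only catches exact byte-for-byte copies. A resized or re-encoded version of the same picture produces a different MD5.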
This sounds pretty OK!
The tricks are to do a search when indexing to find duplicates, and
to "update" the document by deleting and re-adding it (you'll
probably want to store the field data so you can retrieve it easily
and use it for the new, updated document).
The downside to this approach is that you won't know specifically which
page the image was on in the results, though you could keep all the URLs
that point to it, since a document can have multiple fields named "URL",
for example.
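The search-then-delete-and-re-add cycle can be sketched roughly as follows. This is a hypothetical in-memory stand-in written in Python, not Lucene's API. `ToyIndex`, `ImageDoc` and `index_image` are illustrative names. A dict lookup plays the role of the indexing-time search on the MD5 field. The stored ALT and URL values are read back from the old document, merged, and written into its replacement. Keeping `urls` as a list mirrors the idea of a document with several fields named "URL".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImageDoc:
    """One stored document per distinct image MD5 (hypothetical structure).

    alts and urls are multi-valued, like several "ALT" and "URL" fields.
    """
    md5: str
    alts: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)


class ToyIndex:
    """Hypothetical in-memory stand-in for an index; NOT Lucene's API."""

    def __init__(self) -> None:
        self._docs: dict[str, ImageDoc] = {}

    def search_md5(self, md5: str) -> ImageDoc | None:
        return self._docs.get(md5)

    def delete(self, md5: str) -> None:
        self._docs.pop(md5, None)

    def add(self, doc: ImageDoc) -> None:
        self._docs[doc.md5] = doc

    def __len__(self) -> int:
        return len(self._docs)


def index_image(index: ToyIndex, md5: str, url: str, alt: str) -> None:
    """Add a new image, or "update" the existing document with the same MD5."""
    existing = index.search_md5(md5)  # search at indexing time for a duplicate
    if existing is None:
        index.add(ImageDoc(md5, [alt], [url]))
        return
    # Rebuild from the stored values, adding only new ALT text and URLs.
    alts = existing.alts + ([alt] if alt not in existing.alts else [])
    urls = existing.urls + ([url] if url not in existing.urls else [])
    index.delete(md5)                     # "update" = delete ...
    index.add(ImageDoc(md5, alts, urls))  # ... and re-add


idx = ToyIndex()
index_image(idx, "abc123", "http://example.com/a.html", "company logo")
index_image(idx, "abc123", "http://example.com/b.html", "logo")
print(len(idx), idx.search_md5("abc123").urls)
# 1 ['http://example.com/a.html', 'http://example.com/b.html']
```

With a real index, documents added during the current indexing run may not be visible to a searcher until the index is reopened. Duplicates within a single batch may therefore need an extra in-memory check along these lines.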
In SQL this would be:
select distinct md5, url, alt from table group by md5 order by score asc;
This would give you multiple records for the same MD5. You said
above you only want one per MD5.
Here I'm afraid you are not correct, because I have a GROUP BY MD5
clause, which will return no duplicates.
Sorry, I missed the GROUP BY clause there in my first human parse of
the expression - I was too busy focusing on DISTINCT.
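For comparison, the GROUP BY md5 in the query above can also be approximated at query time instead of index time. The idea is to collapse the ranked hits so only one per MD5 survives. The Python sketch below makes a few assumptions. `Hit` and `collapse_by_md5` are hypothetical names. Higher scores are assumed to be better (for the query's `order by score asc`, reverse the comparisons). Which url/alt row a GROUP BY without aggregates returns is engine-dependent, and strict engines reject such a query outright. This sketch instead explicitly keeps the best-scoring row per MD5.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One search result row (hypothetical shape)."""
    md5: str
    url: str
    alt: str
    score: float


def collapse_by_md5(hits: list[Hit]) -> list[Hit]:
    """Keep one hit per MD5 (the best-scoring one), sorted by descending score."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.md5)
        if current is None or hit.score > current.score:
            best[hit.md5] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


hits = [
    Hit("m1", "a.html", "logo", 0.9),
    Hit("m2", "b.html", "photo", 0.8),
    Hit("m1", "c.html", "logo", 0.4),
]
for hit in collapse_by_md5(hits):
    print(hit.md5, hit.url)
# m1 a.html
# m2 b.html
```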
Erik