On Jun 20, 2005, at 10:54 AM, [EMAIL PROTECTED] wrote:
Monday, June 20, 2005, 5:48:30 PM, Erik Hatcher wrote:

Now you've just said the same conflicting thing a different way.  You
want to cluster but only return one.  :)


I think I misunderstood the term "cluster" here.
So yes, I just want one image returned.

Maybe my interpretation of "cluster" is clouded by the search domain. In the search domain, cluster means grouping multiple things.

If you only want one image returned, then indexing each image only
once seems to be the way to go.  When you find a duplicate MD5,
don't index it as a second document.  Instead, update the existing
document by adding the additional ALT text and perhaps the additional URL.


This sounds pretty OK!

The tricks are to do a search when indexing to find duplicates, and to "update" the document by deleting and re-adding it. (You'll probably want to store the field data so you can retrieve it easily and reuse it for the new, updated document.)

The downside of this approach is that you won't know specifically which page the image was on in the results, though you could keep all URLs that point to it, since a document can have multiple fields named "URL", for example.
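The index-time approach described above (look up the MD5, delete the old document, re-add it with the merged stored fields) can be sketched in plain Java. This is a minimal sketch, not Lucene code: a LinkedHashMap keyed by MD5 stands in for the index, and the names ImageIndex, ImageDoc, addImage, urls, and alts are hypothetical. In Lucene the lookup would be a search on an md5 field and the update a delete followed by an add, as described above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImageIndex {
    // Hypothetical stored "document": one per distinct image, keyed by MD5.
    static final class ImageDoc {
        final String md5;
        final List<String> urls = new ArrayList<>(); // multi-valued "URL" field
        final List<String> alts = new ArrayList<>(); // accumulated ALT text
        ImageDoc(String md5) { this.md5 = md5; }
    }

    // Stand-in for the real index (assumption): md5 -> stored document.
    private final Map<String, ImageDoc> index = new LinkedHashMap<>();

    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Index one occurrence of an image; a duplicate MD5 is merged, never added twice. */
    void addImage(byte[] imageBytes, String url, String alt) {
        String md5 = md5Hex(imageBytes);
        // Search for an existing document with this MD5 and delete it if present.
        ImageDoc existing = index.remove(md5);
        ImageDoc doc = new ImageDoc(md5);
        if (existing != null) {
            // Reuse the stored field values of the old document.
            doc.urls.addAll(existing.urls);
            doc.alts.addAll(existing.alts);
        }
        if (!doc.urls.contains(url)) doc.urls.add(url);
        if (alt != null && !alt.isEmpty() && !doc.alts.contains(alt)) doc.alts.add(alt);
        index.put(md5, doc); // re-add the updated document
    }

    int size() { return index.size(); }
    ImageDoc get(String md5) { return index.get(md5); }

    public static void main(String[] args) {
        ImageIndex idx = new ImageIndex();
        byte[] logo = "fake image bytes".getBytes(StandardCharsets.UTF_8);
        idx.addImage(logo, "http://example.com/a.html", "Company logo");
        idx.addImage(logo, "http://example.com/b.html", "Logo");
        ImageDoc d = idx.get(md5Hex(logo));
        // prints: 1 [http://example.com/a.html, http://example.com/b.html] [Company logo, Logo]
        System.out.println(idx.size() + " " + d.urls + " " + d.alts);
    }
}
```

Because the whole document is rebuilt from its stored values, every field you want to carry over must be stored, which is why storing the field data matters for this approach.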

In SQL this would be:
select distinct md5, url, alt from table group by md5 order by score asc;
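The GROUP BY md5 in this query collapses the rows so that each MD5 appears once. If the index instead kept one document per occurrence, the same collapse could be done over the search hits after the query. Below is a minimal Java sketch, assuming a hypothetical Hit class whose fields mirror the SQL columns. Standard SQL does not define which url and alt a database returns for each group (engines that accept this query pick one for you), so the sketch makes that choice explicit: it keeps the lowest-scoring hit per MD5, following the query's order by score asc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CollapseByMd5 {
    // Hypothetical search hit; field names mirror the SQL columns.
    static final class Hit {
        final String md5, url, alt;
        final float score;
        Hit(String md5, String url, String alt, float score) {
            this.md5 = md5; this.url = url; this.alt = alt; this.score = score;
        }
        @Override public String toString() { return md5 + "@" + url; }
    }

    /** One hit per MD5, ordered by ascending score like "order by score asc". */
    static List<Hit> collapse(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score)); // stable ascending sort
        // LinkedHashMap keeps insertion order, so groups stay in score order.
        Map<String, Hit> firstPerMd5 = new LinkedHashMap<>();
        for (Hit h : sorted) {
            firstPerMd5.putIfAbsent(h.md5, h); // keep only the first hit seen for each MD5
        }
        return new ArrayList<>(firstPerMd5.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("aaa", "http://example.com/1.html", "logo", 0.9f),
            new Hit("bbb", "http://example.com/2.html", "photo", 0.5f),
            new Hit("aaa", "http://example.com/3.html", "company logo", 0.2f));
        // prints: [aaa@http://example.com/3.html, bbb@http://example.com/2.html]
        System.out.println(collapse(hits));
    }
}
```

In Lucene a higher score means a better match, so for relevance ranking you would likely reverse the comparator; the ascending order here only follows the query as written.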



This would give you multiple records for the same MD5.  You said
above you only want one per MD5.


Here I'm afraid you are not correct, because I have a GROUP BY MD5
clause, which will return no duplicates.

Sorry, I missed the GROUP BY clause there in my first human parse of the expression - I was too busy focusing on DISTINCT.

    Erik


