Re: Re[4]: md5 keyword field issue

Erik Hatcher Mon, 20 Jun 2005 09:33:36 -0700


On Jun 20, 2005, at 10:54 AM, [EMAIL PROTECTED] wrote:

Monday, June 20, 2005, 5:48:30 PM, Erik Hatcher wrote:

Now you've just said the same conflicting thing a different way.  You
want to cluster but only return one.  :)


i think i misunderstood the term "cluster" here.
so yes, i just want one image returned.

Maybe my interpretation of "cluster" is clouded by the search domain, where clustering means grouping multiple things.

If you only want one image returned, then it seems that only indexing
the same image once is the way to go.  When you find a duplicate MD5,
don't index that as a second document.  You will, instead, update the
document by adding additional ALT text and perhaps the additional URL.


this sounds pretty ok !

The tricks are to do a search when indexing to find duplicates, and to "update" the document by deleting and re-adding it (you'll probably want to store the field data so you can retrieve it easily and use it for the new updated document).

The downside of this approach is that you won't know specifically which page the image was on in the results, though you could keep all URLs that point to it, since a document can have multiple fields named "URL", for example.
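A minimal sketch of the approach described above, assuming a plain Python dict as a stand-in for the index. It does not use the Lucene API; `index_image` and the field names are hypothetical and only illustrate the logic: look up the MD5, delete the existing document if there is one, and re-add it with the extra URL and ALT text merged in.

```python
def index_image(index, md5, url, alt):
    """Index an image, keeping exactly one document per MD5."""
    # "Search" for a duplicate and delete it in one step.
    old = index.pop(md5, None)

    # Start from the stored field data of the old document, if any.
    urls = list(old["url"]) if old else []
    alts = list(old["alt"]) if old else []

    # Merge in the new values; "url" and "alt" are multi-valued fields.
    if url not in urls:
        urls.append(url)
    if alt and alt not in alts:
        alts.append(alt)

    # Re-add the updated document.
    index[md5] = {"md5": md5, "url": urls, "alt": alts}
    return index[md5]


index = {}
index_image(index, "abc123", "http://example.com/a.html", "red car")
index_image(index, "abc123", "http://example.com/b.html", "sports car")
index_image(index, "def456", "http://example.com/c.html", "blue boat")
```

After these three calls the index holds two documents, and the one for `abc123` carries both URLs and both ALT texts.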

in sql this would be:
select distinct md5, url, alt from table group by md5 order by
score asc;

This would give you multiple records for the same MD5.  You said
above you only want one per MD5.


here i'm afraid you are not correct, because i have GROUP BY MD5
clause which will return no duplicates.

Sorry, I missed the GROUP BY clause there in my first human parse of the expression - I was too busy focusing on DISTINCT.
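For illustration, here is a small sketch of the GROUP BY behavior using Python's built-in sqlite3 module. The table name `images`, its columns, and the sample rows are assumptions made up for the example. The query in the thread selects url and alt without aggregating them, which some databases reject and others resolve by picking an arbitrary row, so this sketch uses aggregates instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (md5 TEXT, url TEXT, alt TEXT, score REAL)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?)",
    [
        ("abc123", "http://example.com/a.html", "red car", 0.9),
        ("abc123", "http://example.com/b.html", "sports car", 0.4),
        ("def456", "http://example.com/c.html", "blue boat", 0.7),
    ],
)

# One row per MD5: GROUP BY collapses the duplicates.
rows = conn.execute(
    "SELECT md5, COUNT(*) AS n, MIN(score) AS best "
    "FROM images GROUP BY md5 ORDER BY best ASC"
).fetchall()

for row in rows:
    print(row)
```

Each MD5 appears exactly once in `rows`, however many pages the image was found on.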


    Erik


