Dennis,
If I remember correctly, search_rewrite_rules applies only at search
time (htsearch rewrites the URLs it displays), whereas url_rewrite_rules
applies during the indexing process. The latter is therefore the better
bet, since it should remove duplicates _in the database_, whereas the
former cannot.
However, I haven't used either feature, so I cannot advise on how to use
them.
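
Purely as an illustration of the rewrite-then-dedupe idea, and not as
ht://Dig configuration, here is a minimal Python sketch. The rule list
and the names normalise and dedupe are made up for the example, and the
two rules simply assume that the URL pairs Dennis quotes below really
are equivalent on his site; real Vignette URLs would need rules of
their own.

```python
import re

# Hypothetical rewrite rules as (pattern, replacement) pairs, applied in order.
# They mirror cases mentioned in this thread and are assumptions for the
# example, not ht://Dig syntax and not a tested rule set for any real site.
REWRITE_RULES = [
    # Assumed Vignette default document for one section -> its directory URL.
    (re.compile(r"^(http://www\.military\.com/Finance/Home/)1,13397,,00\.html$"), r"\1"),
    # Drop a trailing slash, but only when a path follows the host name,
    # so that the site root "http://host/" is left untouched.
    (re.compile(r"^(http://[^/]+/.*[^/])/$"), r"\1"),
]


def normalise(url):
    """Apply every rewrite rule in sequence and return the rewritten URL."""
    for pattern, replacement in REWRITE_RULES:
        url = pattern.sub(replacement, url)
    return url


def dedupe(urls):
    """Return the rewritten URLs in first-seen order, dropping repeats."""
    seen = set()
    unique = []
    for url in urls:
        key = normalise(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    sample = [
        "http://www.military.com/spouse",
        "http://www.military.com/spouse/",
        "http://www.military.com/Finance/Home/1,13397,,00.html",
        "http://www.military.com/Finance/Home/",
    ]
    for url in dedupe(sample):
        print(url)
```

Anchoring the trailing-slash rule to a path after the host is a
deliberate choice here: it keeps the rule from touching the bare site
root, which is one way an over-broad pattern can misfire.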

Mike

> -----Original Message-----
> From: Dennis Watson [mailto:[EMAIL PROTECTED] 
> Sent: 05 July 2005 22:42
> To: Brockington,MJ,Mike,IQ D; Dennis Watson
> Subject: RE: [htdig] Eliminating Duplicate Search Results
> 
> 
> Hello Michael,
> 
> Thanks for your reply.  Unfortunately, neither of these directives is
> going to help.  The big issue is that we are using Vignette as our
> content management system, and Vignette:
> 
>    1) Has horrible looking URLs like
>       http://www.military.com/NewsContent/0,13319,FL_bush_070505,00.html
>    2) Each directory has its own default_doc: http://www.military.com
>       is the same as http://www.military.com/Page/0,12170,1-OO-0,00.htm
>       and http://www.military.com/Finance/Home/1,13397,,00.html is the
>       same as http://www.military.com/Finance/Home/
>    3) The aliases define human readable URLs for Vignette URLs on the
>       same server.  Like these
>       http://httpd.apache.org/docs/mod/mod_alias.html but defined inside
>       the Vignette black box.
> 
> What I am looking for is some way to make HTDig perform a regex on the
> URL and throw away URLs that are the same after the transformation.
> 
> 
> Dennis Watson [EMAIL PROTECTED]
> UNIX System Administrator Military.com
> Email [EMAIL PROTECTED] for site issues
> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Sunday, July 03, 2005 4:12 AM
> To: [EMAIL PROTECTED]
> Subject: RE: [htdig] Eliminating Duplicate Search Results
> 
> 
> I would have thought that the example that you give below should have
> been handled by the http://www.htdig.org/attrs.html#remove_default_doc
> setting. Have you looked into that?
> 
> As for the other part, if you know what the aliases are on the server
> (can you copy them from a config file?) then you can probably use the
> http://www.htdig.org/attrs.html#server_aliases  setting.
> 
> Mike
> 
> 
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dennis Watson
> > Sent: 28 June 2005 22:58
> > To: '[email protected]'
> > Subject: [htdig] Eliminating Duplicate Search Results
> > 
> > 
> > Hello All,
> > 
> > I am using HTDig 3.1.6 on a large web site that has many aliases for
> > pages, so different URLs point to the same content.  This is causing
> > duplicate search results since HTDig is using the URL as the unique id.
> > People are also not consistent with how they write URLs, so
> > http://www.military.com/spouse and http://www.military.com/spouse/
> > (note trailing slash) are coming up as different results as well.
> > 
> > I have tried a few different things like search_rewrite_rules
> > ( search_rewrite_rules: http://(.*)/$   http://\\1 ), but the regex
> > was too greedy and htsearch displayed duplicate results anyway.  My
> > next guess is url_rewrite_rules, but I am unsure how to write the
> > regexes and if htsearch will dedupe results with the same URL after
> > rewriting.
> > 
> > How can I get htsearch to rewrite these URLs and dedupe the ones that
> > end up being the same?  Some of the URLs are very ugly and would
> > require complex regexes.  If I cannot do it within the HTDIG framework,
> > I may have to htdump the indexes created by htdig, post-process the
> > dumpfiles with a perl script that munges the URLs as needed and then
> > load and merge the new indexes.  If that is not possible I may have to
> > munge the search results on the fly and not display the dupes (ugh!)
> > 
> > 
> > Dennis Watson [EMAIL PROTECTED]
> > UNIX System Administrator Military.com
> > 
> > 
> > 
> > _______________________________________________
> > ht://Dig general mailing list: <[email protected]>
> > ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> > List information (subscribe/unsubscribe, etc.)
> > https://lists.sourceforge.net/lists/listinfo/htdig-general
> > 
> 

