According to Greg Burnham:
> As part of the crawl, we're indexing an affiliate site and need to
> massage their urls for display in the search results. So I look through
> the docs and find "search_rewrite_rules" and it sounds like it'll do the
> trick.
>
> Now I'm no wiz at regular expressions, but here's what I want to do:
> original url:
> http://www.domain.ca/PrinterFriendly2.cfm?ArticleId=ZZZ
> the url I want:
> http://www.domain.ca/index.cfm?Param=YYY&ArticleId=ZZZ
>
> So I put this in the htdig.conf (all on one line):
>
> search_rewrite_rules: http://www\\.domain\\.ca/PrinterFriendly2\\.cfm\\?ArticleId=(.*) http://www\\.domain\\.ca/index\\.cfm\\?PgNm=TCE&ArticleId=\\1
>
> And, of course, it doesn't work.
>
> So, more searching in the faq and maillist and I come across the entry
> for url_part_aliases but it implies that I should use either
> url_rewrite_rules or search_rewrite_rules.
>
> I guess what I'm asking is which method is best for rewriting, and
> what's wrong with my regex?
>
> Thanks,
> Greg

The only thing that jumps out at me about your regex is that you shouldn't need to backslash-escape the "." and "?" on the right-hand side, only on the left, but that shouldn't prevent it from working. Are you sure you're using version 3.1.6?

As for which method is best for rewriting, that depends on when you want the rewriting done. Both attributes take the same kind of regex/replacement pairs, but url_rewrite_rules rewrites URLs before they are fetched and indexed, while search_rewrite_rules rewrites them afterward, only in the results htsearch displays. So the real question is which URLs are best for indexing, and that's for you to decide.

Note also that if you end up indexing both types of URLs but then rewrite one type to the other at search time, you may get duplicate search results. Rewriting at indexing time should avoid this, but you could also avoid duplicates with a well-thought-out exclude_urls attribute.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
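
For reference, here is a minimal htdig.conf sketch of the two options discussed in this thread. It reuses Greg's placeholder domain and his PgNm=TCE parameter, assumes a release that supports both attributes (3.1.6 or later, as the reply suggests), and has not been tested against a live site. The left-hand pattern keeps its escapes; the right-hand side is written as plain text, with \\1 standing for the captured article ID.

```
# Search-time rewrite: applied by htsearch to the URLs shown in results only.
# Left side is an escaped regex; (.*) captures the article ID (e.g. ZZZ).
# Right side is plain replacement text; \\1 inserts the captured ID.
search_rewrite_rules: http://www\\.domain\\.ca/PrinterFriendly2\\.cfm\\?ArticleId=(.*) http://www.domain.ca/index.cfm?PgNm=TCE&ArticleId=\\1

# Index-time alternative: applied by htdig, so the rewritten URL is what gets
# fetched and stored. Uncomment this and remove the line above to switch.
#url_rewrite_rules: http://www\\.domain\\.ca/PrinterFriendly2\\.cfm\\?ArticleId=(.*) http://www.domain.ca/index.cfm?PgNm=TCE&ArticleId=\\1
```

Normally you would enable only one of the two. If you switch to the index-time rule, the existing database still holds the old URLs until you re-run the dig.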

