According to Joe R. Jah:
> On Mon, 24 Sep 2001, Evaldas wrote:
> > I have a perl script that uses 'path_info' to display the appropriate
> > record from a database. The htdig works fine indexing the pages, so no
> > problem with that.
> >
> > The problem is that there can be different URLs which actually display
> > the same page, f.e.:
> > http://domain.com/one/two/A
> > http://domain.com/one/three/A
> > http://domain.com/one/four/A
> >
> > are all displaying the same record from a database. htsearch displays
> > them as a separate found pages. What I would like is to eliminate all
> > repeating pages and display only one of them.
> >
> > Is this possible with htdig?

Perhaps, but for pages generated by a Perl CGI script, it may prove tricky
to automatically generate exclude patterns. See
http://www.htdig.org/FAQ.html#q4.24

> That depends on what version of htdig at what patch level you use. If you
> use the unpatched 3.1.5, you can either apply the following patch and read
> its documentation carefully:
>
> ftp://ftp.ccsf.org/htdig-patches/3.1.5/htdig-3.1.5.aarmstrong.README
> ftp://ftp.ccsf.org/htdig-patches/3.1.5/htdig-3.1.5.aarmstrong.tar.gz
>
> And add lines like the following to your htdig configuration file and
> re-index:
> -------------------------------8<---------------------------------
> url_rewrite_rules: \
>     http://domain.com/one/two/(.*)   http://domain.com/one/one/\\1 \
>     http://domain.com/one/three/(.*) http://domain.com/one/one/\\1 \
>     http://domain.com/one/four/(.*)  http://domain.com/one/one/\\1
> -------------------------------8<---------------------------------
>
> You can also wait for 3.1.6, soon to be released, which will allow URL
> rewrite; use similar lines in your htdig configuration file and re-index.

Most likely, the url_rewrite_rules support will be in this coming Sunday's
snapshot of 3.1.6. I've tested Geoff's adaptation of Andy's patch, and it
seems to work fine. Again, though, this technique may be a problem if
there's a lot of variability in the possible paths for duplicates, so it
may not be any easier to use this than exclude_urls.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
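
To check a set of rewrite rules before re-indexing, it may help to try them on a few sample URLs first. The following is only a rough Python sketch of the idea behind the url_rewrite_rules lines quoted above, not htdig's own code: the RULES list and the canonical() helper are hypothetical names, Python's regex dialect is not guaranteed to match the one htdig uses, and the sketch assumes each rule is tried in turn on every URL.

```python
import re

# Hypothetical (regex, replacement) pairs mirroring the quoted
# url_rewrite_rules example; \1 is the first bracketed group.
RULES = [
    (r"http://domain.com/one/two/(.*)", r"http://domain.com/one/one/\1"),
    (r"http://domain.com/one/three/(.*)", r"http://domain.com/one/one/\1"),
    (r"http://domain.com/one/four/(.*)", r"http://domain.com/one/one/\1"),
]


def canonical(url):
    """Apply each rule in turn (an assumption about ordering) and
    return the rewritten URL."""
    for pattern, replacement in RULES:
        url = re.sub(pattern, replacement, url, count=1)
    return url


# The three duplicate URLs from the original question.
urls = [
    "http://domain.com/one/two/A",
    "http://domain.com/one/three/A",
    "http://domain.com/one/four/A",
]

# Collapse duplicates by rewriting every URL to its canonical form.
unique = sorted({canonical(u) for u in urls})
print(unique)
```

If the rules do what is intended, all three sample URLs should collapse to the single /one/one/ form. As noted above, this only pays off when the duplicate paths follow a small number of predictable patterns.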

