According to Joe R. Jah:
> On Mon, 24 Sep 2001, Evaldas wrote:
> > I have a Perl script that uses 'path_info' to display the appropriate
> > record from a database. htdig indexes the pages fine, so that part is
> > not a problem.
> > 
> > The problem is that different URLs can display the same page, e.g.:
> > http://domain.com/one/two/A
> > http://domain.com/one/three/A
> > http://domain.com/one/four/A
> > 
> > all display the same record from the database, and htsearch lists
> > them as separate matches. What I would like is to eliminate the
> > duplicates and display only one of them.
> > 
> > Is this possible with htdig?

Perhaps, but for pages generated by a Perl CGI script, it may prove tricky
to automatically generate exclude patterns.

See http://www.htdig.org/FAQ.html#q4.24
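For illustration only, here is a rough Python sketch of what the exclude_urls route amounts to: any URL containing one of the listed strings is rejected, so only one path to each record gets indexed. The patterns and the is_excluded() helper are hypothetical, and the sketch assumes plain substring matching, which is how exclude_urls is understood to behave in 3.1.x.

```python
from typing import Sequence

# Hypothetical exclude patterns for the layout in the question: keep the
# /one/two/ path and reject the other two. Substring matching is an
# assumption about how htdig 3.1.x applies exclude_urls.
EXCLUDE_PATTERNS = ["/one/three/", "/one/four/"]


def is_excluded(url: str, patterns: Sequence[str]) -> bool:
    """Return True if any pattern occurs anywhere in the URL."""
    return any(p in url for p in patterns)


urls = [
    "http://domain.com/one/two/A",
    "http://domain.com/one/three/A",
    "http://domain.com/one/four/A",
]

# Only URLs that match none of the patterns would be indexed.
kept = [u for u in urls if not is_excluded(u, EXCLUDE_PATTERNS)]
print(kept)  # ['http://domain.com/one/two/A']
```

The catch is that a record reachable only through an excluded path never gets indexed at all, which is why writing these patterns for a CGI script can be tricky.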

> That depends on which version of htdig, and which patch level, you use.
> If you use the unpatched 3.1.5, you can apply the following patch and
> read its documentation carefully:
> 
>  ftp://ftp.ccsf.org/htdig-patches/3.1.5/htdig-3.1.5.aarmstrong.README
>  ftp://ftp.ccsf.org/htdig-patches/3.1.5/htdig-3.1.5.aarmstrong.tar.gz
> 
> And add lines like the following to your htdig configuration file and
> re-index:
> -------------------------------8<---------------------------------
>  url_rewrite_rules:       \
>  http://domain.com/one/two/(.*)  http://domain.com/one/one/\\1 \
>  http://domain.com/one/three/(.*)  http://domain.com/one/one/\\1 \
>  http://domain.com/one/four/(.*)  http://domain.com/one/one/\\1 
> -------------------------------8<---------------------------------
> 
> You can also wait for 3.1.6, soon to be released, which will support URL
> rewriting; use similar lines in your htdig configuration file and re-index.

Most likely, the url_rewrite_rules support will be in this coming Sunday's
snapshot of 3.1.6.  I've tested Geoff's adaptation of Andy's patch, and
it seems to work fine.

Again, though, this technique may be a problem if there's a lot of
variability in the possible paths for duplicates, so it may not be any
easier to use this than exclude_urls.
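As a rough illustration (a Python sketch, not htdig's own code), rewrite rules like the ones quoted above canonicalize URLs as follows: each variant is rewritten to the /one/one/ form, and identical results collapse into a single entry. The rewrite() and dedupe() helpers, the extra /one/five/ URL, and the single catch-all rule in GENERAL_RULES are hypothetical. Applying every rule in turn is an assumption about htdig's behaviour, though it makes no difference for these particular rules.

```python
import re
from typing import List, Sequence, Tuple

# (pattern, replacement) pairs mirroring the three url_rewrite_rules quoted
# above. Python's re module writes the back-reference as \1.
EXPLICIT_RULES = [
    (r"^http://domain\.com/one/two/(.*)$", r"http://domain.com/one/one/\1"),
    (r"^http://domain\.com/one/three/(.*)$", r"http://domain.com/one/one/\1"),
    (r"^http://domain\.com/one/four/(.*)$", r"http://domain.com/one/one/\1"),
]

# Hypothetical catch-all: treat any single path segment after /one/ as
# noise, which copes with paths nobody listed by hand.
GENERAL_RULES = [
    (r"^http://domain\.com/one/[^/]+/(.*)$", r"http://domain.com/one/one/\1"),
]


def rewrite(url: str, rules: Sequence[Tuple[str, str]]) -> str:
    """Apply each rule in turn to the URL (ordering semantics assumed)."""
    for pattern, replacement in rules:
        url = re.sub(pattern, replacement, url)
    return url


def dedupe(urls: Sequence[str], rules: Sequence[Tuple[str, str]]) -> List[str]:
    """Canonicalize every URL and keep the first occurrence of each result."""
    seen = set()
    result = []
    for url in urls:
        canonical = rewrite(url, rules)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


urls = [
    "http://domain.com/one/two/A",
    "http://domain.com/one/three/A",
    "http://domain.com/one/four/A",
    "http://domain.com/one/five/A",  # a path the explicit rules do not cover
]

print(dedupe(urls, EXPLICIT_RULES))
# ['http://domain.com/one/one/A', 'http://domain.com/one/five/A']
print(dedupe(urls, GENERAL_RULES))
# ['http://domain.com/one/one/A']
```

The explicit rules miss /one/five/, which is exactly the variability problem; the catch-all rule handles it, at the cost of rewriting every URL of that shape, whether or not it really is a duplicate.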

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
