Hello,
I want htdig to exclude URLs that contain the ? question mark query
separator. I have the following configuration file, but URLs like that
are still being indexed. I am using htdig 3.1.4. Is this a bug?
I know I can exclude URLs like that in htsearch by setting the exclude
query string argument, but I also noticed that if I have it set to
"? /graphics/" the exclusion no longer works.
Does anybody know what the problem is?
The command line is called by PHP like this:
REQUEST_METHOD=GET
QUERY_STRING="words=forms&format=htdig&exclude=%3F+%2Fgraphics%2F&matchesperpage=10&method=or&page=1&sort=score"
/usr/local/htdocs/htdig/cgi-bin/htsearch -c setup/htdig.conf
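For reference, the exclude value that htsearch receives from this query string can be checked by decoding it outside of htdig. The following is a minimal sketch in Python using only the standard library; it is not part of htdig or of the PHP caller, and the helper name decode_query is made up for illustration.

```python
from urllib.parse import parse_qs

# Query string exactly as passed to htsearch in the command above.
QUERY_STRING = (
    "words=forms&format=htdig&exclude=%3F+%2Fgraphics%2F"
    "&matchesperpage=10&method=or&page=1&sort=score"
)


def decode_query(query_string):
    """Return a dict mapping each CGI parameter to its list of decoded values.

    Hypothetical helper for illustration; it only wraps urllib's parse_qs.
    """
    return parse_qs(query_string, keep_blank_values=True)


params = decode_query(QUERY_STRING)

# %3F decodes to "?", "+" to a space and %2F to "/".
print(repr(params["exclude"][0]))  # prints '? /graphics/'
```

The decoded value is a single string containing a space, so whether the two parts act as separate patterns depends on how htsearch splits that argument.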
The configuration is this:
database_dir: /usr/local/htdig/db/test
start_url: http://local.test.org/test/
maintainer: [EMAIL PROTECTED]
search_algorithm: exact:1 synonyms:0.5 endings:0.1
exclude_urls: ?
limit_urls_to: http://local.test./test/
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff \
        .class .map .ram .tgz .bin .rpm .mpg .mov .avi
max_head_length: 10000
max_doc_size: 200000
no_excerpt_show_top: true
valid_punctuation: : .-_/!#$%^&*��
template_map: htdig htdig library/htdig_template.html
search_results_header: library/htdig_header.html
search_results_footer:
nothing_found_file: library/htdig_nomatch.html
syntax_error_file: library/htdig_syntaxerror.html
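The behaviour expected from "exclude_urls: ?" can be written down as a small sketch. It assumes that a URL is rejected when it contains any of the space-separated patterns as a plain substring; it does not reproduce htdig's actual matching code, and the URLs and function names below are hypothetical. The second function shows what would happen if a value such as "? /graphics/" were instead treated as one literal string, which is only a guess at one possible cause and is not confirmed for htsearch 3.1.4.

```python
def is_excluded(url, patterns):
    """True if url contains any of the whitespace-separated patterns.

    Assumption: plain substring matching, one pattern per word.
    """
    return any(p in url for p in patterns.split())


def is_excluded_literal(url, pattern):
    """True only if url contains the whole value, spaces included.

    Assumption: models a matcher that does not split on whitespace.
    """
    return pattern in url


# Hypothetical URLs below the start_url of the configuration above.
urls = [
    "http://local.test.org/test/index.html",
    "http://local.test.org/test/page.php?id=1",
    "http://local.test.org/test/graphics/logo.html",
]

for url in urls:
    print(
        url,
        is_excluded(url, "?"),                      # exclude_urls: ?
        is_excluded(url, "? /graphics/"),           # value split into two patterns
        is_excluded_literal(url, "? /graphics/"),   # value kept as one string
    )
```

Under these assumptions only the URL with a query string is rejected by "?", and a value containing a space excludes nothing if it is matched as one literal string.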
Regards,
Manuel Lemos
Web Programming Components using PHP Classes.
Look at: http://phpclasses.UpperDesign.com/?[EMAIL PROTECTED]
--
E-mail: [EMAIL PROTECTED]
URL: http://www.mlemos.e-na.net/
PGP key: http://www.mlemos.e-na.net/ManuelLemos.pgp
--