I've noticed the following, which may or may not be "real" bugs, when testing the new beta version:
1] The htdig.conf file seems to be very sensitive to whitespace at the end of lines. In particular, with a multiline attribute as illustrated just below, if there is whitespace (tested with [tab]s) after the \ character, htdig _and_ htsearch will fail:
server_aliases: www.cbfm.rbs.co.uk=www.cbfm.rbsgrp.net \
                www.cib.rbs.co.uk=www.cib.rbsgrp.net
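As a workaround until this is confirmed as a bug, a small check along these lines can flag the offending lines before running htdig. This is only a sketch: the function name find_bad_continuations is my own, and it assumes the problem is exactly "spaces or tabs after a trailing backslash", which is all I have tested.

```python
import re

# A backslash followed by one or more spaces or tabs at the very end of a line.
_BAD_CONTINUATION = re.compile(r"\\[ \t]+$")


def find_bad_continuations(text):
    """Return 1-based line numbers where whitespace follows a trailing backslash."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if _BAD_CONTINUATION.search(line)
    ]
```

Passing it the contents of htdig.conf returns the line numbers to clean up; an empty list means no continuation line has trailing whitespace.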
2] I can't seem to get any sensible changes to results with htsearch using url_seed_score:
url_seed_score: cbfm|fmintranet|cib. *500,+1000 \
                manufacturing.|retail|technology.|wealthmanagement.|rbs.|group *.1,
Even stupidly high factors (like 100,000) don't seem to have an effect. (Tried with and without commas and spaces separating values.)
3] If there is _not_ a return after the last line in the config file, then htsearch causes a CGI error. Results from the Apache error log:
Unknown char in line 224: #[Fri Nov 14 23:51:46 2003] [error] [client 147.114.74.200] malformed header from script. Bad header=syntax error: /var/www/cgi-bin/htsearch32
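For anyone hitting the same thing, a check like the following can confirm whether the config file ends with a newline and append one if not. It is a minimal sketch with helper names of my own choosing, and it assumes a missing final line feed is the only trigger.

```python
def ends_with_newline(data: bytes) -> bool:
    """True if the data is empty or its last byte is a line feed."""
    return not data or data.endswith(b"\n")


def with_final_newline(data: bytes) -> bytes:
    """Return the data unchanged if it already ends in a newline, else append one."""
    return data if ends_with_newline(data) else data + b"\n"
```

Reading htdig.conf in binary mode, passing the bytes through with_final_newline and writing the result back is enough to avoid the error in my setup.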
4] If you search for a phrase and it forms part of a longer string, then the results are not highlighted in the extract displayed. This is most apparent when the second word is singular but the search finds a plural result.
Search for "animal feedstuff"
finds "animal feedstuff"s --- no highlight
finds "animal feedstuff" --- highlight as expected
Hope this makes sense!
Lastly, are the cookies.txt mechanism and check_unique_md5 actually known to work?
Running 3.2.0b5 on:
Linux lon3561xus 2.4.9-31smp #1 SMP Tue Feb 26 06:55:00 EST 2002 i686 unknown
It has happily indexed a multi-server intranet with under 50k pages, including parsing PDFs and Word docs, but, as ever, it seems limited by my web server responses/network latency, so this took over 18 hours. I'm really very happy with what I've seen so far, especially the phrase search, which is crucial for me to keep this product in place.
Best regards
Nicholas Booth
Royal Bank of Scotland, Corporate Banking
280 Bishopsgate
London