Hi, all. Thanks to the developers for working on such an ambitious project!
In testing htdig 3.2.0b2 with just one HTML file, the AND operator behaves
like OR, as far as I can tell. Whether I select "method=all" or
"method=boolean" with explicit "and"s in the query string, a query like
"web fluble" incorrectly returns the document (which contains "web" but not
"fluble"). I compiled 3.1.5 to check whether I was doing something wrong on
my end, but with the same document and an essentially identical config
file, 3.1.5 returns the correct results. (However, I want to use phrase
matching, so 3.1.5 isn't a permanent solution for me.)
I've already changed permissions on the _weakcmpr database as before, and
simple searches work as expected ("web design" matches the document, "design
web" doesn't, "web" matches, "fluble" doesn't).
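To make the expected behavior concrete, here is a minimal Python sketch of the matching semantics I am expecting. It is not htdig code: the function names are hypothetical, the word positions are copied from the rundig log further down in this message, and treating the "web design" / "design web" pair as phrase searches is an assumption about how those queries are interpreted.

```python
# Words and positions as reported in the rundig log ("word: web@4", etc.).
# Simplification: each word is assumed to occur only once in the document.
INDEXED = {"hi": 1, "this": 2, "bad": 3, "web": 4, "design": 5}


def match_all(query, index=INDEXED):
    """AND semantics: every query word must be present in the document."""
    return all(word in index for word in query.lower().split())


def match_any(query, index=INDEXED):
    """OR semantics: at least one query word must be present."""
    return any(word in index for word in query.lower().split())


def match_phrase(query, index=INDEXED):
    """Phrase semantics: all words present, at consecutive positions, in order."""
    words = query.lower().split()
    if not words or any(word not in index for word in words):
        return False
    positions = [index[word] for word in words]
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


if __name__ == "__main__":
    for query in ("web fluble", "web", "fluble"):
        print(query, "| all:", match_all(query), "| any:", match_any(query))
    for query in ("web design", "design web"):
        print(query, "| phrase:", match_phrase(query))
```

Under these assumptions, what I see from 3.2.0b2 for "web fluble" corresponds to match_any, while 3.1.5 behaves like match_all.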
Has anyone bumped into this before? I checked through the archives of this
list and the Changelog from April 12 to May 30, and didn't find anything
similar. My htdig.conf follows; the sample search page is at
<http://www.aptima.com/~cta/search-3.2.html> (although command line searches
return the same results); the one document indexed is index.html.
Also, I noticed that the attribute list in htdoc lists the version in which
each attribute first appeared, while the one on www.htdig.org doesn't. Is
there a reason for this?
Thanks for any help with this...
--
Arthur Prokosch, <[EMAIL PROTECTED]>
Usability/Web Intern
Aptima, Inc. <http://www.aptima.com/>
781-935-3966 x26
-- begin htdig.conf (most comments stripped) --
start_url: http://www.aptima.com/~cta/
# use file access for all URLs indexed
#
local_urls: http://www.aptima.com/~cta/=/home/cta/public_html/
# don't fall back to HTTP, as www.aptima.com is unreachable from here
#
local_urls_only: true
limit_urls_to: ${start_url}
exclude_urls: /cgi-bin/ search.html
bad_extensions: .cgi .wav .gz .z .sit .au .zip .tar .hqx .exe .com \
.gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
maintainer: [EMAIL PROTECTED]
#max_head_length: 10000
max_doc_size: 200000
no_excerpt_show_top: true
#search_algorithm: exact:1 synonyms:0.5 endings:0.1
search_algorithm: exact:1
# disable backlink weighting (which is on by default?)
#
backlink_factor: 0
# we could use synonyms (misspellings, really) when we start enabling
# text-box searches?
template_map: Long long ${common_dir}/long.html \
Short short ${common_dir}/short.html \
Custom custom ${common_dir}/custom.html
template_name: custom
next_page_text: '[ Next > ]'
no_next_page_text:
prev_page_text: '[ < Prev ]'
no_prev_page_text:
page_number_text: 1 2 3 4 5 6 7 8 9 10
no_page_number_text: >1< >2< >3< >4< >5< \
>6< >7< >8< >9< >10<
# local variables:
# mode: text
# eval: (if (eq window-system 'x) (progn (setq font-lock-keywords (list '("^#.*" . font-lock-keyword-face) '("^[a-zA-Z][^ :]+" . font-lock-function-name-face) '("[+$]*:" . font-lock-comment-face) )) (font-lock-mode)))
# end:
-- end htdig.conf --
-- begin redirected output from rundig -vvvvvv --
ht://dig Start Time: Wed Aug 2 11:32:38 2000
1:0:http://www.aptima.com/~cta/
New server: www.aptima.com, 80
- Persistent connections: enabled
- HEAD before GET: disabled
- Timeout: 30
- Connection space: 0
- Max Documents: -1
- TCP retries: 1
- TCP wait time: 5
Trying to retrieve robots.txt file
pushed
pick: www.aptima.com, # servers = 1
> www.aptima.com supports HTTP persistent connections (infinite)
0:2:0:http://www.aptima.com/~cta/: Trying local files
found existing file /home/cta/public_html/index.html
Read 43 from document
Read a total of 43 bytes
Tag: blink, matched -1
word: hi.@1
word: this@2
word: bad@3
word: web@4
Tag: /blink, matched -1
word: design.@5
head: hi. this is bad web design.
size = 43
pick: www.aptima.com, # servers = 1
> www.aptima.com supports HTTP persistent connections (infinite)
ht://dig End Time: Wed Aug 2 11:32:38 2000
ID: 2 URL: http://www.aptima.com/~cta/
-- end redirect --