According to Adam Brown:
> On Friday 14 February 2003 11:14, Adam Brown wrote:
...
> > I am indexing this page using Htdig 3.1.6:
> > http://wire.org.au/information/violence/domestic/womens_stories/one_rural_womans_story.html
> > The page contains the words "woman's" and "womans" but not
> > "woman".
> >
> > The search page is located at: http://wire.org.au/public_search.html
> >
> > When I search for "rural woman's" or "rural womans" I get no hits. However
> > when I search for "woman" the page is returned.
> >
> > My understanding is that, with the default Htdig settings, "woman's"
> > gets indexed as "womans".  So surely a search for "womans" should be
> > successful.

That's correct, assuming you're using the defaults.  The word "woman's"
should get indexed as both "womans" and "woman".  If you had changed
valid_punctuation at the time you indexed, taking out the apostrophe,
then this would not work, and "woman's" would be indexed only as "woman".
You should check your db.wordlist file to make sure that both woman and
womans appear in there.
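One quick way to do that check is a short script. The Python sketch below assumes (this thread does not confirm it) that db.wordlist in 3.1.x is a plain text file with one entry per line and the indexed word as the first whitespace-separated field; if your file is laid out differently, the parsing line needs adjusting. The name words_present is just an illustrative helper.

```python
import sys
from typing import Iterable, Set


def words_present(lines: Iterable[str], wanted: Set[str]) -> Set[str]:
    """Return the members of `wanted` that appear as the first field of a line.

    ASSUMPTION: each db.wordlist line begins with the indexed word,
    followed by other whitespace-separated fields.
    """
    found: Set[str] = set()
    for line in lines:
        fields = line.split()
        if fields and fields[0] in wanted:
            found.add(fields[0])
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_wordlist.py /path/to/db.wordlist
    wanted = {"woman", "womans"}
    # latin-1 decodes any byte, so unusual characters won't stop the scan.
    with open(sys.argv[1], encoding="latin-1") as f:
        found = words_present(f, wanted)
    for word in sorted(wanted):
        print(word, "present" if word in found else "MISSING")
```

If "womans" comes back MISSING, the problem is on the indexing side rather than in htsearch.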

> Researching further:
> 
> Results from htdig -vvvv indicate that the word "woman" is indexed, not 
> "womans"

If you are indeed running htdig version 3.1.6, then the -vvvv output should
show, when the word "woman's" is parsed, the following lines:

word: woman's@(location)
word part: woman@(location)

Both of these should go into db.wordlist, with the apostrophe being stripped
from the first one.
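To make the "word" / "word part" distinction concrete, here is a simplified Python model of how a word like woman's turns into index entries. It is an illustration under stated assumptions, not htdig's actual C++ parser: characters that are neither letters/digits nor in valid_punctuation separate words; within a word, the valid_punctuation characters are stripped to form the whole-word entry; and each piece between them is also kept if it is at least minimum_word_length long (assumed here to be 3). The (location) part of the verbose output is left out.

```python
from typing import List

VALID_PUNCTUATION = ".-_/!#$%^&'()"  # the value quoted in this thread
MINIMUM_WORD_LENGTH = 3              # ASSUMPTION: minimum_word_length setting


def tokenize(text: str, punctuation: str) -> List[str]:
    """Split text into words. Letters, digits and valid_punctuation
    characters stay together; anything else ends the current word."""
    tokens: List[str] = []
    current = ""
    for ch in text:
        if ch.isalnum() or ch in punctuation:
            current += ch
        else:
            if current:
                tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


def index_forms(word: str, punctuation: str, min_len: int) -> List[str]:
    """Model of the "word:" and "word part:" entries for one word."""
    word = word.lower()
    forms: List[str] = []
    # "word:" entry, with valid_punctuation characters stripped out.
    whole = "".join(ch for ch in word if ch not in punctuation)
    if len(whole) >= min_len:
        forms.append(whole)
    # "word part:" entries, split at each valid_punctuation character.
    parts: List[str] = []
    current = ""
    for ch in word:
        if ch in punctuation:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    for part in parts:
        if len(part) >= min_len and part not in forms:
            forms.append(part)
    return forms


def index_text(text: str,
               punctuation: str = VALID_PUNCTUATION,
               min_len: int = MINIMUM_WORD_LENGTH) -> List[str]:
    """Distinct index entries for a piece of text, in first-seen order."""
    result: List[str] = []
    for token in tokenize(text, punctuation):
        for form in index_forms(token, punctuation, min_len):
            if form not in result:
                result.append(form)
    return result
```

With the quoted setting, index_text("one rural woman's story") gives ["one", "rural", "womans", "woman", "story"]. Drop the apostrophe from the punctuation string and "woman's" splits into "woman" and "s", so only "woman" survives, which is the case where "womans" never reaches db.wordlist.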

> A search for "women's" (note the e) returns a hit. I looked in the ispell 
> dictionary file english.0 and the listings for the two words are:
> woman/MY
> women/MS

When htfuzzy builds the endings database, it strips out rules that contain
apostrophes, so the M suffix (the possessive 's) is ignored.  So, a search
for "woman" will
match "(woman or womanly)", and a search for "women" will match
"(women or womens)".  I'm not actually sure why there's an S suffix on
women, but that's another matter.  If you want a search for woman or
women to match any of women's, woman's, woman or women, you could add a
line like the following in your synonyms file and do an "htfuzzy synonyms":

woman womans women womens
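The effect of dropping apostrophe rules, and of adding that synonyms line, can be sketched in Python. This is a toy model: the affix table is hypothetical and much simpler than the real ispell rules in english.aff (which depend on how a word ends), and it assumes that "htfuzzy synonyms" makes every word on a line a synonym of all the others, as the advice above implies.

```python
from typing import Dict, Iterable, Set

# HYPOTHETICAL simplified affix table: each flag just appends a suffix.
SUFFIX_RULES: Dict[str, str] = {
    "M": "'S",  # possessive; contains an apostrophe, so it gets dropped
    "Y": "LY",
    "S": "S",
}

# The two english.0 entries quoted in this thread.
DICTIONARY: Dict[str, str] = {"woman": "MY", "women": "MS"}


def endings_for(word: str, flags: str,
                rules: Dict[str, str] = SUFFIX_RULES) -> Set[str]:
    """Forms the endings database would link to `word`, skipping any
    rule whose suffix contains an apostrophe."""
    forms = {word}
    for flag in flags:
        suffix = rules.get(flag)
        if suffix is None or "'" in suffix:
            continue
        forms.add(word + suffix.lower())
    return forms


def parse_synonyms(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """ASSUMPTION: every word on a line is a synonym of every other one."""
    table: Dict[str, Set[str]] = {}
    for line in lines:
        words = line.lower().split()
        for w in words:
            table.setdefault(w, set()).update(x for x in words if x != w)
    return table


def expand(word: str, synonyms: Dict[str, Set[str]],
           dictionary: Dict[str, str] = DICTIONARY) -> Set[str]:
    """Words a search for `word` would look up with endings + synonyms."""
    result = endings_for(word, dictionary.get(word, ""))
    result |= synonyms.get(word, set())
    return result
```

Without the synonyms line, expand("woman", {}) yields only woman and womanly; with it, the search also reaches womans, women and womens.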

However, searching for "woman's" in htsearch should match "womans" in
the database, so if it did get indexed as such in the database, I'm at
a loss to explain why htsearch isn't picking it up.
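The reported symptom (no hits for "rural woman's" or "rural womans", but a hit for "woman") is consistent with "womans" never reaching the database. A minimal Python sketch of that reasoning, assuming htsearch lowercases each query word and strips valid_punctuation from it the same way htdig does for the whole word, and that every word must match (an "and" search):

```python
from typing import Iterable

VALID_PUNCTUATION = ".-_/!#$%^&'()"  # the value quoted in this thread


def normalize_query_word(word: str,
                         punctuation: str = VALID_PUNCTUATION) -> str:
    """ASSUMPTION: query words are lowercased and stripped of
    valid_punctuation, mirroring the whole-word entry at index time."""
    return "".join(ch for ch in word.lower() if ch not in punctuation)


def matches_all(query: str, indexed_words: Iterable[str]) -> bool:
    """True if every normalized query word is in the index ("and" search)."""
    index = set(indexed_words)
    return all(normalize_query_word(w) in index for w in query.split())
```

With an index containing "womans", both queries succeed; with only "woman" indexed, both fail while a plain "woman" search still succeeds.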

> Is it the case that Htdig reduces the search word "woman's" to "womans" which 
> doesn't register a hit because "woman" is recorded in the database and 
> "womans" is not a valid extension of "woman"?

No, the endings database is only used for fuzzy matching, to augment
the possible matches.  It's not used to invalidate any exact matches.
However, the endings and synonyms fuzzy matches will only occur if these
are specified in your search_algorithm attribute.  You can always run
"htsearch -vv words=womans" from the command line to see what it does.
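As a rough Python model of "augment, never invalidate": the search_algorithm value (something like exact:1 endings:0.1, with weights here only illustrative) is read as a list of name:weight pairs, and each fuzzy algorithm can only add candidate words. The fuzzy functions are placeholders you would supply, and defaulting a missing weight to 1 is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Set, Tuple

FuzzyFn = Callable[[str], Set[str]]


def parse_search_algorithm(value: str) -> List[Tuple[str, float]]:
    """Split "exact:1 endings:0.1" into [("exact", 1.0), ("endings", 0.1)].
    ASSUMPTION: a missing weight defaults to 1.0."""
    pairs: List[Tuple[str, float]] = []
    for item in value.split():
        name, _, weight = item.partition(":")
        pairs.append((name, float(weight) if weight else 1.0))
    return pairs


def candidate_words(word: str, algorithm: str,
                    fuzzy: Dict[str, FuzzyFn]) -> Dict[str, float]:
    """Words to look up for `word`, each with its best weight.

    Fuzzy algorithms only ever add entries; they never remove one.
    Algorithms not listed in `algorithm` are never consulted.
    """
    weights: Dict[str, float] = {}
    for name, weight in parse_search_algorithm(algorithm):
        if name == "exact":
            found = {word}
        elif name in fuzzy:
            found = fuzzy[name](word)
        else:
            continue  # algorithm not modelled in this sketch
        for w in found:
            weights[w] = max(weights.get(w, 0.0), weight)
    return weights
```

With only exact:1 listed, the endings function is never called, which is why fuzzy matches depend on what search_algorithm contains.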

> I use the setting:
> valid_punctuation: .-_/!#$%^&'()

There should be a backslash '\' in front of the dollar sign '$'
(i.e. .-_/!#\$%^&'()); otherwise it gets swallowed up in the variable
substitution phase.  That shouldn't
cause the apostrophe to be swallowed up, though, at least as far as I can
tell.
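A simplified Python model of the substitution phase shows why the escape matters. It assumes (not confirmed in this thread) that a backslash makes the next character literal, and that an unescaped '$' starts a variable reference, written $name, $(name) or ${name}, which is replaced by that variable's value, or by nothing if it is undefined. htdig's real configuration parser may differ in the details.

```python
from typing import Dict


def substitute(value: str, variables: Dict[str, str]) -> str:
    """Simplified model of config variable substitution (see assumptions)."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Backslash escape: keep the next character literally.
            out.append(value[i + 1])
            i += 2
        elif ch == "$":
            i += 1
            if i < len(value) and value[i] in "({":
                # $(name) or ${name}
                close = ")" if value[i] == "(" else "}"
                end = value.find(close, i + 1)
                if end == -1:
                    end = len(value)
                name = value[i + 1:end]
                i = end + 1
            else:
                # $name: letters, digits and underscores (may be empty).
                start = i
                while i < len(value) and (value[i].isalnum() or value[i] == "_"):
                    i += 1
                name = value[start:i]
            # Undefined (or empty) names expand to nothing.
            out.append(variables.get(name, ""))
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

In this model the unescaped value loses only the '$' (".-_/!#%^&'()"), the escaped one keeps it, and the apostrophe survives either way.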

Without knowing what's in your db.wordlist, I'm at a bit of a loss as to
what to suggest next.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
