Gilles Detillieux writes:
> According to Patrick Robinson:
> > On Tue, 24 Apr 2001, Geoff Hutchison wrote:
> > > There's also a "keywords" field for the htsearch form for exactly this
> > > purpose. You can thus make:
> > >
> > > <input type="hidden" name="keywords" value="foo-bar">
> > 
> > But would that also pick up documents that had "keywords" tags of
> > "foo-bar-baz" and "foo-bar-baz-boom"?  In other words, I want to
> > restrict the search result to those documents occurring in the current
> > category OR any of its subcategories.
> 
> Yes!  That's one of the benefits of the compound word handling
> enhancements I put in back in version 3.1.3.  If it picks up a meta
> keyword of foo-bar-baz-boom, it goes in the index as foo-bar-baz-boom,
> foo-bar-baz, foo-bar, foo, bar-baz-boom, bar-baz, bar, baz-boom, baz &
> boom, so if you search for any of these words it will match the document.
> Ideally, you'd have a set of codes that don't match any actual word in
> any of your documents, to avoid false matches.  Codes made up of long
> unpronounceable sequences of consonants would work best.  If necessary,
> you could just prepend some sequence of letters to your existing codes
> to get around the problem of false matches.

Ooo... this is nice!
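
To check that I'm reading the expansion right, here is a rough Python
sketch of it.  This is only my interpretation of the description quoted
above, not htdig's actual code; the function name compound_variants and
its "sep" parameter are made up for illustration.

```python
def compound_variants(word, sep="-"):
    """Return every contiguous run of the parts of a compound word.

    Illustrative only: models the index expansion described in the
    quoted message, not the real htdig implementation.
    """
    parts = [p for p in word.split(sep) if p]
    variants = []
    for start in range(len(parts)):
        # Longest run first, then progressively shorter ones.
        for end in range(len(parts), start, -1):
            variants.append(sep.join(parts[start:end]))
    return variants


for variant in compound_variants("foo-bar-baz-boom"):
    print(variant)
```

If that reading is correct, a document tagged "foo-bar-baz-boom" is
also indexed under "foo-bar", so a search restricted to "foo-bar"
would pick up everything in its subcategories as well.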

I put together a little example, just to see how it would work.  I did
eventually get it working, but not quite in the way I was expecting.
My problem was that htdig seems to be stripping the "-" (dash) character
out of my keywords.  If my meta tag looks like this:

<meta name="htdig-keywords" content="foo-bar-baz">

then htsearch cannot find the document if I specify "foo-bar-baz"
in the "keywords" form field.  But, if I specify it as "foobarbaz"
(without the dashes), then htsearch finds it.  And, looking in the
.wordlist.work file, I see all my category keywords, but with the
dashes stripped out.
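
Here is a toy Python model of the symptom, in case it helps anyone
reproduce it.  It only mimics what I observed (dashes removed on the
index side, the "keywords" value looked up as typed); it is a guess at
the behavior, not htdig's code, and the names strip_punctuation and
find are hypothetical.

```python
def strip_punctuation(word, punctuation="-"):
    """Drop every character listed in `punctuation` from `word`."""
    return "".join(ch for ch in word if ch not in punctuation)


def find(query, index, normalize_query):
    """Look up `query` in `index`, optionally normalizing it first."""
    key = strip_punctuation(query) if normalize_query else query
    return key in index


# Index side (assumed): the meta keyword is stored without its dashes.
index = {strip_punctuation("foo-bar-baz")}

print(find("foo-bar-baz", index, normalize_query=False))  # False
print(find("foobarbaz", index, normalize_query=False))    # True
print(find("foo-bar-baz", index, normalize_query=True))   # True
```

The last call shows what I expected to happen: if the search side
applied the same stripping, "foo-bar-baz" would match.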

Looking at the config file docs, I see that I could prevent this
by removing '-' from valid_punctuation and (presumably) adding
it to extra_word_characters.  I'm not sure I completely understand
how those two directives interact, though.  Are they both needed?
Would it make sense to include a character in extra_word_characters
without also removing it from valid_punctuation?

But this would have undesirable side-effects (losing the default
behavior for normal words).  What else can I do to get the compound
word handling described above, but also maintain a category separator
character?

(Actually, it doesn't seem quite right to me that htsearch can't
find "foo-bar-baz", since, according to the docs for valid_punctuation,
"the same transformation is performed on the keywords the search
engine gets".  Maybe that doesn't apply to the "keywords" field, but
only to "words"?)

Still kinda stuck,

-- 
Patrick Robinson
AHNR Info Technology, Virginia Tech
[EMAIL PROTECTED]

