On Thu, 26 Apr 2001, Tony Crockford wrote:
> Hope this helps.
>
> In other words, you can't use the search form hidden keywords option for
> anything other than plain words.

Yes, that definitely explains the behavior.  Thanks!

> you could of course use:
>
> > <option value="ir-cg">ir cg
> > <option value="ir-cg-a">ir cg a
> > <option value="ir-cg-b">ir cg b
> > <option value="ir-cg-b-x">ir cg b x
> > <option value="ir-cg-b-y">ir cg b y
>
> but I suspect it will fail on the single character options.

Actually, I think maybe that should be the other way around:

<option value="ir cg b x">ir-cg-b-x

And you're right... it would fail because of my too-short category
abbreviations.  To try this out, I set minimum_word_length to 1, and
rebuilt the index.  Now tags like the one above (that is, multiple
values separated by space or Ctrl-A) do work, assuming "work" means
"match documents containing all these keywords".  But they result in
what would be false matches, given my category semantics.

That is, if I had the following categories:
a
a-b
a-b-c
a-b-d
a-c
a-c-d

then the category tree might be represented as:

     a
    / \
   b    c
 /  \    \
c    d    d

(if you're viewing this in a fixed-width font!)

Now, if a document is tagged with "a-b-c", then a search that included
a "keywords" specification of "a c" would match the document, since all
the keyword terms were "found" in the compound word contained in the
document's META tag.  But according to my category scheme, the document
should NOT match, because it's in a totally separate branch of the tree.
That is, the "c" under "a-b" is a different category from the "c" which
is directly under "a".
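
To make the false match concrete, here is a minimal Python sketch. It is
only a simplified model, not htdig's actual code: it assumes a hyphenated
tag is indexed as its individual subwords, and that a multi-word
"keywords" value requires every term to be present. The function names
are hypothetical.

```python
def index_subwords(tag):
    """Split a hyphenated category tag into its subwords (assumed model)."""
    return set(tag.split("-"))


def matches_all(tag, keywords):
    """True if every space-separated keyword is one of the tag's subwords."""
    subwords = index_subwords(tag)
    return all(term in subwords for term in keywords.split())


# A search meant for category "a-c" is sent as the keywords "a c".
print(matches_all("a-c-d", "a c"))  # True: the intended match
print(matches_all("a-b-c", "a c"))  # True: a false match, wrong branch
print(matches_all("a-b-d", "a c"))  # False: no "c" anywhere in the tag
```

The second call is the problem case: the "c" that satisfies the search
sits under "a-b", not directly under "a".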

I suppose I could hack around this limitation by:
(1) making my category abbreviations at least 3 characters  :-)
(2) prefixing each category abbreviation with its level in the tree
(3) taking advantage of the fact that when htdig builds the index, it
strips out the dash, yet includes all the subword combinations

So I might now have:

1aaa
1aaa-2bbb
1aaa-2bbb-3ccc
1aaa-2bbb-3ddd
1aaa-2ccc
1aaa-2ccc-3ddd
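
Tags like these could be generated rather than typed by hand. A small
Python sketch, assuming each category is held as a list of abbreviations
from the root down (the helper name is hypothetical):

```python
def level_prefixed_tag(path):
    """Prefix each abbreviation with its 1-based depth, then join with dashes."""
    return "-".join(f"{depth}{abbr}" for depth, abbr in enumerate(path, start=1))


print(level_prefixed_tag(["aaa"]))                # 1aaa
print(level_prefixed_tag(["aaa", "bbb", "ccc"]))  # 1aaa-2bbb-3ccc
print(level_prefixed_tag(["aaa", "ccc", "ddd"]))  # 1aaa-2ccc-3ddd
```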

Now, a keyword search for "1aaa2ccc" (with no space) would locate only
those documents whose level 1 category is "1aaa", and whose level 2
category is "2ccc".  It would not return the document tagged with
"1aaa-2bbb-3ccc".  However, it *would* return a document tagged with
"1aaa-2ccc-3ddd", which is exactly what I'm trying to accomplish.
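
Here is a rough Python model of that lookup. It assumes "all the subword
combinations" means every contiguous run of subwords joined with the
dashes removed. That is my reading of the indexing behavior, not
something taken from the htdig source, and the function names are
hypothetical.

```python
def indexed_words(tag):
    """Assumed model: every contiguous run of subwords, joined without dashes."""
    parts = tag.split("-")
    return {
        "".join(parts[i:j])
        for i in range(len(parts))
        for j in range(i + 1, len(parts) + 1)
    }


def matches(tag, keyword):
    """True if the single keyword is one of the words indexed for the tag."""
    return keyword in indexed_words(tag)


tags = [
    "1aaa",
    "1aaa-2bbb",
    "1aaa-2bbb-3ccc",
    "1aaa-2bbb-3ddd",
    "1aaa-2ccc",
    "1aaa-2ccc-3ddd",
]

hits = [tag for tag in tags if matches(tag, "1aaa2ccc")]
print(hits)  # ['1aaa-2ccc', '1aaa-2ccc-3ddd']
```

Because the level number is part of each subword, "3ccc" can never stand
in for "2ccc", so the document tagged "1aaa-2bbb-3ccc" stays out of the
results.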

My concern with this kind of trickery is that the indexing or searching
behavior might change in a future version of ht://Dig in such a way as
to break my scheme.

---
Patrick Robinson
AHNR Info Technology, Virginia Tech
[EMAIL PROTECTED]
