According to Greg Lepore:
> It appears that HTDIG 3.1.6 handles superscripts by inserting a space for 
> the <SUP> opening tag, which causes the search to not find words which 
> contain the superscript.  Example: Cap<SUP>t</SUP> will return as "Cap t" 
> so a search for capt will not work. Is HTDIG properly handling the <SUP> 
> tag? An example is at:
> 
> http://www.mdarchives.state.md.us/megafile/msa/speccol/sc2900/sc2908/000001/000011/html/am11--554.html

When I added support for handling <sup> and <sub> tags, I had to decide
whether they should cause a word break or not.  The problem before was
that they caused a word break in the words going into the word database,
but not in the excerpt, so matched words weren't highlighted in the
excerpts when they sat right next to a superscript or subscript.  I set
out to make this consistent, but had to decide which of the two
behaviours to choose.

In most of the uses of superscripts and subscripts I've seen, it makes
more sense to treat them as causing a word break, so that's what I decided
to do.  The URL above is a good example of both uses of superscripts.
When htdig sees Mich<sup>1</sup> or Nath<sup>1</sup>, it makes sense to
have a word break at the <sup> tag, so that you can search for mich or
nath instead of mich1 or nath1.  However, for Capt<sup>n</sup>, you'd
want a search for captn to find it.  It occurs to me that this is a lot
like the dilemma of valid_punctuation, where I fixed the code to take a
word like post-doctoral and index it as post, doctoral and postdoctoral.
Maybe <sup> and <sub> should be treated like a hyphen rather than as a
plain word break, so that the parts and the joined word all get indexed.
That would be good for uses like 2<sup>nd</sup> too.
Can anyone think of counterarguments to this?
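
To make the hyphen-like option concrete, here is a rough sketch of the
idea in Python.  This is not htdig's actual C++ parser code: the name
index_words and the sentinel-character approach are assumptions made up
for illustration, and it does no filtering of short words.

```python
import re

# Opening and closing <sup>/<sub> tags, matched case-insensitively.
SUP_SUB_TAG = re.compile(r"</?su[pb]\s*>", re.IGNORECASE)

# Hypothetical marker standing in for a tag, playing the role a hyphen
# plays in a word like post-doctoral.
SENTINEL = "\x00"

# A run of word characters, possibly joined by tag markers.
RUN = re.compile(r"[A-Za-z0-9\x00]+")


def index_words(fragment):
    """Return the words to index when <sup>/<sub> act like a hyphen."""
    marked = SUP_SUB_TAG.sub(SENTINEL, fragment)
    words = []
    for run in RUN.findall(marked):
        # Drop the empty pieces left by leading or trailing markers.
        parts = [p.lower() for p in run.split(SENTINEL) if p]
        if not parts:
            continue
        # Index each part on its own, as a word break would ...
        words.extend(parts)
        # ... and also the joined form, as for post-doctoral.
        if len(parts) > 1:
            words.append("".join(parts))
    return words
```

With this treatment, Capt<sup>n</sup> would be indexed as capt, n and
captn, and Mich<sup>1</sup> as mich, 1 and mich1, so a search for either
mich or captn would find a match.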

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
