According to Greg Lepore:
> It appears that HTDIG 3.1.6 handles superscripts by inserting a space for
> the <SUP> opening tag, which causes the search to not find words which
> contain the superscript. Example: Cap<SUP>t</SUP> will return as "Cap t"
> so a search for capt will not work. Is HTDIG properly handling the <SUP>
> tag? An example is at:
>
> http://www.mdarchives.state.md.us/megafile/msa/speccol/sc2900/sc2908/000001/000011/html/am11--554.html

When I added support for handling <sup> and <sub> tags, I had to decide whether they should cause a word break or not. The problem before was that they caused a word break in the words going into the word database, but not the excerpt, so matched words weren't highlighted in the excerpts if they were juxtaposed to a superscript or subscript. I set out to make it consistent, but had to decide which of the two behaviours to choose. In most of the uses of superscripts and subscripts I've seen, it makes more sense to treat them as causing a word break, so that's what I decided to do.

The URL above is a good example of both uses of superscripts. When htdig sees Mich<sup>1</sup> or Nath<sup>1</sup>, it makes sense to have a word break at the <sup> tag, so that you can search for mich or nath instead of mich1 or nath1. However, for Capt<sup>n</sup>, you'd want a search for captn to find it.

It occurs to me that this is a lot like the dilemma of valid_punctuation, where I fixed the code to take a word like post-doctoral and index it as post, doctoral and postdoctoral. Maybe <sup> and <sub> should be treated as a hyphen rather than a word break. That would be good for uses like 2<sup>nd</sup> too. Can anyone think of counterarguments to this?

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
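
To make the two behaviours discussed above concrete, here is a rough sketch in Python. It is not htdig's parser (htdig is written in C++), and the names index_words, SUP_SUB_TAG and JOINER are made up for this illustration. It also ignores htdig settings such as minimum_word_length, which may drop very short fragments in a real index. The "break" mode treats <sup> and <sub> as word breaks; the "hyphen" mode treats them the way the post describes for post-doctoral, indexing each part plus the joined form.

```python
import re

# Hypothetical names for illustration only; none of this is htdig code.
SUP_SUB_TAG = re.compile(r"</?su[pb]\b[^>]*>", re.IGNORECASE)
OTHER_TAG = re.compile(r"<[^>]+>")
JOINER = "\x00"  # placeholder marking where a <sup>/<sub> tag was
TOKEN = re.compile(r"[A-Za-z0-9\x00]+")


def index_words(html, mode):
    """Return the lowercased words that would be indexed for an HTML fragment.

    mode="break":  <sup>/<sub> tags act as word breaks.
    mode="hyphen": <sup>/<sub> tags act like a hyphen, so each part is
                   indexed and the joined form is indexed as well.
    """
    if mode == "break":
        text = SUP_SUB_TAG.sub(" ", html)
    elif mode == "hyphen":
        text = SUP_SUB_TAG.sub(JOINER, html)
    else:
        raise ValueError("mode must be 'break' or 'hyphen'")

    # Any other tag is simply treated as a word break in this sketch.
    text = OTHER_TAG.sub(" ", text)

    words = []
    for token in TOKEN.findall(text):
        parts = [p.lower() for p in token.split(JOINER) if p]
        words.extend(parts)
        if len(parts) > 1:
            # Also index the compound, as with post-doctoral -> postdoctoral.
            words.append("".join(parts))
    return words


if __name__ == "__main__":
    for sample in ("Cap<SUP>t</SUP>", "Mich<sup>1</sup>", "2<sup>nd</sup> floor"):
        print(sample, index_words(sample, "break"), index_words(sample, "hyphen"))
```

With this sketch, Cap<SUP>t</SUP> yields cap and t in "break" mode, and cap, t and capt in "hyphen" mode, so a search for either mich or capt would succeed under the hyphen-like treatment.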

