Thanks for the clarifications, Gilles. It's really great having someone with your intimate knowledge of ht://Dig still around!
On a related note, I notice that the ^Z after the </HTML> in test/htdocs/set1/site2.html makes it through into the results. Should we check that only printable characters get passed?

Cheers,
Lachlan

On Sun, 18 Jan 2004 15:50, Gilles Detillieux wrote:
> Things could break if
> htdig/htsearch started second-guessing the encoding of URLs in
> pages it indexes and doubly encoded them.
>
> This string contains not only stuff from
> the original web page, which htdig has already SGML-decoded, but it
> also contains some HTML tags that htsearch inserts
>
> By the way, punctuation is not stripped from EXCERPT -- only the
> original HTML tags from the source page are.

--
[EMAIL PROTECTED]
ht://Dig developer DownUnder (http://www.htdig.org)
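
P.S. To make the ^Z question concrete, here is a rough sketch of the kind of check I have in mind. The helper name stripControlChars is made up for illustration and is not anything in the current ht://Dig tree. It also assumes we want to keep tab, newline and carriage return, and leave bytes >= 0x80 alone so that Latin-1 or UTF-8 text is not damaged. Whether that is the right policy is exactly what I am asking.

```cpp
#include <string>

// Hypothetical helper, not part of the ht://Dig source.
// Drops ASCII control characters (0x00-0x1F and 0x7F), except for
// tab, newline and carriage return. Bytes >= 0x80 are kept as-is
// (an assumption, so that non-ASCII text passes through untouched).
std::string stripControlChars(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        bool isControl = (c < 0x20 || c == 0x7F);
        bool isAllowedWhitespace = (c == '\t' || c == '\n' || c == '\r');
        if (!isControl || isAllowedWhitespace)
            out.push_back(static_cast<char>(c));
    }
    return out;
}
```

With this, a page ending in "</HTML>" followed by ^Z (byte 0x1A) and a newline would come out as "</HTML>" plus the newline, with the ^Z dropped. Where such a filter would best be applied (at parse time in htdig, or at output time in htsearch) is a separate question.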