Thanks for the clarifications, Gilles.  It's really great having 
someone with your intimate knowledge of ht://Dig still around!

On a related note, I notice that the ^Z after the </HTML> in 
test/htdocs/set1/site2.html makes it through into the results.  
Should we check that only printable characters get passed?
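
A minimal sketch of what such a check might look like, offered only as 
a starting point.  stripControlChars is a hypothetical helper, not an 
existing ht://Dig function.  It assumes that tab, newline and carriage 
return should be kept, and that bytes above 0x7F should pass through 
untouched so that 8-bit encodings are not damaged:

```cpp
#include <string>

// Hypothetical helper (not part of ht://Dig): removes ASCII control
// characters such as ^Z (0x1A) from a string.
// Assumptions: tab, LF and CR are kept; bytes >= 0x80 are kept as-is.
std::string stripControlChars(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        bool keptWhitespace = (c == '\t' || c == '\n' || c == '\r');
        bool isControl = (c < 0x20 || c == 0x7F);
        if (keptWhitespace || !isControl)
            out.push_back(in[i]);
    }
    return out;
}
```

Where such a filter would belong (at parse time in htdig, or at output 
time in htsearch) is left open here.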

Cheers,
Lachlan

On Sun, 18 Jan 2004 15:50, Gilles Detillieux wrote:
> Things could break if
> htdig/htsearch started second-guessing the encoding of URLs in
> pages it indexes and doubly encoded them.
>
> This string contains not only stuff from
> the original web page, which htdig has already SGML-decoded, but it
> also contains some HTML tags that htsearch inserts
>
> By the way, punctuation is not stripped from EXCERPT -- only the
> original HTML tags from the source page are.

-- 
[EMAIL PROTECTED]
ht://Dig developer DownUnder  (http://www.htdig.org)

