In article <bdf1473550.bo...@boase.demon.co.uk>,
   Bernard Boase <b.bo...@bcs.org> wrote:
> Just looked at the site www.world-science.net

> Netsurf renders much of its text with inter-syllable sequences ­
> which, in the original HTML, are all hex C2 AD.

     Those two bytes are the UTF-8 encoding of the "soft hyphen". It is
intended to give a hint to a browser as to where a word could be split
across a line boundary, as in print hyphenation. If there is no need to
break across a line boundary at that point then the hyphen should be
silently ignored, which is what Firefox does. Netsurf, it seems, isn't
handling this character.
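
     A rough sketch in Python of that "ignore it unless you need it"
behaviour. The helper names strip_soft_hyphens and break_points are made up
for illustration; this is not how Netsurf or Firefox actually implement it.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, stored in UTF-8 as the bytes C2 AD


def strip_soft_hyphens(text: str) -> str:
    """Drop every soft hyphen, as when no line break is needed."""
    return text.replace(SOFT_HYPHEN, "")


def break_points(word: str) -> list[str]:
    """Split a word into the pieces between its soft hyphens,
    i.e. the places where a renderer would be allowed to break it."""
    return word.split(SOFT_HYPHEN)


word = "hy\u00adphen\u00ada\u00adtion"
print(strip_soft_hyphens(word))  # hyphenation
print(break_points(word))        # ['hy', 'phen', 'a', 'tion']
```

A real renderer would only show a visible hyphen at the one break point it
ends up using, and nothing at the others.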

> Is this legitimate HTML perhaps for automatic hyphenation or
> something? Should Netsurf edit it out? Firefox does.

> Whilst HTML entity &#xC2AD; seems to be valid,
> http://www.fileformat.info/info/unicode/char/c2ad/index.htm
> tell us that U+C2AD is not a valid unicode character.

     I'm sorry to say that all of the different 'encodings' on that web
document are generated on the fly as the document is being served -
auto-magically, but blindly. If the code is not a valid Unicode code point
then all bets are off! In any case, the bytes C2 AD do not mean the code
point U+C2AD; they are the correct UTF-8 encoding of the Unicode code
point U+00AD - try looking at

http://www.fileformat.info/info/unicode/char/00ad/index.htm
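
     If you want to check this yourself, here is a small Python sketch
(the variable names are only illustrative) that decodes the two bytes from
the page source as UTF-8 and asks the standard unicodedata module for the
character's name:

```python
import unicodedata

raw = bytes([0xC2, 0xAD])    # the two bytes found in the HTML source
text = raw.decode("utf-8")   # interpret them as UTF-8

code_point = ord(text)       # the pair decodes to a single character
name = unicodedata.name(text)

print(len(text), f"U+{code_point:04X}", name)  # 1 U+00AD SOFT HYPHEN
```

So the byte pair C2 AD is one character, U+00AD, and not a character
numbered C2AD.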

                         Keith

-- 
Inspired!
