In article <>,
   Bernard Boase <> wrote:
> Just looked at the site

> Netsurf renders much of its text with inter-syllable sequences ­
> which, in the original HTML, are all hex C2 AD.

     This is the UTF-8 encoding of the "soft hyphen", U+00AD. It is
intended to give a browser a hint as to where a word could be split across
a line boundary, as in printed hyphenation. Netsurf isn't handling this
encoding, it seems. If there is no need to break across a line boundary
then the hyphen should be silently ignored, as Firefox does.
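
     To make the byte arithmetic concrete, here is a minimal sketch in
Python. The function names are my own, purely for illustration, and have
nothing to do with how Netsurf or Firefox actually do it. It shows that
the two bytes C2 AD decode to U+00AD, and that dropping the soft hyphen
(the simplest fallback when no hyphenation is attempted) leaves the word
intact.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def decode_bytes(raw: bytes) -> str:
    """Decode raw page bytes as UTF-8."""
    return raw.decode("utf-8")


def encode_two_byte(code_point: int) -> bytes:
    """Hand-rolled UTF-8 for the two-byte range U+0080..U+07FF.

    Layout: 110xxxxx 10xxxxxx, where the x bits are the code point.
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range is handled in this sketch")
    lead = 0xC0 | (code_point >> 6)     # top five bits of the code point
    trail = 0x80 | (code_point & 0x3F)  # low six bits of the code point
    return bytes([lead, trail])


def strip_soft_hyphens(text: str) -> str:
    """Remove every soft hyphen, as if no line break were ever needed."""
    return text.replace(SOFT_HYPHEN, "")


if __name__ == "__main__":
    print(encode_two_byte(0xAD).hex())           # bytes for U+00AD
    print(decode_bytes(b"\xc2\xad") == SOFT_HYPHEN)
    print("\uc2ad".encode("utf-8").hex())        # bytes for U+C2AD, for contrast
    print(strip_soft_hyphens("hy\u00adphen\u00ada\u00adtion"))
```

     The contrast line is there because the code point U+C2AD, were it
meant, would be three bytes in UTF-8 and not the two seen in the page.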

> Is this legitimate HTML perhaps for automatic hyphenation or
> something? Should Netsurf edit it out? Firefox does.

> Whilst HTML entity &#xC2AD; seems to be valid,
> tell us that U+C2AD is not a valid unicode character.

     I'm sorry to say that all of the different 'encodings' on that web
document are generated on the fly as the document is being served,
auto-magically but blindly. If the code is not valid Unicode then that is
it: all bets are off! The two bytes C2 AD are the correct UTF-8 encoding
of the Unicode code point U+00AD, which is not the same thing as a code
point U+C2AD. Try looking at


