In article <bdf1473550.bo...@boase.demon.co.uk>,
   Bernard Boase <b.bo...@bcs.org> wrote:
> Just looked at the site www.world-science.net
> Netsurf renders much of its text with inter-syllable sequences Â
> which, in the original HTML, are all hex C2 AD.

This is UTF-8 for "soft hyphen", which is intended to give a browser a
hint as to where a word could be split across a line boundary, as in
printed hyphenation. It seems Netsurf isn't handling this encoding. If
there is no need to break across a line boundary then the hyphen should
be silently ignored, as Firefox does.

> Is this legitimate HTML perhaps for automatic hyphenation or
> something? Should Netsurf edit it out? Firefox does.

> Whilst HTML entity 슭 seems to be valid,
> http://www.fileformat.info/info/unicode/char/c2ad/index.htm
> tell us that U+C2AD is not a valid unicode character.

I'm sorry to say that all of the different 'encodings' on that web
document are generated on the fly as the document is being served -
auto-magically, but blindly. If the code is not a valid Unicode code
point then that is it - all bets are off!

The bytes C2 AD are the correct UTF-8 encoding of the Unicode code point
U+00AD, not of U+C2AD - try looking at
http://www.fileformat.info/info/unicode/char/00ad/index.htm

Keith

-- 
Inspired!
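
As a postscript, here is a minimal Python sketch of the point above. It is only an illustration: the helper names misdecode_as_latin1 and strip_soft_hyphens are made up for this example, and the assumption that the stray Â comes from the UTF-8 bytes being read as Latin-1 is a guess at what Netsurf is doing, not something confirmed from its source.

```python
# U+00AD is the soft hyphen; its UTF-8 encoding is the two bytes C2 AD.
SOFT_HYPHEN = "\u00ad"
ENCODED = SOFT_HYPHEN.encode("utf-8")  # b'\xc2\xad'

# Reading those two bytes as one 16-bit number gives 0xC2AD, which is
# where the mistaken lookup of "U+C2AD" comes from.
MISREAD_AS_CODE_POINT = int.from_bytes(ENCODED, "big")  # 0xC2AD


def misdecode_as_latin1(text: str) -> str:
    """Hypothetical helper: encode as UTF-8, then wrongly decode as Latin-1.

    Byte C2 becomes a visible 'Â'; byte AD becomes a soft hyphen again.
    """
    return text.encode("utf-8").decode("latin-1")


def strip_soft_hyphens(text: str) -> str:
    """Hypothetical helper: drop soft hyphens, as a renderer might when
    no line break is needed at that position."""
    return text.replace(SOFT_HYPHEN, "")


word = "hy" + SOFT_HYPHEN + "phen"
garbled = misdecode_as_latin1(word)   # 'hyÂ' + soft hyphen + 'phen'
cleaned = strip_soft_hyphens(word)    # 'hyphen'
```

Decoding the bytes as UTF-8 in the first place gives back the single character U+00AD, which can then be ignored or used as a break opportunity.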