On Thu, Mar 20, 2014 at 6:04 AM, Behdad Esfahbod <[email protected]> wrote:

>
> Also, Unicode says GC=Cc should just render as boxed if not supported.


However, it also says that characters with the White_Space property set to
true should be rendered as spaces.  In addition to 0x9, 0xA and 0xD (which
both CSS and HTML treat as white space), these are 0xB (VT), 0xC (FF), and
0x85 (NEL).
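A minimal Python sketch of that Unicode fallback, assuming a renderer with no
glyph for any Cc character: the six Cc code points with White_Space=Yes become
spaces, and every other Cc code point shows up as a visible box.  The set is
hardcoded from PropList.txt because Python's unicodedata does not expose
White_Space; the function name cc_fallback is hypothetical.

```python
import unicodedata

# Cc code points with the White_Space property (Unicode PropList.txt).
# Hardcoded: unicodedata has no White_Space lookup, and str.isspace() also
# returns True for U+001C..U+001F, which are NOT White_Space.
CC_WHITE_SPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x85}


def cc_fallback(cp):
    """Return how an unsupported code point is shown: 'space', 'box' or 'glyph'."""
    if unicodedata.category(chr(cp)) != "Cc":
        return "glyph"  # not a control character: render normally
    if cp in CC_WHITE_SPACE:
        return "space"  # White_Space=Yes: render as a space
    return "box"        # any other control: visible fallback box


if __name__ == "__main__":
    for cp in (0x09, 0x0B, 0x0C, 0x85, 0x01, 0x1C, 0x93, 0x41):
        print(f"U+{cp:04X} -> {cc_fallback(cp)}")
```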

> The reason we want them removed here is really an artifact of the HTML
> spec.


The requirement to ignore all GC=Cc characters seems to be an artifact of
the CSS3 Text WD (http://www.w3.org/TR/css-text-3/#white-space-processing),
which is not yet set in stone.  Note that this differs from CSS2.1
(http://www.w3.org/TR/CSS2/text.html#ctrlchars), which says that they
render as usual.
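To make the difference concrete, here is a rough sketch of the two policies
applied to a string.  Assumptions: "ignore" drops every Cc character except
TAB, LF and CR (left to the white-space processing both specs already
define), while "render as usual" keeps them and, with no glyph in the font,
shows a box, marked here as [U+XXXX].  Function names are hypothetical and
neither is the exact algorithm of either spec.

```python
import unicodedata

# Controls handled by CSS white-space processing; assumed to survive both policies.
CSS_WHITE_SPACE_CC = {"\t", "\n", "\r"}


def is_cc(ch):
    return unicodedata.category(ch) == "Cc"


def ignore_cc(text):
    """Hypothetical CSS3-WD-style policy: silently drop Cc other than TAB/LF/CR."""
    return "".join(ch for ch in text if not is_cc(ch) or ch in CSS_WHITE_SPACE_CC)


def render_as_usual(text):
    """Hypothetical CSS2.1-style policy: keep Cc; glyphless ones show as boxes."""
    out = []
    for ch in text:
        if is_cc(ch) and ch not in CSS_WHITE_SPACE_CC:
            out.append(f"[U+{ord(ch):04X}]")  # stand-in for a .notdef box
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "a\x07b\x85c"
    print(repr(ignore_cc(sample)))        # 'abc'
    print(repr(render_as_usual(sample)))  # 'a[U+0007]b[U+0085]c'
```

With the first policy the reader gets no hint that anything was in the text.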

The CSS3 text behaviour seems like a bad idea to me, because

a) it conflicts with Unicode, and
b) legacy Windows encodings use the C1 code points (in the range
0x80 - 0x9F) for real characters; if a page using e.g. Windows-1252 is
mislabelled as ISO-8859-1 (which can definitely happen), then all the code
points in this range would silently be ignored rather than showing up as
boxes.
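Point b) can be reproduced with Python's built-in codecs.  The same bytes
decoded as Windows-1252 give curly quotes and a euro sign, but decoded as
ISO-8859-1 they give C1 control code points, which an "ignore all Cc" rule
would then drop without a trace.  The byte string is made up for
illustration.

```python
import unicodedata

# Bytes a Windows-1252 page might contain: curly quotes (0x93, 0x94) and a euro sign (0x80).
raw = b"\x93caf\xe9 \x80 5\x94"

correct = raw.decode("cp1252")       # what the author intended
mislabelled = raw.decode("latin-1")  # same bytes read as ISO-8859-1

# Under ISO-8859-1, bytes 0x80-0x9F map one-to-one to the C1 controls U+0080-U+009F.
c1 = [ch for ch in mislabelled if unicodedata.category(ch) == "Cc"]

# What an "ignore all Cc" renderer would display: the controls simply vanish.
ignored = "".join(ch for ch in mislabelled if unicodedata.category(ch) != "Cc")


def non_ascii(s):
    """List the non-ASCII code points in s as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in s if ord(ch) > 0x7F]


print(non_ascii(correct))      # ['U+201C', 'U+00E9', 'U+20AC', 'U+201D']
print(non_ascii(mislabelled))  # ['U+0093', 'U+00E9', 'U+0080', 'U+0094']
print(len(mislabelled), len(ignored))  # 10 7
```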

WDYT?
>

I think the default should be to do what Unicode says.  Also ask the CSS3
text folks why they are proposing this handling of Cc.

James
_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
