On 16/04/2014 07:48, Torsten Bögershausen wrote:
On 15.04.14 21:10, Peter Krefting wrote:
diff --git a/utf8.c b/utf8.c
index a831d50..77c28d4 100644
Is there a script that generates this code from the Unicode database files, or
did you hand-update it?
Some of the code points which have "0 length on the display" are called
"combining", others are called "vowels" or "accents".
E.g. U+05BF is not marked as any of them, but if you look at the glyph, it
should be combining (please correct me if that is wrong).
Indeed it is combining (more specifically it has General Category
"Nonspacing_Mark" = "Mn").
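A quick way to double-check this, assuming Python 3 is at hand: its
standard unicodedata module exposes the General Category directly. The
Unicode version it reports depends on the Python release, so treat this as
a sanity check rather than an authoritative lookup.

```python
import unicodedata

# U+05BF is the code point discussed above.
ch = "\u05bf"

# General Category, e.g. "Mn" for Nonspacing_Mark.
category = unicodedata.category(ch)

# Canonical combining class; non-zero for combining marks like this one.
combining_class = unicodedata.combining(ch)

print("U+%04X %s category=%s ccc=%d"
      % (ord(ch), unicodedata.name(ch), category, combining_class))
```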
If I could find a file that indicates, for each code point, what it is,
I could write a script.
The most complete and machine-readable data are in these files:
The general categories can also be seen more legibly in:
For docs, see:
The existing utf8.c comments describe the attributes being selected from
the tables (general categories "Cf", "Mn", "Me"; East Asian Width "W",
"F"). And they suggest that the combining character table was originally
auto-generated from UnicodeData.txt with a "uniset" tool. Presumably this?
The fullwidth-checking code looks like it was done by hand, although
apparently uniset can process EastAsianWidth.txt.
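
For illustration only, here is a rough sketch of how both tables could be
regenerated by a script. It is not the uniset tool and not how utf8.c was
actually produced: it assumes Python 3 and takes the properties from the
standard unicodedata module instead of parsing the database files, so the
result follows whatever Unicode version that Python ships. The function
names, the table name and the C "struct interval" layout are made up for
the example, and any special cases the real tables apply on top of the
plain category selection are not reproduced.

```python
import unicodedata

# Property values named in the utf8.c comments.
ZERO_WIDTH_CATEGORIES = {"Cf", "Mn", "Me"}
WIDE_CLASSES = {"W", "F"}


def build_intervals(predicate, limit=0x110000):
    """Merge consecutive matching code points into (first, last) pairs."""
    intervals = []
    start = None
    for cp in range(limit):
        if predicate(chr(cp)):
            if start is None:
                start = cp
        elif start is not None:
            intervals.append((start, cp - 1))
            start = None
    if start is not None:
        intervals.append((start, limit - 1))
    return intervals


def is_zero_width(ch):
    return unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES


def is_wide(ch):
    return unicodedata.east_asian_width(ch) in WIDE_CLASSES


def format_c_table(name, intervals):
    """Render the intervals as a C array (hypothetical layout)."""
    lines = ["static const struct interval %s[] = {" % name]
    lines += ["\t{ 0x%04X, 0x%04X }," % pair for pair in intervals]
    lines.append("};")
    return "\n".join(lines)


zero_width = build_intervals(is_zero_width)
wide = build_intervals(is_wide)
```

Calling print(format_c_table("zero_width", zero_width)) would then emit a
table that can be diffed against the hand-maintained one, which is
probably the most useful way to spot entries such as U+05BF.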