> I do not really understand unicode. I will try to figure out which
> unicode characters need special consideration, and then make up the
> specs.
In Unicode the most important dash-or-hyphen-like characters are:
U+002D HYPHEN-MINUS (-): The “usual” ASCII character with an ambiguous
meaning (hyphen? minus?);
U+00AD SOFT HYPHEN: Indicates a line break opportunity; no visible glyph;
U+2010 HYPHEN (‐): Carries the “hyphenation” meaning of
hyphen-minus; preferred over the latter to indicate a visible hyphen;
U+2011 NON-BREAKING HYPHEN (‑): Well ... a hyphen, but non-breaking;
U+2012 FIGURE DASH (‒): Same ambiguous meaning as hyphen-minus, but
has the same width as digits;
U+2013 EN DASH (–): Used to indicate ranges of values (1910–2007);
the equivalent to TeX's “--” ligature;
U+2014 EM DASH (—): Used to mark a break in a sentence—like this—; the
equivalent to TeX's “---” ligature.
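If you want to check these characters yourself, here is a minimal Python sketch (not ConTeXt code, just an illustration). It looks up the official name and general category of each code point listed above with the standard unicodedata module. The names describe and tex_dashes_to_unicode are made up for this example, and the second function is only a rough guess at how the TeX ligatures could be mapped; it ignores verbatim text, math mode and the like.

```python
import unicodedata

# Code points taken from the list above.
DASH_CODE_POINTS = [0x002D, 0x00AD, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014]


def describe(code_point):
    """Return (U+XXXX, Unicode name, general category) for one code point."""
    char = chr(code_point)
    return (
        f"U+{code_point:04X}",
        unicodedata.name(char),
        unicodedata.category(char),
    )


def tex_dashes_to_unicode(text):
    """Hypothetical helper: map TeX's "---" and "--" ligatures to em/en dash."""
    # Replace the longer ligature first so "---" is not read as "--" plus "-".
    return text.replace("---", "\u2014").replace("--", "\u2013")


if __name__ == "__main__":
    for cp in DASH_CODE_POINTS:
        print(*describe(cp))
    print(tex_dashes_to_unicode("1910--2007"))
```

Note that the soft hyphen comes out with category Cf (a format character), while the others are Pd (dash punctuation), which is one reason it needs separate treatment.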
The above is an extract of the “Dashes and Hyphens” paragraph of
section 6.2 of the Unicode Standard
(http://www.unicode.org/versions/Unicode5.0.0/ch06.pdf).
You might also want to look into the Unicode line breaking properties
for a complete description (http://www.unicode.org/reports/tr14/). I can
summarize that for you if you want.
Arthur