> I do not really understand unicode. I will try to figure out which 
> unicode characters need special consideration, and then make up the 
> specs.

  In Unicode the most important dash-or-hyphen-like characters are:

    U+002D HYPHEN-MINUS (-): The “usual” ASCII character with an ambiguous
  meaning (hyphen? minus?);

    U+00AD SOFT HYPHEN (­): Indicates a line break opportunity; invisible
  unless the line is actually broken at that point;

    U+2010 HYPHEN (‐): Carries the “hyphenation” meaning of
  hyphen-minus; preferred over the latter to indicate a visible hyphen;

    U+2011 NON-BREAKING HYPHEN (‑): Well ... a hyphen, but non-breaking;

    U+2012 FIGURE DASH (‒): Same ambiguous meaning as hyphen-minus, but
  has the same width as digits;

    U+2013 EN DASH (–): Used to indicate ranges of values (1910–2007);
  the equivalent to TeX's “--” ligature;

    U+2014 EM DASH (—): Used to mark a break—like this—in the flow of a
  sentence; the equivalent of TeX's “---” ligature.

  The above is an extract of the “Dashes and Hyphens” paragraph of
section 6.2 of the Unicode Standard
(http://www.unicode.org/versions/Unicode5.0.0/ch06.pdf).
You might also want to look into the Unicode line breaking properties
for a complete description (http://www.unicode.org/reports/tr14/). I can
summarize that for you if you want.
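
  If you want to check these characters yourself, here is a minimal
sketch in Python (my choice for illustration, not anything ConTeXt
itself uses). It relies only on the standard unicodedata module; the
names DASHES and describe are made up for this example. It prints the
code point, official name and general category of each character in
the list above. Note that unicodedata does not expose the UAX #14 line
breaking class, so that part still has to be looked up in the report.

```python
import unicodedata

# Code points of the dash-like and hyphen-like characters listed above.
DASHES = [0x002D, 0x00AD, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014]

def describe(cp):
    """Return (U+XXXX label, Unicode name, general category) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))

for cp in DASHES:
    label, name, category = describe(cp)
    print(label, name, category)
```

  The soft hyphen is reported with category Cf (a format character),
while the others are Pd (dash punctuation), which is one reason it
needs separate treatment.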

        Arthur