[bug #58930] take baby steps toward Unicode

Dave Wed, 19 Aug 2020 22:23:40 -0700

Follow-up Comment #8, bug #58930 (project groff):

[comment #2:]
> Unicode considers U+2009 THIN SPACE and U+200A HAIR SPACE breakable...
> Groff... does not offer breaking versions of these spaces, and the only
> reason to add them would be strict compliance with a Unicode property
> that probably no one who uses those code points actually wants


I believe my reasoning here was inaccurate.  Although Unicode _allows_
breaking at a thin space or hair space, it does not _require_ it,* so groff
declining to treat these as break points does not violate Unicode compliance
at all.  Thus I now propose that U+2009 THIN SPACE be mapped to groff's
(nonbreaking) \|, and U+200A HAIR SPACE to groff's (nonbreaking) \^.
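
As a rough illustration of the proposed mapping, here is a minimal Python
sketch.  It is not groff or preconv code; the table name GROFF_SPACE_ESCAPES
and the function to_groff_spaces are hypothetical names chosen for this
example, and only the two code points discussed above are handled.

```python
# Hypothetical mapping table for illustration only; not groff's internals.
GROFF_SPACE_ESCAPES = {
    "\u2009": r"\|",  # U+2009 THIN SPACE -> groff's (nonbreaking) \|
    "\u200a": r"\^",  # U+200A HAIR SPACE -> groff's (nonbreaking) \^
}


def to_groff_spaces(text: str) -> str:
    """Replace thin and hair spaces with the proposed groff escapes.

    Every other character is passed through unchanged.
    """
    return "".join(GROFF_SPACE_ESCAPES.get(ch, ch) for ch in text)
```

For instance, to_groff_spaces("10\u2009km") would yield the input text with
the thin space replaced by the two-character escape \|.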

* The gory details: Unicode line breaking is covered in "Unicode Standard
Annex #14: Unicode Line Breaking Algorithm"
(http://www.unicode.org/reports/tr14/tr14-45.html), whose introductory section
makes its scope clear: "Given an input text, [this algorithm] produces a set
of positions called 'break opportunities' that are appropriate points to begin
a new line. The selection of actual line break positions from the set of break
opportunities is not covered by the Unicode Line Breaking Algorithm, but is in
the domain of higher level software."  Groff declining to break at points that
Unicode specifies as "break opportunities" is perfectly in line with this.
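
The distinction between break opportunities and actual breaks can be sketched
in Python as follows.  This is a deliberately simplified toy, not an
implementation of UAX #14: the set BREAKS_ALLOWED_AFTER and both function
names are assumptions made for this example, and the real algorithm involves
many more character classes and pair rules.

```python
# Simplified assumption: a break is permitted after these characters.
# Real UAX #14 uses many line-breaking classes and rules.
BREAKS_ALLOWED_AFTER = {" ", "\u2009", "\u200a"}


def break_opportunities(text):
    """Offsets at which a new line may begin (toy version)."""
    return [i + 1 for i, ch in enumerate(text[:-1])
            if ch in BREAKS_ALLOWED_AFTER]


def formatter_breaks(text, declined=("\u2009", "\u200a")):
    """Higher-level software keeps only the opportunities it chooses.

    Here the formatter declines to break after thin and hair spaces,
    which leaves a subset of the permitted positions.
    """
    return [i for i in break_opportunities(text)
            if text[i - 1] not in declined]
```

The formatter's result is always a subset of the opportunities, which is the
sense in which declining to break stays within what the annex permits.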

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58930>
