Follow-up Comment #2, bug #58930 (project groff):

[comment #1 comment #1:]
> 1. "U+00A0 NO-BREAK SPACE
>
> None of these are equivalent to the others. :-/

"\~" and "\ " _shouldn't_ be equivalent; they're documented as behaving
differently. The input string "\[u00A0]" being equivalent to neither of
these is exactly the problem this plank of this bug report is looking to
solve.

It's only the character NO-BREAK SPACE in its Latin-1 form, which groff
accepts as direct input, that groff recognizes and interprets as a
nonbreaking space. groff_char(7) (which I only now thought to check)
says it maps to \~. But that appears to be less than 100% accurate:

$ LC_CTYPE=en_US.iso88591 printf ".if '\u00A0'\~' .tm equal\n" | groff
$

But the upshot is, however groff interprets a Latin-1 A0, it really
ought to interpret the form of that character emitted by preconv,
\[u00A0], identically.

> 2. The behavior of \: when used as the RHS of a .char request
> does indeed seem a bit strange.

Yeah, I really need to open a separate bug report for this, because it's
unrelated to everything else here.

> 3. Narrow no-break space. Have you named all of the non-breaking
> spaces in Unicode in this ticket?

No. I was intentionally trying to keep it simple and minimal. But it
turns out there are only three:

http://en.wikipedia.org/wiki/Whitespace_character#Unicode

So the only one I didn't cover was U+2007 FIGURE SPACE, which should map
to groff's (already nonbreaking) \0.

> there are bunch of others (hair space, thin space, ideographic space,
> ...) but I don't know what their breaking semantics are in Unicode.

Irrational, IMO. Unicode considers U+2009 THIN SPACE and U+200A HAIR
SPACE breakable, for no good reason that I can see. Groff (quite
sensibly, since the concept is sort of absurd) does not offer breaking
versions of these spaces, and the only reason to add them would be
strict compliance with a Unicode property that probably no one who uses
those code points actually wants: I can't think of a single real-world
use case for a breaking thin space (though perhaps this is merely a
failure of my imagination).
The breaking semantics of the thin, hair, and similar spaces are another
can of worms that I deliberately left out of what I intended to be a
simple change.

> 4. A non-breaking hyphen would then be something that looks
> like \[hy] but doesn't actually break?

Yes.

> You can just use the character as-is in input.

Ah, I guess you used -Tutf8 output, where that does work. (Somehow your
groff command got stripped from your comment.) All other output formats
(notably -Tps and -Tpdf) produce "warning: can't find special character
'u2011'".

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58930>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
