Re: Not predefined Extended Latin character needed, interesting solution found

Ingo Schwarze Mon, 17 May 2021 06:36:00 -0700

Hi Oliver,

Oliver Corff wrote on Sat, May 15, 2021 at 11:39:31PM +0200:


> I am trying to use the correct abbreviation for the former Czechoslovak
> Socialist Republic, which is U+010C SSR (C + hacek, caron, wedge).
> The first attempt (entering Unicode 0x010C directly and leaving
> everything to preconv(1)) did not work.

Works for me:

   $ printf '\xc4\x8cSSR' | mandoc
   $ printf '\xc4\x8cSSR' | groff -kT utf8

Both commands above produce the expected output for me (OpenBSD-current
with no fancy configuration changes, just using the default installation).

  00000000  c4 8c 53 53 52 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |..SSR...........|
  00000010  0a 0a 0a 0a 0a 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |................|
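
As a side note, the leading c4 8c in that dump is the two-byte UTF-8
encoding of U+010C. A minimal sketch of the arithmetic, assuming a POSIX
shell; the function name utf8_two_bytes is made up for illustration and
only handles code points from U+0080 through U+07FF:

```shell
# Hypothetical helper: print the UTF-8 bytes (in hex) of a code point
# in the two-byte range U+0080..U+07FF.
utf8_two_bytes() {
    cp=$(($1))
    if [ "$cp" -lt 128 ] || [ "$cp" -gt 2047 ]; then
        echo "code point outside U+0080..U+07FF" >&2
        return 1
    fi
    # Lead byte: 110xxxxx carries the top five bits.
    b1=$(( 0xC0 | (cp >> 6) ))
    # Continuation byte: 10xxxxxx carries the low six bits.
    b2=$(( 0x80 | (cp & 0x3F) ))
    printf '%02x %02x\n' "$b1" "$b2"
}

utf8_two_bytes 0x010C    # prints: c4 8c
```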


> Then I consulted groff_char(7) but there is no
> predefined \[vC], only \[vS] etc. for base letters s, S, z and Z. No C!
> I keep scratching my head.

Works for me:

   $ printf '\\[u010C]SSR' | mandoc
   $ printf '\\[u010C]SSR' | groff -T utf8

Both commands above produce the expected output; specifically:

   $ printf '\\[u010C]SSR' | groff -T utf8 | hexdump -C
  00000000  c4 8c 53 53 52 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |..SSR...........|


> None of the other suggested notations (like \[u0043_030C]; see
> groff(7)) work out of the box.

Mandoc doesn't support that syntax, but with groff, even that works for me:

   $ printf '\\[u0043_030C]SSR' | mandoc -T lint
  mandoc: <stdin>:1:1: WARNING: invalid escape sequence: \[u0043_030C]
   $ printf '\\[u0043_030C]SSR' | groff -T utf8 | hexdump -C
  00000000  c4 8c 53 53 52 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |..SSR...........|
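
If the comparison needs to be scripted rather than read off a hexdump,
something along these lines might do. This is only a sketch, assuming a
POSIX shell and od(1); the function name starts_with_ccaron is
hypothetical:

```shell
# Hypothetical helper: succeed if standard input begins with c4 8c,
# the UTF-8 encoding of U+010C.
starts_with_ccaron() {
    # Dump only the first two bytes as hex and strip all whitespace.
    first=$(od -An -tx1 -N2 | tr -d '[:space:]')
    [ "$first" = "c48c" ]
}

# Self-check with the raw bytes written as portable octal escapes.
printf '\304\214SSR\n' | starts_with_ccaron && echo "starts with U+010C"
```

Any of the groff pipelines above can be fed into it in place of
hexdump -C, for example:
printf '\\[u010C]SSR' | groff -T utf8 | starts_with_ccaron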


> .AM

I don't think any fancy workarounds are needed.

Yours,
  Ingo
