Got it, thanks!

On Mon, Aug 10, 2020 at 8:24 AM Deri <[email protected]> wrote:

> On Sunday, 9 August 2020 05:58:15 BST T. Kurt Bond wrote:
> > Anyway, in the output file (attached to this e-mail) the unicode
> > characters show up fine in the body text fine, but in the PDF Outline
> > the characters show up as [uXXXX] text instead of the actual character.
> > Does anybody know why this is? I know that if I do something similar
> > for Heirloom troff the PDF Outline *does* contain the Unicode
> > characters.
>
> In the PDF Reference text strings are defined as:-
>
> =========================================================================
>
> 3.8.1 Text Strings
>
> Certain strings contain information that is intended to be
> human-readable, such as text annotations, bookmark names, article names,
> document information, and so forth. Such strings are referred to as text
> strings. Text strings are encoded in either PDFDocEncoding or Unicode
> character encoding. PDFDocEncoding is a superset of the ISO Latin 1
> encoding and is documented in Appendix D. Unicode is described in the
> Unicode Standard by the Unicode Consortium (see the Bibliography).
>
> For text strings encoded in Unicode, the first two bytes must be 254
> followed by 255, representing the Unicode byte order marker, U+FEFF.
> (This sequence conflicts with the PDFDocEncoding character sequence
> thorn ydieresis, which is unlikely to be a meaningful beginning of a
> word or phrase.) The remainder of the string consists of Unicode
> character codes, according to the UTF-16 encoding specified in the
> Unicode standard, version 2.0. Commonly used Unicode values are
> represented as 2 bytes per character, with the high-order byte appearing
> first in the string.
>
> =========================================================================
>
> Since groff works internally with ascii, the \[uXXXX] form of input is
> converted to a separate node which is a named glyph in the appropriate
> font. In the groff_out format this can be seen as "Cu2640", for example,
> which tells the output driver to look for the named glyph in a particular
> font.
>
> This is only true for text which is destined for the output stream,
> parameters to .pdfhref are just treated as ascii, i.e PDFDocEncoding.
>
> Cheers
>
> Deri

--
T. Kurt Bond, [email protected], https://tkurtbond.github.io
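
P.S. For anyone reading this in the archive: the text-string rule quoted above from the PDF Reference (bytes 254 and 255, then big-endian UTF-16) can be sketched in a few lines of Python. This is only an illustration of the encoding, not what groff or gropdf actually does. The function names are made up for this example, and the regular expression assumes only the simple \[uXXXX] form, not composite forms such as \[u0041_0301].

```python
import re

# Hypothetical helper: matches the simple groff \[uXXXX] escape
# (4 to 6 hex digits). Composite escapes are deliberately not handled.
_GROFF_UNICODE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def groff_escapes_to_text(s):
    """Replace each \\[uXXXX] escape with the character it names."""
    return _GROFF_UNICODE.sub(lambda m: chr(int(m.group(1), 16)), s)


def pdf_text_string(s):
    """Encode s as a Unicode PDF text string.

    Per the quoted section 3.8.1: the bytes 254, 255 (the byte order
    marker U+FEFF) followed by UTF-16 with the high-order byte first.
    """
    return b"\xfe\xff" + s.encode("utf-16-be")


def pdf_hex_literal(data):
    """Write bytes as a PDF hexadecimal string, e.g. <FEFF2640>."""
    return "<" + data.hex().upper() + ">"


if __name__ == "__main__":
    title = groff_escapes_to_text(r"Venus \[u2640]")
    print(title)
    print(pdf_hex_literal(pdf_text_string(title)))
```

An outline entry written this way would carry U+2640 as the two bytes 26 40 after the FE FF marker, rather than as the literal ASCII text [u2640].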
