[following up on this again after a year]

At 2024-08-21T18:15:58-0500, G. Branden Robinson wrote:
> At 2024-08-21T17:22:02-0400, Peter Schaffter wrote:
> > Processing two files from mom/examples produces error messages.
> > The groff version is 1.23.0.1757-8fe01 (latest sources from the repo).
> > 
> > mom-pdf.mom spits out
> >   troff:mom-pdf.mom:20: error: cannot write a node to
> >   device-independent output
> > 
> > mon-premier-doc spits out
> >   troff:mon_premier_doc.mom:13: error: cannot translate character code 233
> >   to special character ''e' in device-independent output
> > 
> > In both cases, the .AUTHOR macro generates the error.  In mom-pdf,
> > the second arg to .AUTHOR contains an escape sequence; in
> > mon-premier-doc, the author's name contains é (e-acute).
> > 
> > I haven't seen these messages for a while, so I was a bit surprised.
> 
> Hi Peter,
> 
> I brought these back on purpose, albeit with altered wording.
> 
> commit 6fd27d5d0b69985ae5f8a68b8fa058d82ad9d233
> Author: G. Branden Robinson <[email protected]>
> Date:   Tue Aug 13 05:45:57 2024 -0500
> 
>     [troff]: Drop GROFF_ENABLE_TRANSPARENCY_WARNINGS.
> 
>     * src/roff/troff/div.cpp (top_level_diversion::transparent_output):
>     * src/roff/troff/input.cpp (transparent_translate): Drop
>       `GROFF_ENABLE_TRANSPARENCY_WARNINGS` environment variable kludge.  The
>       underlying problems are better understood now and giving the user
>       tools to fix them is on the horizon.
> 
>     See <https://savannah.gnu.org/bugs/?63074>.
> 
> The reason is that they are flagging a real problem--one that has
> taken me most of my 7 years of contributing to groff to come to
> understand.
> 
> I started writing a lengthy explanation here, and got really far along
> before realizing that my understanding still is not yet total.
[...]
> Will advise when I know more.

I'm closer to that understanding now, thanks to some prompting from
Deri in Savannah #66653.[1]

Some changes under development in my working copy do the following.

1.  Make it possible in many cases for the `asciify` request to recover
    Unicode special character escape sequences in `\[uXXXX]` syntax from
    diversions.

2.  Explicitly discard more node types, rather than attempt to restore
    them, during asciification.

(Incidentally, I'd like to rename the `asciify` request.  But we can
grapple with that after the 1.24 release.[2])

Here's an update of the `asciify` request description.  (This material
also appears in groff(7) and our Texinfo manual.)

groff_diff(7):
     .asciify div
             Undivert the diversion div in such a way that Unicode
             characters, characters translated with the trin request,
             spaces, and some escape sequences that were formatted and
             diverted into div approximately recover their input forms.
             (The current escape character is used, characters outside
             the Unicode Basic Latin and Latin‐1 Supplement range are
             represented as “\[uXXXX]” escape sequences, and the code
             points themselves may change due to internal normalization.
             If the escape character is disabled via the eo request when
             undiversion is attempted, GNU troff reports an error and
             discards parts of the diversion that require an escape
             sequence to represent.)

             asciify cannot return all nodes in a diversion to their
             source equivalents: those produced by indexed characters
             (\N), for example, remain nodes, so the result cannot be
             guaranteed to be a character sequence as a macro or string
             is.  Give the diversion name as an argument to the pm
             request to inspect its contents and node list.  Glyph
             parameters such as the type face and size are not
             preserved; use “unformat” to achieve that.
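
As a rough illustration of the representation rule in the parenthetical
above, here is a toy model in Python.  It is a sketch, not GNU troff's
code: the function name `to_groff_escapes` and its `escape_char`
parameter are made up for the example, and it models neither the
internal normalization of code points nor the handling of nodes.

```python
def to_groff_escapes(text: str, escape_char: str = "\\") -> str:
    # Toy model only (hypothetical helper, not part of groff).
    # Characters in Basic Latin (U+0000-U+007F) and the Latin-1
    # Supplement (U+0080-U+00FF) pass through unchanged; anything else
    # becomes an escape sequence of the form <escape_char>[uXXXX].
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFF:
            out.append(ch)
        else:
            # At least four uppercase hex digits; more if needed.
            out.append(f"{escape_char}[u{cp:04X}]")
    return "".join(out)


print(to_groff_escapes("cœur café"))  # prints c\[u0153]ur café
```

Passing a different `escape_char` mimics the point that the current
escape character, not necessarily the backslash, is used.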

Also, I see ways to make diversion a more reliably reversible process.
I don't think it can ever be perfect, due to language issues: we don't
store or know what the escape character was at the time input was
diverted, and any scaling units used in the input are irreversibly
converted to basic units `u`.  But it no longer seems impossible to me
that we could divert and undivert willy-nilly up to[3] `grout`
transformation.

Regards,
Branden

[1] https://savannah.gnu.org/bugs/?66653
[2] https://savannah.gnu.org/bugs/?67472
[3] in the mathematical sense, like "up to isomorphism"
