On Tue, Mar 10, 2026 at 08:57:59PM +0000, Werner LEMBERG wrote:
> 
> >> aaaaaaaaaa@c
> >>             bbbbbbbbbbb
> >> 
> >> This should be output in Info as "aaaaaaaaaabbbbbbbbbbb" to match the
> >> TeX output, 
> > 
> > This looks both wrong and unexpected to me.
> 
> TeX's algorithm is as follows.
> 
> 1. Remove leading spaces in a line.
> 2. Remove trailing spaces in a line.
> 3. Condense all sequences of spaces in a line to a single space.
> 4. Convert a newline character (if not followed by another one) to
>    space.  Exception: If the line ends with a comment character, omit
>    the newline character.
> 
> I think this is a clean solution, and it is the behaviour of
> `texinfo.tex`.  What exactly do you consider as 'wrong and
> unexpected'?
> 
> 
>     Werner

As far as I understand, TeX's behaviour is close to what you describe,
but it is not the full story.  For example, it's possible for TeX's
conception of what a space character is to change in the middle of
a line.  (This would not be relevant for Texinfo in many places.  It
would be relevant anywhere multiple spaces are not collapsed to one:
possibly inside @example or @verb.)

@c is not a "comment character" as TeX understands it (% in plain TeX),
but a control sequence.  In Texinfo, DEL (hex 7f) is a comment character,
but this is not used by anybody as far as I know.
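
Werner's four steps can be sketched in Python as below. This is a hypothetical, literal rendering of those steps only, not how TeX or `texinfo.tex` is implemented. The function name `join_lines` and the `comment_char` parameter are illustrative assumptions. `comment_char` defaults to plain TeX's `%`, and, following step 4 as written, it is honoured only when it ends a line.

```python
import re


def join_lines(text: str, comment_char: str = "%") -> str:
    """Apply the four listed steps literally (hypothetical helper)."""
    result = ""
    for line in text.splitlines():
        # Steps 1-3: drop leading/trailing spaces, collapse inner runs of spaces.
        line = re.sub(r" +", " ", line.strip(" "))
        if not line:
            # An empty line means the previous newline was followed by another
            # one, so emit a paragraph break instead of a space.
            result = result.rstrip(" ")
            if result and not result.endswith("\n\n"):
                result += "\n\n"
            continue
        if line.endswith(comment_char):
            # Step 4 exception: drop the comment character and the newline.
            result += line[: -len(comment_char)]
        else:
            # Step 4: the newline becomes a single space.
            result += line + " "
    # Drop the space (or paragraph break) left after the final line.
    return result.rstrip(" \n")
```

With the default `%`, `join_lines("aaaaaaaaaa%\n            bbbbbbbbbbb")` gives `aaaaaaaaaabbbbbbbbbbb`. The same input written with `@c` gives `aaaaaaaaaa@c bbbbbbbbbbb`, because `@c` is not a comment character. Passing `comment_char="\x7f"` models Texinfo's DEL comment character instead.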

The first stage of TeX's input processing is concisely explained in
section 2.5 of "TeX by Topic" by Victor Eijkhout:

    2.5 The input processor as a finite state automaton
    
    TeX’s input processor can be considered to be a finite state automaton
    with three internal states, that is, at any moment in time it is in one
    of three states, and after transition to another state there is no memory
    of the previous states.
    
    2.5.1 State N: new line
    State N is entered at the beginning of each new input line, and that is
    the only time TeX is in this state. In state N all space tokens (that
    is, characters of category 10) are ignored; an end-of-line character is
    converted into a \par token. All other tokens bring TEX into state M.
    
    2.5.2 State S: skipping spaces
    State S is entered in any mode after a control word or control space (but
    after no other control symbol), or, when in state M, after a space. In
    this state all subsequent spaces or end-of-line characters in this input
    line are discarded.
    
    2.5.3 State M: middle of line
    By far the most common state is M, ‘middle of line’. It is entered
    after characters of categories 1–4, 6–8, and 11–13, and after
    control symbols other than control space. An end-of-line character
    encountered in this state results in a space token.

https://www.eijkhout.net/tex/tex-by-topic.html
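
The three states described above can be modelled with a small tokenizer. This is a simplified, hypothetical sketch. The catcode table is hard-wired: backslash is the escape character, space is category 10, `%` is the comment character, letters are category 11, and everything else is treated like category 12. Categories 9 and 15, grouping, and catcode changes in the middle of a line are not modelled. As TeX does, it strips trailing spaces from each line and appends an end-of-line character before scanning. The names `tokenize` and `render` are illustrative.

```python
# Simplified, fixed catcodes (an assumption; real TeX can change them at any time).
ESCAPE, EOL, SPACE, LETTER, OTHER, COMMENT = 0, 5, 10, 11, 12, 14


def catcode(c: str) -> int:
    if c == "\\":
        return ESCAPE
    if c == "\r":
        return EOL
    if c == " ":
        return SPACE
    if c == "%":
        return COMMENT
    if c.isalpha():
        return LETTER
    return OTHER


def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (kind, text) tokens; kind is 'char', 'space', 'par' or 'cs'."""
    tokens: list[tuple[str, str]] = []
    for raw in text.splitlines():
        # Strip trailing spaces and append the end-of-line character.
        line = raw.rstrip(" ") + "\r"
        state = "N"  # every input line starts in state N
        i = 0
        while i < len(line):
            c = line[i]
            cat = catcode(c)
            i += 1
            if cat == ESCAPE:
                j = i
                while j < len(line) and catcode(line[j]) == LETTER:
                    j += 1
                if j > i:
                    # Control word: following spaces are skipped (state S).
                    tokens.append(("cs", line[i:j]))
                    i, state = j, "S"
                else:
                    # Control symbol: only control space leads to state S.
                    sym = line[i]
                    i += 1
                    tokens.append(("cs", sym))
                    state = "S" if sym == " " else "M"
            elif cat == COMMENT:
                break  # discard the rest of the line, end-of-line included
            elif cat == EOL:
                if state == "N":
                    tokens.append(("par", ""))  # empty line gives \par
                elif state == "M":
                    tokens.append(("space", " "))
                break  # in state S the end-of-line is simply dropped
            elif cat == SPACE:
                if state == "M":
                    tokens.append(("space", " "))
                    state = "S"
                # spaces are ignored in states N and S
            else:
                tokens.append(("char", c))
                state = "M"
    return tokens


def render(tokens: list[tuple[str, str]]) -> str:
    """Show tokens compactly: \\par as [par], control sequences as [\\name]."""
    out = []
    for kind, text in tokens:
        if kind == "par":
            out.append("[par]")
        elif kind == "cs":
            out.append("[\\" + text + "]")
        else:
            out.append(text)
    return "".join(out)
```

For example, `render(tokenize("aaa%\n   bbb"))` yields `aaabbb ` (the comment swallows the first end-of-line and state N discards the leading spaces), while `render(tokenize("a  b\nc"))` yields `a b c `.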
