Re: Standalone Info reader cannot read Info files with CR-LF EOLs

Gavin Smith Fri, 26 Dec 2014 08:48:17 -0800

On Thu, Dec 25, 2014 at 3:51 PM, Eli Zaretskii <[email protected]> wrote:
> Today I discovered that the Info reader built from the current trunk
> cannot display any Info file that was produced natively on Windows (as
> opposed to Info files that come from distribution tarballs, which were
> produced on Unix).  The reader says it cannot find the Top node in any
> such Info file.
>
> It turned out this is because the code which stripped CR characters
> from CR-LF pairs, once the file was read, was #ifdef'ed away (in
> revision 5888), evidently due to a failure of a test that checks node
> accessibility through tag tables without the 1000-character slack.
>
> (I didn't find in bug-texinfo any discussion of the original problem
> or the change that was made to solve it.  Neither do I see anything
> pertinent in the bug database.  Did I miss something?  What or who
> triggered that change?)


I discovered this problem with the "gnucobpg.info" file that is part
of GNU Cobol (downloadable at
http://opencobol.add1tocobol.com/guides/), which has many CR-LF line
endings (but not consistently). I don't know exactly how this file was
generated - the file preamble says

This is gnucobpg.info, produced by makeinfo version 4.8 from
gnucobpg.texi.

- anyway, the problem I had was that I couldn't access later nodes
in the file. I tested just now with info 4.13 and wasn't able to
access the "Alphabet-Name-Clause" node or anything later in the
file. That's the only Info file I remember encountering that contains
many CR bytes.

Since this file claims to be produced by version 4.8 (not 5.x),
whether the CR characters are counted in the tag table must depend on
other, unknown factors.
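
To illustrate why it matters whether the CR bytes are counted, here is
a minimal sketch, assuming a reader that strips the CR from every CR-LF
pair before using the tag table. The function name and the sample data
are hypothetical and are not taken from the Info sources.

```python
def offset_after_cr_strip(raw: bytes, raw_offset: int) -> int:
    """Map a byte offset in the raw file to the matching offset in a
    copy of the file with every CR-LF pair reduced to LF.

    Each CR-LF pair that lies wholly before `raw_offset` removes one
    byte, so the stripped offset is smaller by that count.
    """
    return raw_offset - raw[:raw_offset].count(b"\r\n")


# Hypothetical sample: two CR-LF terminated lines, then a node separator.
raw = b"a\r\nb\r\n\x1fnode"
stripped = raw.replace(b"\r\n", b"\n")

raw_offset = raw.index(b"\x1f")
stripped_offset = offset_after_cr_strip(raw, raw_offset)
```

The two offsets differ by one byte per preceding CR-LF pair, so in a
long file a tag table written against one form can point far from the
node in the other form.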

(It could be helpful to make the GNU Cobol developers aware of this. I
haven't been able to quickly find an email address for them - if
anyone knows one, could they pass this on?)

>  . fix texi2any to produce tag tables that assume the CR characters
>    are stripped from the Info file (my reading of the code is that it
>    should not count CR characters before LF for the purposes of
>    count_context value; or maybe it should simply open the Info output
>    in 'unix' mode)

Having the tag table contain the exact byte offsets is a lot simpler
than having to remove all of the CR characters (or just CR characters
before LF), and therefore less prone to incorrect implementation by
any other Info-reading or -writing programs that might be written. It
enables accessing the correct place in the file without processing the
entire file first. This could enable faster access of nodes by
memory-mapping a file. Most of the time speed isn't an issue, but it's
an idea I've had for speeding up searching the indices of all
installed Info files at once. It could also be used to access a node
of an Info file over a slow or expensive network connection without
having to download the entire file.
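
As a rough illustration of the direct-access idea, here is a minimal
sketch, assuming the tag table offset points at the start of a node
and that a node runs up to the next ^_ (0x1f) separator byte. The
function name, chunk size, and sample data are hypothetical; this is
not how the standalone reader is implemented.

```python
import io

SEP = b"\x1f"  # node separator byte (^_)


def read_node_at(f, offset, chunk=4096):
    """Seek to `offset` and return bytes up to the next separator.

    Only the region from `offset` onward is read; nothing before it
    is touched. The search starts at index 1 so that a separator
    sitting exactly at `offset` is not taken as the end of the node.
    """
    f.seek(offset)
    buf = bytearray()
    while True:
        block = f.read(chunk)
        if not block:
            return bytes(buf)  # reached end of file
        buf += block
        end = buf.find(SEP, 1)
        if end != -1:
            return bytes(buf[:end])


# Hypothetical two-node file with CR-LF line endings left in place.
data = (b"preamble\r\n"
        b"\x1f\r\nFile: x.info,  Node: Top\r\n\r\nTop text\r\n"
        b"\x1f\r\nFile: x.info,  Node: Two\r\n\r\nSecond\r\n")
second = read_node_at(io.BytesIO(data), data.rindex(SEP))
```

Because the offsets are used as-is, the same function works on a file
with LF, CR-LF, or mixed line endings, as long as the tag table was
computed against the bytes actually stored in the file.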

I hope the standalone Info reader can be changed so that it can
access files with CR-LF line endings without having to interpret the
tag table this way. At the same time, if it's
easy to avoid outputting files with CR-LF line endings under Windows,
then I think we should do so.
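
One possible shape for such a reader change, given purely as a sketch
and not as a description of any existing code: keep the file contents
unmodified in memory so that tag table offsets stay valid, and drop
the CR bytes only from the node text that is about to be displayed.
The function name and sample data below are hypothetical.

```python
def node_text_for_display(raw: bytes, offset: int) -> str:
    """Slice one node out of the unmodified file contents, then
    normalise CR-LF to LF in that slice only.

    `offset` is a byte offset into the raw file. The node is assumed
    to end at the next 0x1f separator or at end of file.
    """
    end = raw.find(b"\x1f", offset + 1)
    if end == -1:
        end = len(raw)
    node = raw[offset:end]
    return node.replace(b"\r\n", b"\n").decode("utf-8", errors="replace")


# Hypothetical file with inconsistent line endings.
raw = b"\x1f\r\nNode: A\r\ntext a\r\n\x1f\nNode: B\nline\r\n"
text_b = node_text_for_display(raw, raw.rindex(b"\x1f"))
```

With this approach the offsets never need adjusting, at the cost of
doing the CR removal each time a node is shown.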
