On Thu, Dec 25, 2014 at 3:51 PM, Eli Zaretskii <[email protected]> wrote:
> Today I discovered that the Info reader built from the current trunk
> cannot display any Info file that was produced natively on Windows (as
> opposed to Info files that come from distribution tarballs, which were
> produced on Unix). The reader says it cannot find the Top node in any
> such Info file.
>
> It turned out this is because the code which stripped CR characters
> from CR-LF pairs, once the file was read, was #ifdef'ed away (in
> revision 5888), evidently due to a failure of a test that checks node
> accessibility through tag tables without the 1000-character slack.
>
> (I didn't find in bug-texinfo any discussion of the original problem
> or the change that was made to solve it. Neither do I see anything
> pertinent in the bug database. Did I miss something? What or who
> triggered that change?)
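To make the mismatch concrete, here is a minimal Python sketch (not the reader's C code) of what happens when a tag table's offsets count CR bytes but the reader strips CR from CR-LF pairs after reading the file. The toy file contents and the helpers `strip_crlf` and `raw_to_stripped` are hypothetical, for illustration only.

```python
def strip_crlf(data: bytes) -> bytes:
    """Hypothetical helper: collapse every CR-LF pair to a bare LF, the
    kind of post-read stripping described above."""
    return data.replace(b"\r\n", b"\n")


def raw_to_stripped(raw: bytes, offset: int) -> int:
    """Hypothetical helper: translate a byte offset counted in the raw
    (CR-LF) file into the matching offset in the stripped buffer. Each
    CR-LF pair before the offset loses one byte when stripped."""
    return offset - raw.count(b"\r\n", 0, offset)


# Toy two-node Info file written with CR-LF line endings.
raw = (
    b"\x1f\r\nFile: toy.info,  Node: Top,  Next: A\r\n\r\nTop text\r\n"
    b"\x1f\r\nFile: toy.info,  Node: A,  Prev: Top\r\n\r\nA text\r\n"
)
stripped = strip_crlf(raw)

# Offset of the second node's ^_ separator, counted with CRs included.
raw_offset = raw.index(b"\x1f", 1)

# Using that offset on the stripped buffer misses the separator...
print(stripped[raw_offset:raw_offset + 1] == b"\x1f")  # False
# ...while the translated offset lands on it.
fixed = raw_to_stripped(raw, raw_offset)
print(stripped[fixed:fixed + 1] == b"\x1f")  # True
```

The drift equals the number of CR-LF pairs before a node, so it grows through the file: a lookup that tolerates some slack can still find early nodes while later ones fall out of range, and a lookup with no slack fails immediately.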
I discovered this problem with the "gnucobpg.info" file that is part of GNU Cobol (downloadable at http://opencobol.add1tocobol.com/guides/), which has many CR-LF line endings (though not consistently). I don't know exactly how this file was generated; the file preamble says

    This is gnucobpg.info, produced by makeinfo version 4.8 from gnucobpg.texi.

In any case, I ran into the problem mentioned: I couldn't access later nodes in the file. I tested just now with info 4.13 and wasn't able to access the "Alphabet-Name-Clause" node or anything after it in the file.

That's the only Info file I remember encountering that contains many CR bytes. Since it claims to have been produced by makeinfo 4.8 (not 5.x), whether the CR characters are counted in the tag table must depend on other, unknown factors.

(It could be helpful to make the GNU Cobol developers aware of this. I haven't been able to find an email address for them quickly; if anyone knows one, could they let them know?)

> . fix texi2any to produce tag tables that assume the CR characters
> are stripped from the Info file (my reading of the code is that it
> should not count CR characters before LF for the purposes of
> count_context value; or maybe it should simply open the Info output
> in 'unix' mode)

A tag table containing the exact byte offsets is much simpler than one that first requires removing all of the CR characters (or just the CR characters before LF), and is therefore less prone to incorrect implementation by any other Info-reading or -writing programs that might be written. It lets a reader go straight to the correct place in the file without processing the entire file first. This could enable faster access to nodes by memory-mapping a file. Most of the time speed isn't an issue, but it's an idea I've had for speeding up searching the indices of all installed Info files at once. It could also be used to access a node of an Info file over a slow or expensive network connection without having to download the entire file.
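A minimal Python sketch of the direct-access idea, under stated assumptions: the file has LF-only line endings, its tag table sits between a `^_` + `Tag Table:` line and a `^_` + `End Tag Table` line, each entry looks like `Node: NAME` followed by a DEL byte (0x7f) and the decimal byte offset, and each offset points exactly at the node's `^_` separator. The helpers `parse_tag_table` and `read_node` are hypothetical; this is not how the standalone reader is structured.

```python
import mmap
import re

# Assumed entry shape: "Node: <name>" + DEL (0x7f) + decimal byte offset.
_TAG_ENTRY = re.compile(rb"^Node: (.+?)\x7f(\d+)$", re.MULTILINE)


def parse_tag_table(data) -> dict:
    """Hypothetical helper: map node names to byte offsets using the tag
    table in `data` (bytes or an mmap). Searching from the end avoids
    scanning the whole file, since the tag table is at the end."""
    start = data.rfind(b"\x1f\nTag Table:\n")
    if start < 0:
        raise ValueError("no tag table found")
    end = data.find(b"\x1f\nEnd Tag Table", start)
    if end < 0:
        raise ValueError("unterminated tag table")
    table = data[start:end]  # only the tag table is copied out
    return {m.group(1).decode(): int(m.group(2))
            for m in _TAG_ENTRY.finditer(table)}


def read_node(path: str, name: str) -> bytes:
    """Hypothetical helper: return one node's bytes by jumping straight to
    its recorded offset in a memory-mapped file, with no preprocessing."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offset = parse_tag_table(mm)[name]
        # With exact offsets, no slack search is needed: check and slice.
        if mm[offset:offset + 1] != b"\x1f":
            raise ValueError(f"offset {offset} does not hit a node separator")
        end = mm.find(b"\x1f", offset + 1)
        return mm[offset:end if end >= 0 else len(mm)]
```

The final check is what exact offsets buy: if the byte at the recorded position is not `^_`, the file is inconsistent, and the reader can report that instead of guessing within a window.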
I hope it's possible to change the standalone Info reader so that it can access files with CR-LF line endings without having to interpret the tag table this way. At the same time, if it's easy to avoid outputting files with CR-LF line endings under Windows, then I think we should do so.
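One possible shape for such a reader change, sketched in Python rather than the reader's C: keep the file bytes unmodified, resolve the tag table's exact byte offsets against them, and collapse CR-LF to LF only in the node text that is returned. `extract_node` is a hypothetical name, and the sketch assumes each offset points at the node's `^_` separator.

```python
def extract_node(raw: bytes, offset: int) -> bytes:
    """Hypothetical helper: return the node whose ^_ separator sits at
    `offset` in the unmodified file bytes, with CR-LF pairs collapsed to
    LF only in the extracted text."""
    if raw[offset:offset + 1] != b"\x1f":
        raise ValueError(f"no node separator at byte {offset}")
    end = raw.find(b"\x1f", offset + 1)
    node = raw[offset:end if end >= 0 else len(raw)]
    # Line endings are normalized after the lookup, so exact byte offsets
    # into the file as stored stay valid for both LF and CR-LF files.
    return node.replace(b"\r\n", b"\n")


# The same toy file stored with CR-LF and with LF endings.
crlf = (
    b"\x1f\r\nFile: toy.info,  Node: Top\r\n\r\nTop text\r\n"
    b"\x1f\r\nFile: toy.info,  Node: A\r\n\r\nA text\r\n"
)
lf = crlf.replace(b"\r\n", b"\n")

# Each file's own byte offset for node A yields identical text.
print(extract_node(crlf, crlf.index(b"\x1f", 1)) ==
      extract_node(lf, lf.index(b"\x1f", 1)))  # True
```

Because normalization happens after the lookup, a tag table of exact byte offsets works whether or not the file has CR-LF endings, and the reader never needs to know how the offsets would change if CRs were removed.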
