On Tue, Mar 03, 2026 at 09:16:50PM +0000, Gavin Smith wrote:
> Readers may have had the experience themselves of attempting to work on
> a code base that appears to have been abandoned by its original developers,
> and failing to make headway on it in any reasonable time.
> 
> The Texinfo project is not dead at the moment but it may still get more
> and more bloated over time.
> 
> Hence, I have been keeping a list of aspects of the project that could be
> considered bloat.

This is a very welcome idea, and very relevant to GNU/Texinfo, as there
has been some increase in complexity, probably some bloat, and there are
few developers.  Bloat is less of an issue when there is a community of
developers, which is not the case for GNU/Texinfo.  We do have a lovely
community of users, though, which is very nice.

In the following, I will only comment on the parts I am knowledgeable
about.

> * The pod2texi program.  I suppose the purpose of this program is clear:
>   it converts POD to Texinfo.  I've never used it and never touched the
>   sources.  It does not seem to be getting in the way of other development,
>   so can probably be left alone.

I would say that the pod2texi program does not get much in the way of
other development, but some tree transformations were done specifically
for pod2texi, as they were necessary there.  There is no particular
complexity or maintenance burden associated with it, though.

> * texi2any.  This is the component of the code that sees the most
>   - It has two versions for much of its code - C and Perl.  This has
>     been discussed previously.  Needless to say, this more than doubles
>     the maintenance burden.

My current view is that, in the long term, we should try to have an
implementation that is as complete as possible in C, with a few
well-selected interfaces such as the SWIG API, and use higher-level
languages in a more limited fashion.

>   - Features of texi2any that could be considered bloat:
> 
>      * The XML format output with --xml.  This is not actually useful
>        for anything.
>        The Texinfo manual suggests that users might want to use this as a
>        starting point for conversion to other formats, but I'd actually
>        rather they didn't, as it means that we have to maintain the XML
>        converter, and would be better to have the converters built into
>        texi2any like the other output formats.
>        It's not a huge maintenance burden, except remembering to update
>        the DTD when the language changes, and the time spent running
>        the tests for XML output, of course.

I fully agree.  The XML output may have been relevant in the past, but
it is not anymore, and if something similar were to be done nowadays, I
think it should be done from scratch and be more in line with the
current tree.  Or the SWIG interface should be used instead.  I would
support phasing out the XML (and the associated SXML): migrating the
code to Example, removing the --xml option, and removing it from the
Texinfo manual.  It could stay in Example for some time, though.

DocBook also seems to be much less used nowadays, although it is
probably still relevant.  These days, reStructuredText seems to be the
language similar to Texinfo that is in fashion.  I am not sure that a
reStructuredText output would be relevant, though, as fashions come and
go.

>      * The IXIN output format.  This format is not likely to see any
>        further development, especially following the sad death of
>        its creator Thien-Thi Nguyen.  This may not be an issue as the
>        IXIN conversion code is now specificially in an "Example"
>        subdirectory.

I do not think that this is a real issue; this format was never really
used nor documented, and it has been abandoned for a long time.  The
reason it is still in the sources is that it could be useful as an
example if something similar were needed in the future, with its
distinction between the format used for the tree conversion and the
overall conversion.  I do not see it as problematic bloat, and it does
not need to be maintained.

>      * In HTML output, the option to create a special "About" page
>        with the DO_ABOUT variable does not seem useful.

It is a remnant of texi2html, and it is probably not used much, if at
all.  That being said, it is not a real maintenance burden either;
there is nothing complex about it.

>      * The SORT_ELEMENT_COUNT variable does not seem useful:
> 
>          If set, the name of a file to which a list of elements (nodes or
>          sections, depending on the output format) is dumped, sorted by the
>          number of lines they contain after removal of @-commands; default
>          unset.  This is used by the program ‘texi-elements-by-size’ in the
>          ‘util/’ directory of the Texinfo source distribution (*note
>          texi-elements-by-size::).

Indeed, I do not know if this is used.  I would not oppose removing it,
as this feature adds some complexity.
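For readers unfamiliar with the feature, it conceptually amounts to
something like the following sketch (Python used for brevity; the real
implementation lives in the texi2any Perl/C code, operates on the parsed
tree rather than on raw text, and all names here are made up):

```python
import re

def elements_by_size(elements):
    """Sort (name, text) pairs by line count after removing @-commands.

    Rough stand-in for what SORT_ELEMENT_COUNT dumps; the real
    texi2any code removes @-commands from the parsed tree, not with
    a regex like this.
    """
    def line_count(text):
        # Crude removal of @-commands, with or without a brace argument.
        stripped = re.sub(r'@[a-zA-Z]+(\{[^}]*\})?', '', text)
        return len([l for l in stripped.splitlines() if l.strip()])
    return sorted(((line_count(t), name) for name, t in elements),
                  reverse=True)

elements = [
    ("Top", "@node Top\nshort\n"),
    ("Details", "@node Details\nline one\nline two\n@code{x} line three\n"),
]
print(elements_by_size(elements))   # -> [(4, 'Details'), (2, 'Top')]
```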

>      * The converter spends time and uses memory building "source marks"
>        with details of expanded macros and included source files.  This
>        information is not used in the output conversion, unless converting
>        back to the Texinfo source it started with.  It's possible that
>        some of this processing could be made optional for efficiency
>        (I haven't investigated in detail how this could be accomplished.)

I think that this feature is important.  I investigated whether it
could be removed, and my conclusion was that doing so would add more
complexity and would not make any significant difference in efficiency.
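To illustrate what the feature buys us, here is a toy model of the idea
(all names and data structures are invented for the illustration; the
real source marks are attached to tree elements and also cover macro
expansions, not just @include):

```python
def expand(files, top):
    """Inline @include directives, recording a mark at each boundary.

    Toy model of "source marks": while building the expanded text,
    remember which original file each region came from.
    """
    text = ""
    marks = [(0, top)]            # (offset in expanded text, origin file)
    for line in files[top].splitlines(keepends=True):
        if line.startswith("@include "):
            included = line.split(maxsplit=1)[1].strip()
            marks.append((len(text), included))
            text += files[included]
            marks.append((len(text), top))
        else:
            text += line
    return text, marks

def origin(marks, offset):
    """Map an offset in the expanded text back to its original file."""
    result = marks[0][1]
    for start, name in marks:
        if start <= offset:
            result = name
    return result

files = {
    "main.texi": "before\n@include chap.texi\nafter\n",
    "chap.texi": "chapter text\n",
}
text, marks = expand(files, "main.texi")
print(origin(marks, text.index("chapter")))   # -> chap.texi
print(origin(marks, text.index("after")))     # -> main.texi
```

It is this kind of mapping that makes conversion back to the original
Texinfo source (and precise source locations in general) possible.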

>      * The Texinfo::Reader interface (new in 7.3).  tta/README in the
>        Texinfo sources explains:
> 
> > The modules in perl/Texinfo/Example are not developped anymore.  Docbook
> > conversion modules in this directory were developped using an interface
> > consisting of Texinfo::Reader, Texinfo::TreeElement and
> > Texinfo::Example::TreeElementConverter as a proof of concept.  However, this
> > interface proved to be too slow in Perl and difficult to implement with XS
> > code.  The Reader and TreeElement interface (except for one function) are 
> > not
> > used from Perl anymore.  Going forward, the SWIG interface based on the
> > Reader, Parser, Structuring and Texinfo Document C codes should
> > be used.  The SWIG interface is in the swig directory.  Texinfo::Reader and
> > Texinfo::TreeElement (except for the 'new' function) should not be used
> > anymore.

This is indeed a failed experiment.  I am in favor of completely
removing the most complex parts from the code; they are already mainly
in Example.  But I think it is preferable to keep Texinfo::Reader (the
pure Perl version only, no XS) in the code, even if unused, as the code
is very simple and the corresponding C code exists and is used for the
SWIG interface.  Texinfo::TreeElement could probably be removed from
both the Perl and XS interfaces (and the associated C code too, though
there is probably almost nothing in C, as the element structure is
accessed directly).

>      * The --transliterate-file-names feature.  This feature was only just
>        turned off by default in the recent release.  It entails bundling
>        the Text::Unidecode Perl module with Texinfo, which although is only
>        1.3MB when extracted, bloats the directory listing (e.g. the output
>        of "tar tf") with 289 files - you may or may not agree that this
>        is a major problem.

Not a major problem to me.  Transliteration is used in other places
than --transliterate-file-names, although those could easily be
modified not to use transliteration.  The only complexity around
transliteration I see is that Perl and the various C iconv
implementations do not give the same results, which can be inconvenient
at times.
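A small illustration of the kind of divergence meant here, with Python
as a neutral stand-in (the actual comparison is between Perl's
Text::Unidecode and the //TRANSLIT support of the various iconv
implementations, which this sketch does not exercise):

```python
import unicodedata

def nfkd_ascii(s):
    """One transliteration strategy: decompose, then drop non-ASCII."""
    return unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()

# An accented letter decomposes to base letter + combining accent, so
# most transliteration strategies agree here:
print(nfkd_ascii("café"))        # -> cafe

# But some characters have no decomposition: this strategy silently
# drops the German sharp s, while GNU libc's iconv //TRANSLIT and
# Text::Unidecode typically produce "ss" -- different tools, different
# behavior:
print(nfkd_ascii("straße"))      # -> strae
```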

>      * The Unicode::CollateStub replacement in perl/Texinfo/Indices.pm
>        is only needed on Red Hat-like systems where Unicode::Collate is not
>        installed.  Recently with pre-release testing, this code had a problem
>        which we fixed.  It isn't tested regularly.

I agree that this and several other similar portability or
reproducibility fixes are annoying.  They add code that we do not
routinely test and that can even be difficult to test.  In this
category there are --enable-xs-perl-libintl, building without a working
iconv, the workaround for differing transliteration support across
iconv implementations, support for old Perl releases, and probably
other similar issues I am forgetting.

>     The SWIG interface, in providing an API for more programming languages,
>     seems to make the potential problems with API stability worse.

The SWIG interface definitely adds some complexity and, maybe more
importantly, sets in stone the need for the C code that the SWIG
interface is based on.  But I think that it is really needed: it allows
different projects that need to parse Texinfo (for purposes other than
conversion to another format) to use the same 'authoritative' and
well-maintained parser, document analysis, and related code.

>   - The texi2any API is major source of potential bloat.
>
>     As far as I know there are only two or three projects which use the
>     texi2any API (Lilypond, ffmpeg).  It seems every release there are
>     changes to the API which needs fixes in these other packages.
> 
>     For example, after the Texinfo 7.2 release, the ffmpeg build broke:
> 
>     
> https://www.linuxquestions.org/questions/slackware-14/texinfo-7-2-looks-to-have-broken-texinfo-convert-html-4175745581/
> 
>     I wrote at the time:
> 
>     > Such breakages seem inevitable as extension code could rely on many
>     > details of internal texi2any code.  The new version of Texinfo is
>     > then flagged as responsible for breaking compatibility.
>     > 
>     > This only stays manageable as long as the number of packages relying
>     > on the Perl customization API stays low.
> 
>     If more packages start using the texi2any API, it will be a further
>     source of breakage and even more work to go and find these packages
>     to fix their customization code when a new release is made.

I agree on those points: if more packages use the texi2any HTML
customization API, it will be difficult to change the API when needed,
which is often.  That being said, I think that this API corresponds to
a well-recognized user need, and I do not have any other idea.  It is
worth adding that this API requires some Perl/C interactions that I
manage well, but that could be a hurdle for future maintainers.

>     On a lesser scale of problem, the API documentation takes quite a long
>     time to build and upload to the GNU website when doing a new release
>     because there is so much of it.

I think that it is the internal API documentation you are describing
here; the texi2any HTML customization API is only in texi2any_api.texi.

That being said, part of the internal API is actually used in texi2any
HTML customization code, and there is also the Texinfo tree
documentation.  I do not view this as an important issue; having this
documentation is useful for me in any case.  It could make sense,
however, to make only parts of it public and to reorganize it, possibly
with a separate manual for the tree.  I have TODO items on that.

>     Instead of promoting and expanding the use of API programming facilities,
>     I think it would be better to find out what users were using the API for
>     and design built-in features for supporting what they want to do.

I used the SWIG API for a po4a parser, and I cannot see how this could
be done as efficiently any other way.  There is already a po4a Texinfo
parser based on the TeX po4a parser (another testimony to the Texinfo
TeX / texi2any duality), but I think that the po4a parser based on the
SWIG API is superior, in particular thanks to the source marks.  I do
not think built-in features could be used for that.

Regarding HTML API customization, I do not think that it is really
possible to do the same with built-in features.  There are already too
many customization variables for HTML; having hooks to replace
formatting functions seems to me to be a good compromise.


Something you did not describe, and which in my opinion could be
simplified, is the possibility of having the XS extensions loaded at
different steps.  It was relevant when the XS extensions had just been
written, but in the long run I think that we should have only two
possibilities: pure Perl, or all the XS extensions plus Perl for the
rest.  The fine-grained level of control permitted by the
TEXINFO_XS_PARSER, TEXINFO_XS_STRUCTURE and TEXINFO_XS_CONVERT
variables could be removed in the long term.
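For readers who have not encountered these settings, the split
currently looks roughly as follows (from memory; check the texi2any
documentation for the exact variable names and accepted values before
relying on them):

```shell
# Pure Perl throughout, no XS extensions:
TEXINFO_XS=omit texi2any manual.texi

# Default: use whatever XS extensions were built.
texi2any manual.texi

# Fine-grained control via customization variables, for instance
# using the XS parser but keeping the conversion in Perl:
texi2any -c TEXINFO_XS_CONVERT=0 manual.texi
```

With the simplification suggested above, only the first two modes would
remain.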

-- 
Pat
