Re: role of ctexi2any

Gavin Smith Tue, 27 Jan 2026 11:56:22 -0800

On Tue, Jan 27, 2026 at 12:25:47AM +0100, [email protected] wrote:
> > I do not understand the value to the user of ctexi2any, as it is.
> 
> Not much besides a choice of language for the main program.  Note that
> texi2any.pl does not have additional value compared to ctexi2any either.
> Both are equivalent, so the one that is not used hasn't much value.


Yes.

> I think that there is some value in building/testing the C
> implementation, as it makes it more easy to switch to that
> implementation as the main implementation, if we decide so, and users
> may be interested in that possibility.  An option value of some sort (in
> economic terms).
> 
> > It makes sense to me as part of a potential rewrite from texi2any from
> > Perl to C.  We've discussed in the past the idea of progressing texi2any
> > from a Perl program to a Perl program that embeds C, to a C program that
> > embeds Perl, to a pure C program with optional Perl extensions.
> > (Whether this will ever actually happen is unknown.)
> 
> I would already describe ctexi2any like that.  But maybe you refer to
> the fact that indices sorting needs some Perl code and reproducible
> tests need the use of Unidecode to do the transliteration, in which case
> I agree.

Not all the converters are available in C.  The plaintext/Info converter
is still Perl only, as are (less importantly) the LaTeX and DocBook converters.
There are also less important converters: to a format called "IXIN", to
Texinfo XML, and variants of these that use "SXML".

I think if the plaintext converter were rewritten in C, and ctexi2any
could run without Perl extensions, with most of its functionality intact,
we should consider making ctexi2any the default.

The index sorting is the other major blocker AFAIK, although not a very
visible issue for users.

I could probably make progress on implementing the Unicode collation
algorithm and rewriting the Info converter in C after the release, if there
were no further major changes to the structure of texi2any.  (However, I
have not been able to make much progress on texi2any development recently.)

It has been an advantage to have a Perl version of the code to work on
as it is quicker to make and test changes in an interpreted language; however,
overall it is probably better just to have one language.  Any new contributors
could be turned off by having to make equivalent changes in both Perl and
C, and there is always the risk of divergence between the two parts of the
code.

> > This process
> > entails all the code being duplicated in both languages
> 
> It is not this process that leads to most of the duplication, most of
> the duplication is already needed to have a good XS coverage.

XS modules are part of the duplication I was referring to.

> The actual duplications are texi2any.c, some code in convert/converter.c
> and convert/texinfo.c, the whole of convert/html_converter_api.c,
> convert/plaintexinfo_converter_api.c and
> convert/rawtext_converter_api.c.
> 
> Then there are also additionally needed codes, namely the
> XSTexinfo/parser_document/ConfigXS.xs interface and everything related
> to Perl embedding, ie perl/load_txi_modules.pl, m4/txi_embedded_perl.m4,
> most of convert/call_conversion_perl.c, convert/call_embed_perl.c, some
> associated functions in Perl C codes and some code and functions here
> and there (for example in XSLoader).  There are probably also some
> functions that are only used right now in relation to ctexi2any, but
> that would need to be implemented anyway if some other converters were
> ported to C, which I wouldn't count as being duplicated for ctexi2any.
> 
> All in all, I do not think that there is that much additional code, but
> it is of course very subjective.
> 
> > and complex
> > interactions between the two parts of the program, but in theory could
> > lead to a simple result at the end.  In short, it's a bigger mess created
> > in the process of cleaning up a smaller mess.
> 
> It is not clear to me what "complex interactions between the two parts
> of the program" actually mean.  The code only needed by XS interfaces
> is, as far as I can tell, relatively well separated from the code needed
> for ctexi2any or the SWIG interface.  It is not fully separated, there
> are some cases where some choices are made differently depending on Perl
> being embedded or not, but it is quite rare.

It's clearly simpler to have a program written in one language than written
in two languages.

For Perl embedding C, we have to make sure the C code is compiled with
appropriate flags and that we can find and load it.  Likewise for C
embedding Perl, we have to be able to embed a Perl interpreter.

There is interface code to call Perl code from C code and vice versa.

In some cases, not everything can be done in one language so calls
have to be made to the other language to carry it out (e.g. Unicode
collation).  (You used the word "imbrication" to describe the interface,
which is not a common word in English, but as far as I can gather refers
to the way that layers connect with each other with each layer extending
physically into the other.)

As I understand it, much of the C code in texi2any can be run both by
XS extension modules, and from ctexi2any.  This is achieved through the
use of dynamically loaded libraries.

Dynamically loaded libraries (built with libtool) are somewhat slower
to compile than straight compilation of *.c source files to *.o files.
I feel that it would be easier to understand a C program made up of *.o
files than one made up of *.la files that are dynamically linked together.

It's worth comparing with texinfo-7.1.1 (September 2024), which had
fewer XS modules (it only had three - Parsetexi, XSParagraph and MiscXS),
and no helper libraries.

After extracting a tarball:

$ pwd
/home/g/src/texinfo/texinfo-7.1.1
$ du --si -d0 .
109M    .
$ time ./configure
...
real    0m34.932s
user    0m18.840s
sys     0m16.315s
$ time make
...
real    1m4.698s
user    0m48.583s
sys     0m13.122s
$ du --si -d0 .
128M    .

For comparison, texinfo-7.2.90 has 11 XS modules (ConvertXS, MiscXS,
Parsetexi, IndicesXS, ConfigXS, StructuringTransfoXS, DocumentXS,
TreeElementXS, ReaderXS, TreeElementConverterXS, XSParagraph), and
10 helper libraries (libparagraph.la, libtexinfo-convert.la,
libcallperl_libtexinfo_convert.la, libcallperl_libtexinfo.la,
libtexinfo.la, libtexinfo-main.la, libperlcall_utils.la,
libtexinfo-convertxs.la, libperlembed_libtexinfo_main.la,
libtexinfoxs.la).

$ pwd
/home/g/src/texinfo/texinfo-7.2.90
$ du --si -d0 .
105M    .
$ time ./configure
...
real    0m54.267s
user    0m28.575s
sys     0m24.516s
$ time make
real    1m33.046s
user    1m5.409s
sys     0m25.713s
$ du --si -d0 .
134M    .

The build time is roughly 50% longer.

texinfo-7.2.90 is smaller when extracted (which I think is due
to a more compact format for test results), but the increase after
building changes from 19 MB to 29 MB.  That is not a problem in itself
but gives some indication of the increased size and complexity of the
package.
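
As a rough cross-check of the figures in the two transcripts, the
arithmetic can be redone with a few lines of Python.  This is only a
sketch: the numbers are copied from the "real" times and "du --si"
output above, the function and variable names are made up for the
example, and nothing here is part of Texinfo.

```python
# Wall-clock ("real") times in seconds and du --si sizes in MB,
# copied from the transcripts above.
runs = {
    "7.1.1":  {"configure": 34.932, "make": 64.698, "before": 109, "after": 128},
    "7.2.90": {"configure": 54.267, "make": 93.046, "before": 105, "after": 134},
}

def total_seconds(run):
    """Time for ./configure plus make."""
    return run["configure"] + run["make"]

def build_growth_mb(run):
    """How much the tree grew between extraction and the end of make."""
    return run["after"] - run["before"]

def percent_longer(old, new):
    """Relative increase of new over old, as a percentage."""
    return (new / old - 1.0) * 100.0

old, new = runs["7.1.1"], runs["7.2.90"]
overall = percent_longer(total_seconds(old), total_seconds(new))
print(f"configure + make: {overall:.0f}% longer")
for version, run in runs.items():
    print(f"texinfo-{version}: +{build_growth_mb(run)} MB after building")
```

Taking configure and make together, the increase comes out a little
under 50%.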

This may be optimal at present, but at some point I'd like to see the
package reduced in size and complexity, once we start sloughing off
duplicative code in Perl.

> There is some complexity, for sure, in ctexi2any related code, linked to
> Perl embedding, and also because there is one more interface gone
> through when embedded Perl is actually used.  But this complexity is
> isolated from texi2any.pl.  And ctexi2any is simpler than texi2any.pl
> when only C is used for HTML, as the complexity added by the XS
> interfaces is not present.
> 
> > The mess is yet to be cleaned up.
> 
> I do not really get it.  I do not see anything obviously problematic
> with respect to some code needed for ctexi2any that would cause
> complexity for the code needed by texi2any.pl + XS.

I don't mean just ctexi2any.

> > The --enable-using-c-texi2any flag is an abuse of the configuration 
> > interface.
> > Node "Configuration" in the GNU Coding Standards:
> > 
> >        No ‘--enable’ option should *ever* cause one feature to replace
> >        another.  No ‘--enable’ option should ever substitute one useful
> >        behavior for another useful behavior.  The only proper use for
> >        ‘--enable’ is for questions of whether to build part of the program
> >        or exclude it.
> 
> The "should ever substitute one useful behavior for another useful
> behavior" is not that clearly relevant here as there is no difference in
> features, the difference is the implementation.  I agree, however, that
> it is not a "question of whether to build part of the program or exclude
> it".  But I am also pretty sure that these rules are to be followed with
> reason, and the situation seems to me to be specific enough and not
> covered by the rule to warrant an exception.
> 
> >   ...
> >   
> >      You will note that the categories ‘--with-’ and ‘--enable-’ are
> >   narrow: they *do not* provide a place for any sort of option you might
> >   think of.  That is deliberate.  We want to limit the possible
> >   configuration options in GNU software.  We do not want GNU programs to
> >   have idiosyncratic configuration options.
> 
> I do not really understand what it adds to the previously said
> information and the "We do not want GNU programs to have idiosyncratic
> configuration options" is confusing to me as it could even go against
> using any --enable options, but I am probably misinterpreting what is
> said.

I think the intention is that the configuration options should be easy
to understand for users and follow a simple, consistent schema.

> > So if we want to provide some way to use ctexi2any as the texi2any
> > implementation, I'd suggest we find some other way of doing it than
> > a configure option.  The only idea I have is to use an environment variable
> > instead.  texi2any.pl could detect this environment variable and delegate
> > to ctexi2any.
> 
> I do not find that idea very good, in my opinion, it would lead to
> unneeded complexity, possible confusion and would be overall less useful.
> I do not think that we should follow those rules to the point of
> choosing an inferior design.

I don't have a better idea at the moment.
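
To make the environment variable idea concrete, here is a minimal
sketch of the decision logic, written in Python purely for illustration
(the real check would live in texi2any.pl).  The variable name
TEXI2ANY_IMPLEMENTATION, its value "c", and the function name are all
hypothetical; no such setting exists in Texinfo as far as this message
is concerned.

```python
import os
import shutil

# Hypothetical variable name and value, invented for this sketch.
ENV_VAR = "TEXI2ANY_IMPLEMENTATION"

def delegate_target(environ=None, which=shutil.which):
    """Return the path of ctexi2any if the user asked for the C
    implementation and it can be found, otherwise None (meaning:
    carry on with the Perl implementation)."""
    environ = os.environ if environ is None else environ
    if environ.get(ENV_VAR) != "c":
        return None
    # which() returns None when the program is not on PATH, so a
    # missing ctexi2any silently falls back to Perl.
    return which("ctexi2any")

# Demonstration with a fake lookup, so it runs without ctexi2any installed.
fake_which = lambda name: "/usr/local/bin/" + name
print(delegate_target({ENV_VAR: "c"}, fake_which))
print(delegate_target({}, fake_which))
```

A caller would then replace the current process with the returned
program (os.execv in Python, exec in Perl), passing the command-line
arguments through unchanged.  Falling back silently when ctexi2any is
missing is one possible choice; reporting an error instead would be
equally defensible.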

> Overall, I am not convinced by the argumentation about --enable,
> because, as far as I can say (but I must say that those rules are not
> crystal clear to me), we use --enable for things other than "questions
> of whether to build part of the program or exclude it" for other
> options, for instance --enable-perl-xs, --enable-xs-perl-libintl,
> --enable-perl-install-mode.  Yet, those seem useful and relevant, and I
> wouldn't like to have them removed for the sake of following that rule
> which, unless I missed something, is trying to avoid something else.
> 
> > If we had an --enable-ctexi2any option this would control whether the
> > ctexi2any program is built and installed.  It does not seem of much
> > interest to users to build the ctexi2any program in the build tree only
> > and not install it.
> 
> I disagree on that.  Having ctexi2any built and not installed can be
> useful for testing and being able to understand and compare
> designs.  Whether this use is balanced by the risk of build failure, the
> increased build time and the waste of electricity and computer wear, I
> cannot tell, but it is definitely useful to have two implementations to
> compare.

The main use case I am concerned about is when XS modules are disabled.
Even if --disable-perl-xs is given to "./configure", a large amount of time
is spent building dynamically linked libraries under the tta/ directory
(including gnulib).  This is pointless for users who are only going to
use the pure Perl implementations.  Is there no way to disable the C code
being built under tta?

The ctexi2any program itself is a small part of the picture, I expect, in
comparison to all the other C code being compiled.

I'm not concerned about users using more electricity or wearing out the
silicon in their microchips, but I am concerned about the time they might
spend waiting for the package to finish building.

It makes a difference for development too.
