Re: role of ctexi2any

pertusus Tue, 27 Jan 2026 14:03:49 -0800

On Tue, Jan 27, 2026 at 07:55:31PM +0000, Gavin Smith wrote:
> Not all the converters are available in C.  The plaintext/Info converter
> is still Perl only, as are (less importantly) the LaTeX and DocBook 
> converters.
> There are also unimportant converters to: a format called "IXIN", Texinfo XML,
> and also variants of these that use "SXML".
> 
> I think if the plaintext converter were rewritten in C, and ctexi2any
> could run without Perl extensions, with most of its functionality intact,
> we should consider making ctexi2any the default.
> 
> The index sorting is the other major blocker AFAIK, although not a very
> visible issue for users.
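To illustrate why index sorting needs real collation rather than plain code-point order, here is a minimal Python sketch. The `primary_key` function is a hypothetical, crude approximation of the primary level of the Unicode Collation Algorithm (it only strips accents and folds case). It is not the UCA, and it is not what texi2any does.

```python
import unicodedata


def primary_key(entry: str) -> str:
    """Crude stand-in for a UCA primary-strength key (NOT the real UCA).

    Decompose to NFD, drop combining marks (accents), then case-fold,
    so that 'é' sorts with 'e' instead of after 'z'.
    """
    decomposed = unicodedata.normalize("NFD", entry)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


entries = ["zèbre", "école", "eau"]

# Plain code-point order puts 'é' (U+00E9) after 'z' (U+007A).
by_code_point = sorted(entries)

# The approximate primary key groups accented letters with their base letter.
by_primary_key = sorted(entries, key=primary_key)

print(by_code_point)   # ['eau', 'zèbre', 'école']
print(by_primary_key)  # ['eau', 'école', 'zèbre']
```

The real UCA adds secondary and tertiary levels (to order, e.g., "cote" and "côte" consistently), contractions, and the Default Unicode Collation Element Table, so this only illustrates the problem, not the solution.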
> 
> I could probably make progress on implementing the Unicode collation
> algorithm and rewriting the Info converter in C after the release, if there
> were no further major changes to the structure of texi2any.  (However, I
> have not been able to make much progress on texi2any development recently.)


There is another part of (c)texi2any that would still need Perl: the
HTML customization, which is written in Perl.  I do not know whether that
can be changed, nor whether it would be a good idea to change it.  Also,
the LaTeX converter is used by the HTML converter if
CONVERT_TO_LATEX_IN_MATH is set.

> It has been an advantage to have a Perl version of the code to work on
> as it is quicker to make and test changes in an interpreted language; however,
> overall it is probably better just to have one language.  Any new contributors
> could be turned off by having to make equivalent changes in both Perl and
> C, and there is always the risk of divergence between the two parts of the
> code.

I do not think that it is a real issue.  We can always decide at some
point to drop one of the implementations for the duplicated parts.

> It's clearly simpler to have a program written in one language than written
> in two languages.
> 
> For Perl embedding C, we have to make sure the C code is compiled with
> appropriate flags and that we can find and load it.  Likewise for C
> embedding Perl, we have to be able to embed a Perl interpreter.
> 
> There is interface code to call Perl code from C code and vice versa.
> 
> In some cases, not everything can be done in one language so calls
> have to be made to the other language to carry it out (e.g. Unicode
> collation).  (You used the word "imbrication" to describe the interface,
> which is not a common word in English, but as far as I can gather refers
> to the way that layers connect with each other with each layer extending
> physically into the other.)
> 
> As I understand it, much of the C code in texi2any can be run both by
> XS extension modules, and from ctexi2any.  This is achieved through the
> use of dynamically loaded libraries.

For the HTML converter it is a bit more complex than that: depending on
the needs, either the C code in the libraries is called directly, or the
embedded Perl HTML converter module is run, and that module uses the C
libraries through XS.  The embedded Perl HTML converter module is run if
HTML customization is used, or if the LaTeX converter may be called
because CONVERT_TO_LATEX_IN_MATH is set.
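The selection between the two paths can be sketched as a simple predicate. This is a minimal Python sketch; the names `HtmlBackend` and `choose_html_backend` and their parameters are hypothetical and only model the two conditions mentioned, not the actual ctexi2any code.

```python
from enum import Enum


class HtmlBackend(Enum):
    """Hypothetical labels for the two ways the HTML converter can run."""
    C_LIBRARIES = "C code in the libraries called directly"
    EMBEDDED_PERL = "embedded Perl HTML converter, using the C libraries via XS"


def choose_html_backend(uses_html_customization: bool,
                        convert_to_latex_in_math: bool) -> HtmlBackend:
    """Pick how the HTML converter runs (illustrative only).

    The embedded Perl module is needed when HTML customization (written
    in Perl) is used, or when CONVERT_TO_LATEX_IN_MATH is set, since the
    LaTeX converter that may then be called is Perl only.
    """
    if uses_html_customization or convert_to_latex_in_math:
        return HtmlBackend.EMBEDDED_PERL
    return HtmlBackend.C_LIBRARIES


# No customization and no LaTeX in math: the C code can be used directly.
print(choose_html_backend(False, False))
# Either condition alone is enough to require the embedded Perl converter.
print(choose_html_backend(True, False))
print(choose_html_backend(False, True))
```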

> Dynamically loaded libraries (built with libtool) are somewhat slower
> to compile than straight compilation of *.c source files to *.o files.
> I feel that it would be easier to understand a C program made up of *.o
> files than one made up of *.la files that are dynamically linked together.
> 
> It's worth comparing with texinfo-7.1.1 (September 2024), which had
> fewer XS modules (it only had three - Parsetexi, XSParagraph and MiscXS),
> and no helper libraries.

The comparison is not completely fair, because in texinfo-7.1.1 some of
the C code for Perl extensions was built with Gnulib and autoconf headers,
which was problematic.

> After extracting a tarball:
> 
> $ pwd
> /home/g/src/texinfo/texinfo-7.1.1
> $ du --si -d0 .
> 109M    .
> $ time ./configure
> ...
> real    0m34.932s
> user    0m18.840s
> sys     0m16.315s
> $ time make
> ...
> real    1m4.698s
> user    0m48.583s
> sys     0m13.122s
> $ du --si -d0 .
> 128M    .
> 
> For comparison, texinfo-7.2.90 has 11 XS modules (ConvertXS, MiscXS,
> Parsetexi, IndicesXS, ConfigXS, StructuringTransfoXS, DocumentXS,
> TreeElementXS, ReaderXS, TreeElementConverterXS, XSParagraph), and
> 10 helper libraries (libparagraph.la, libtexinfo-convert.la,
> libcallperl_libtexinfo_convert.la, libcallperl_libtexinfo.la,
> libtexinfo.la, libtexinfo-main.la, libperlcall_utils.la,
> libtexinfo-convertxs.la, libperlembed_libtexinfo_main.la,
> libtexinfoxs.la).
> 
> $ pwd
> /home/g/src/texinfo/texinfo-7.2.90
> $ du --si -d0 .
> 105M    .
> $ time ./configure
> ...
> real    0m54.267s
> user    0m28.575s
> sys     0m24.516s
> $ time make
> real    1m33.046s
> user    1m5.409s
> sys     0m25.713s
> $ du --si -d0 .
> 134M    .
> 
> The build time is roughly 50% longer.
> 
> texinfo-7.2.90 is smaller when extracted (which I think is due
> to a more compact format for test results), but the size increase after
> building goes from 19 MB to 29 MB.  That is not a problem in itself
> but gives some indication of the increased size and complexity of the
> package. 
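For reference, the relative changes can be recomputed from the figures in the transcripts above (wall-clock "real" times, and sizes as already rounded by du --si). A small Python sketch:

```python
# Figures copied from the transcripts above ("real" times in seconds,
# `du --si` sizes in megabytes, which du has already rounded).
old = {"configure": 34.932, "make": 64.698, "extracted": 109, "built": 128}
new = {"configure": 54.267, "make": 93.046, "extracted": 105, "built": 134}


def percent_increase(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after / before - 1.0) * 100.0


make_increase = percent_increase(old["make"], new["make"])
configure_increase = percent_increase(old["configure"], new["configure"])
total_increase = percent_increase(old["configure"] + old["make"],
                                  new["configure"] + new["make"])

# Growth of the source tree caused by building, in MB.
old_growth = old["built"] - old["extracted"]
new_growth = new["built"] - new["extracted"]

print(f"make: +{make_increase:.0f}%")
print(f"configure: +{configure_increase:.0f}%")
print(f"configure + make: +{total_increase:.0f}%")
print(f"growth after build: {old_growth} MB -> {new_growth} MB")
```

This gives about +44% for make alone, +55% for configure, and +48% for configure and make together, which matches "roughly 50%"; the build adds 19 MB for 7.1.1 and 29 MB for 7.2.90.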
> 
> This may be optimal at present, but at some point I'd like to see the
> package having less size and complexity, once we start sloughing off
> duplicative code in Perl.

We can do that, but again, I think that we shouldn't unless too few
people are willing to maintain both.

> The main use case I am concerned about is when XS modules are disabled.
> Even if --disable-perl-xs is given to "./configure", a large amount of time
> is spent building dynamically linked libraries under the tta/ directory
> (including gnulib).  This is pointless for users who are only going to
> use the pure Perl implementations.  Is there no way to disable the C code
> being built under tta?

Maybe, if XS is disabled and --enable-additional-checks is not passed,
we could skip recursing into tta/C?

As a side note, the SWIG Python interface can be built with XS
disabled, so it would need to be covered by the same conditionals.
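The proposed rule can be written as a small decision function. This is a hypothetical Python sketch of the logic only; in the package itself it would presumably be expressed as an Automake conditional. The `swig_python_requested` parameter is an assumed name, and the sketch assumes that the SWIG interface needs the C libraries under tta/C.

```python
def should_build_tta_c(perl_xs_enabled: bool,
                       additional_checks: bool,
                       swig_python_requested: bool) -> bool:
    """Decide whether to recurse into tta/C (hypothetical sketch).

    perl_xs_enabled: False when ./configure got --disable-perl-xs.
    additional_checks: True when --enable-additional-checks was passed.
    swig_python_requested: assumed flag for building the SWIG Python
        interface, which (by assumption) also needs the C code.
    """
    # Skip the C build only when nothing that needs it was requested.
    return perl_xs_enabled or additional_checks or swig_python_requested


# A pure-Perl build with no extra checks and no SWIG interface skips tta/C.
print(should_build_tta_c(False, False, False))
```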

> The ctexi2any program itself is a small part of the picture, I expect, in
> comparison to all the other C code being compiled.

The biggest part is the HTML converter.

-- 
Pat
