Re: Rewriting Plaintext/Info converter in C?

Patrice Dumas Tue, 23 Jun 2026 14:51:48 -0700

On Tue, Jun 23, 2026 at 10:00:16PM +0100, Gavin Smith wrote:
> I thought I would try to make a start on rewriting the Plaintext converter
> in C.  I thought it should be simple when compared with the HTML converter
> that already exists.  For one thing, there should be no customization API
> for the Plaintext converter.  All the conversion is done with straight
> function calls rather than via hooks that can be overridden.


Indeed, it is simpler in that respect.  The program flow, however, is
more complicated because of the need to cut lines correctly and to
remember where the anchors are.
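
As an illustration only, here is a minimal sketch of why the two needs
interact: the position of an anchor is only known once the text before
it has been cut into lines.  All the names below (SKETCH_WRAP,
sketch_add_word, sketch_anchor_here) are hypothetical and do not come
from the Texinfo sources, and the greedy word wrapping is an assumption,
not the algorithm used by the Plaintext converter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical line-cutting state, not taken from the Texinfo sources. */
typedef struct SKETCH_WRAP {
  char out[256];      /* output text, always NUL-terminated */
  size_t len;         /* bytes written to out so far */
  size_t column;      /* column reached on the current line */
  size_t line;        /* current line number, starting at 1 */
  size_t max_column;  /* a line is cut before it exceeds this width */
} SKETCH_WRAP;

/* Hypothetical anchor position: line number and byte offset in out. */
typedef struct SKETCH_ANCHOR_POS {
  size_t line;
  size_t offset;
} SKETCH_ANCHOR_POS;

void
sketch_wrap_init (SKETCH_WRAP *w, size_t max_column)
{
  memset (w, 0, sizeof *w);
  w->line = 1;
  w->max_column = max_column;
}

/* Append one word.  If the word does not fit on the current line, cut
   the line first; otherwise separate it from the previous word with a
   space. */
void
sketch_add_word (SKETCH_WRAP *w, const char *word)
{
  size_t n = strlen (word);

  /* Room for a separator, the word and the terminating NUL. */
  assert (w->len + n + 2 < sizeof w->out);

  if (w->column > 0 && w->column + 1 + n > w->max_column)
    {
      w->out[w->len++] = '\n';
      w->line++;
      w->column = 0;
    }
  else if (w->column > 0)
    {
      w->out[w->len++] = ' ';
      w->column++;
    }

  memcpy (w->out + w->len, word, n);
  w->len += n;
  w->column += n;
  w->out[w->len] = '\0';
}

/* Remember where an anchor falls: right after the text emitted so far. */
SKETCH_ANCHOR_POS
sketch_anchor_here (const SKETCH_WRAP *w)
{
  SKETCH_ANCHOR_POS pos = { w->line, w->len };
  return pos;
}
```

In this sketch sketch_anchor_here has to be called between the calls to
sketch_add_word, at the point where the anchor appears in the input,
which is the kind of bookkeeping that complicates the program flow.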

> I looked at HTML.pm to see if I could see how this module was implemented
> in C as part of the program.  I noticed there were two modules, one called
> HTML.pm, the other called HTMLNonXS.pm.

HTMLNonXS.pm has all the functions that have an XS interface and are
therefore not needed when the XS interface is loaded
(tta/perl/XSTexinfo/convert/ConvertHTMLXS.xs).  HTML.pm has the
functions that do not have an XS interface and that, in practice, need
to be defined even when the XS interface is loaded.  There is no other
logic behind that organization; one should consider that the Perl module
is the union of the two files.  This organization makes managing the XS
interface much easier, and can sometimes help in deciding whether some
functions are better in the XS interface or not.

> Compare HTML.pm and HTMLNonXS.pm with Texinfo/Convert/Paragraph.pm and
> Texinfo/Convert/ParagraphNonXS.pm.  Paragraph.pm is a very short module.
> Whatever the relationship is between HTML.pm and HTMLNonXS.pm, it is clearly
> different from the relationship between Paragraph.pm and ParagraphNonXS.pm.

That is because all the functions in Paragraph have an XS interface.
Indices is yet another case: there the longest file is Indices.pm,
because there are only a few XS interfaces and the Perl functions are
mainly called from other modules.

> I thought I could have an easier time understanding the ctexi2any code
> to how a converter was defined.
> 
> There were a few things I found confusing.
> 
> texi2any.c (the file with 'main') calls a 'txi_converter_setup' function
> with an argument based on the output format.  This is in the file
> C/convert/texinfo.c.
> 
> I would have expected a file called "convert/texinfo.c" to be to do with
> converting to Texinfo as an output format (possibly useful for testing,
> or for expanding macros).  But texinfo.c appears to have code for a mixture
> of different things:
> 
>     /* Interface similar to the Perl modules interface for Texinfo parsing,
>        higher-level interface for document structure and transformations,
>        and interface similar to the Perl modules interface for conversion */
> 
> I don't get from this a clear idea of what this file is for.

The texinfo.c file is not at all related to conversion to Texinfo
(conversion to Texinfo is in main/convert_to_texinfo.c).  It is for code
that is never called from XS, that is better kept out of texi2any.c, and
that does not belong in more specific files such as converter.c or
structuring.c.  Maybe the organization of this file could be improved,
and the name could be different.

> 'txi_converter_setup' calls 'converter_converter', defined in
> C/convert/converter.c.  This then refers to a data array
> 'converter_format_data', defined in the same file:
> 
>     /* table used to dispatch format specific functions.
>        Same purpose as inherited methods in Texinfo::Convert::Converter */
>     /* Should be kept in sync with enum converter_format
>        and TXI_CONVERSION_FORMAT_NR */
>     CONVERTER_FORMAT_DATA converter_format_data[] = {
>       {"html", "Texinfo::Convert::HTML", &html_format_setup, 0,
>        &html_converter_defaults,
>        &html_converter_initialize, &html_output, &html_convert,
>        &html_convert_tree, 0, &html_free_converter, &html_element_cdt_tree},
>       {"rawtext", "Texinfo::Convert::Text", 0, &rawtext_converter,
>        0, 0, &rawtext_output,
>        &rawtext_convert, &rawtext_convert_tree, 0, 0, 0},
>       {"plaintexinfo", "Texinfo::Convert::PlainTexinfo", 0, 0,
>        &plaintexinfo_converter_defaults, 0, &plaintexinfo_output,
>        &plaintexinfo_convert, &plaintexinfo_convert_tree, 0, 0, 0},
>     };
> 
> This appears to have a similar purpose to the array in texi2any.c:

It is supposed to serve a different purpose.  converter_format_data is
used almost exclusively to dispatch the different steps of conversion to
per output format functions.  It is about conversion code.  It only has
information for converters that are fully implemented in C.

The array in texi2any.c holds information of a different nature on the
output formats, such as the transformations done for the output format,
the init file loaded, the name of the Perl module or the format actually
converted to (HTML for EPUB for instance).

converter_format_data is organized in parallel with the Perl
Texinfo/Convert/Converter.pm interface; the dispatch plays the same
role as a module inheriting from Texinfo::Convert::Converter and
overriding a specific method.
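
To illustrate the idea only, here is a minimal sketch of a table of
function pointers standing in for method overriding.  The types and
functions (SKETCH_CONVERTER, SKETCH_FORMAT_DATA, sketch_converter_output
and so on) are hypothetical, much simpler than CONVERTER_FORMAT_DATA,
and the fallback to a generic function when a slot is 0 is an
assumption, not a description of what converter.c does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real structures. */
enum sketch_format { SKETCH_HTML, SKETCH_RAWTEXT, SKETCH_FORMAT_NR };

typedef struct SKETCH_CONVERTER {
  enum sketch_format format;   /* index into sketch_format_data */
} SKETCH_CONVERTER;

typedef const char *(*SKETCH_OUTPUT_FN) (const SKETCH_CONVERTER *self);

typedef struct SKETCH_FORMAT_DATA {
  const char *name;
  SKETCH_OUTPUT_FN output;     /* 0: no format specific function */
} SKETCH_FORMAT_DATA;

/* Plays the role of the method defined in the base class. */
static const char *
sketch_generic_output (const SKETCH_CONVERTER *self)
{
  (void) self;
  return "generic";
}

/* Plays the role of the method overridden by a subclass. */
static const char *
sketch_html_output (const SKETCH_CONVERTER *self)
{
  (void) self;
  return "html";
}

/* One entry per value of enum sketch_format, in the same order. */
static const SKETCH_FORMAT_DATA sketch_format_data[] = {
  {"html", &sketch_html_output},
  {"rawtext", 0},
};

/* Dispatch one conversion step to the per format function, falling
   back on the generic one when the slot is empty (an assumption made
   for this sketch). */
const char *
sketch_converter_output (const SKETCH_CONVERTER *self)
{
  SKETCH_OUTPUT_FN fn;

  assert (self->format >= 0 && self->format < SKETCH_FORMAT_NR);
  fn = sketch_format_data[self->format].output;
  if (!fn)
    fn = &sketch_generic_output;
  return fn (self);
}
```

With this layout, a filled slot corresponds to an overridden method in
the Perl module and an empty slot to an inherited one.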

> At the least, it appears to duplicate the association between format name
> ("html") and associated Perl module ("Texinfo::Convert::HTML"), although
> there is no module given for "rawtext" or "plaintexinfo" in the array in
> texi2any.c.  'converter_format_data' in converter.c appears only to have
> the output formats with C code available.

The format name is not exactly the same in the two cases: in texi2any.c
it is the output format, in converter.c it is the name of the conversion
code (and it is not of much use, actually).

> So I suppose, if I were trying to write a Plaintext converter (and then
> an Info converter), I would start by adding an entry to
> 'converter_format_data' and then see what other changes were needed to
> surrounding code.

Indeed, you would add there all the functions that are needed for the
different steps of the conversion, probably the converter_output
function first.
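
As a sketch only, adding an entry could look like the following, with a
compile-time check that the table stays in sync with the enum.  The
names (sketch_format, SKETCH_FORMAT_ENTRY, sketch_find_format) are
hypothetical, the structure has far fewer fields than the real one, and
the "plaintext" entry is an assumption about what a new entry might
contain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; the entry before SKETCH_FORMAT_NR is the new one. */
enum sketch_format {
  SKETCH_HTML,
  SKETCH_RAWTEXT,
  SKETCH_PLAINTEXINFO,
  SKETCH_PLAINTEXT,
  SKETCH_FORMAT_NR
};

/* Hypothetical, reduced to the two string fields of the real table. */
typedef struct SKETCH_FORMAT_ENTRY {
  const char *name;
  const char *perl_module;
} SKETCH_FORMAT_ENTRY;

static const SKETCH_FORMAT_ENTRY sketch_formats[] = {
  {"html", "Texinfo::Convert::HTML"},
  {"rawtext", "Texinfo::Convert::Text"},
  {"plaintexinfo", "Texinfo::Convert::PlainTexinfo"},
  {"plaintext", "Texinfo::Convert::Plaintext"},   /* new entry */
};

/* Fails at compile time if an enum value is added without a table
   entry, or the reverse. */
_Static_assert (sizeof sketch_formats / sizeof sketch_formats[0]
                == SKETCH_FORMAT_NR,
                "sketch_formats out of sync with enum sketch_format");

/* Return the index of the entry called NAME, or -1 if there is none. */
int
sketch_find_format (const char *name)
{
  int i;

  assert (name != NULL);
  for (i = 0; i < SKETCH_FORMAT_NR; i++)
    if (strcmp (sketch_formats[i].name, name) == 0)
      return i;
  return -1;
}
```

The check only covers the number of entries, so the order of the table
still has to be kept the same as the order of the enum by hand.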

> I hope this message is productive and gives a sense of my difficulty in
> approaching this code base.  My initial impression is that the handling
> of getting the functions to be used for conversion could do with some
> reorganization, with relevant code in three different source files
> (texi2any.c, texinfo.c and convert.c).

That is quite possible.  Note that one important element of the design
is to keep the organization aligned with the Perl code, so that it is
easy to write the XS interfaces, to compare output and to change both
languages.  If the Perl code disappears, these constraints go away.

> Like I've said in the past, I'm hopeful that this code will start to get
> simpler in the future rather than more complicated as more of it is written
> in a single language (C) and the need for cross-language interfacing
> infrastructure is reduced.

-- 
Pat
