[docs-discuss] Developer Reference docbook conversion tool

Eric Lowe Tue, 23 May 2006 11:36:03 -0500

Hi,

I have successfully hacked together a tool which converts the Developer 
Reference POD source into DocBook XML. The resulting XML->HTML output is 
very similar to the stock POD->HTML output.


I started with the Pod::XML v0.93 perl module from CPAN and its included 
pod2xml tool. I had to hack up the perl module considerably so that it 
would correctly translate links, since the v0.93 version has half-baked 
link handling. I also modified the script so that it would produce DocBook 
tags instead of the funny "pod" format to avoid XSLT post-processing, 
added some code to strip the chapter numbers, and added recognition for 
appendices.
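
For anyone skimming the diff below, the heading and link handling described above can be sketched in a few lines of standalone Perl. This is a simplified illustration, not the module code itself: classify_head1 and link_to_docbook are hypothetical names, the appendix pattern is anchored to the start of the title, and the output uses plain DocBook tags rather than the \{tag:...\} placeholders that XML.pm works with internally.

```perl
use strict;
use warnings;

# Remove a leading chapter number such as "3.2 " from a heading.
sub strip_numbering {
    my ($heading) = @_;
    $heading =~ s/^[\d. ]+//;
    return $heading;
}

# Build an XML id from the first 30 characters of a heading,
# replacing every non-word character with an underscore.
sub sect_to_link {
    my ($heading) = @_;
    my $id = "_" . substr($heading, 0, 30);
    $id =~ s/\W/_/g;
    return $id;
}

# Hypothetical helper: decide which DocBook element a =head1 heading
# opens, and return (tag, cleaned title, id).
sub classify_head1 {
    my ($heading) = @_;
    my $title = strip_numbering($heading);
    my $tag   = "chapter";
    # Assumption: appendix headings look like "Appendix A: Title".
    if ($title =~ s/^Appendix [A-Za-z]+[:.]?\s*//) {
        $tag = "appendix";
    }
    # The id is built from the unmodified heading, as in the diff.
    return ($tag, $title, sect_to_link($heading));
}

# Hypothetical helper: translate the target of a POD L<> link.
# "doc/section" becomes an olink into another document; anything
# else becomes an xref within the current one.
sub link_to_docbook {
    my ($target, $text) = @_;
    $target =~ s/"//g;    # strip quotes around section names
    if ($target =~ m{^(.*?)/(.*)$}) {
        my ($doc, $sect) = ($1, $2);
        my $ptr = sect_to_link($sect);
        return "<olink targetdoc='$doc' targetptr='$ptr'>$text</olink>";
    }
    return "<xref linkend='" . sect_to_link($target) . "'/>";
}

my ($tag, $title, $id) = classify_head1("Appendix A: Glossary");
print "<$tag id='$id' xreflabel='$title'>\n<title>$title</title>\n";
print link_to_docbook('"Build Tools"', "Build Tools"), "\n";
```

Because the id and the link target both go through sect_to_link, a section and any L<> reference to it end up with the same anchor, which is what lets xsltproc resolve the cross-references.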

I had to make a few minor changes to the POD source (I'm working from 
v1.10) to get it to translate correctly. There are also a few broken 
reference links which I haven't tried to address, but which xsltproc flags.

26c26
< B<OpenSolaris Developer's Reference>
---
 > =head1 OpenSolaris Developer's Reference
829c829
< =head4 Example: Installing Studio 10 compilers, installed image
---
 > =head3 Example 1: Installing Studio 10 compilers, installed image
857c857
< =head4 Example: Installing Studio 10 compilers, full product
---
 > =head3 Example 2: Installing Studio 10 compilers, full product
907c907
< =head4 Example: Installing the ON build tools
---
 > =head3 Example 3: Installing the ON build tools

The diffs of my XML.pm versus the stock XML.pm are below. E-mail me 
privately if you want a copy of the intact XML.pm file.

Cheers
- Eric

----- >8 snip -----
1a2
 > # Modified by elowe at sun.com 2006-05-22 to generate DocBook XML
5c6
< use vars qw(@ISA $VERSION %head2sect %xmlchars %HTML_Escapes);
---
 > use vars qw(@ISA $VERSION %head2sect %head2altsect %xmlchars %HTML_Escapes);
10c11,13
< $VERSION = '0.93';
---
 > $VERSION = '0.94';
 >
 > my $inappendix = 0;
13,16c16,26
<     1 => "sect1",
<     2 => "sect2",
<     3 => "sect3",
<     4 => "sect4",
---
 >     1 => "chapter",
 >     2 => "sect1",
 >     3 => "sect2",
 >     4 => "sect3",
 > );
 >
 > %head2altsect = (
 >     1 => "appendix",
 >     2 => "sect1",
 >     3 => "sect2",
 >     4 => "sect3",
115a126,138
 > sub strip_numbering {
 >     my ($yadda) = @_;
 >     $yadda =~ /[\d\. ]*(.*)/;
 >     return $1;
 > }
 >
 > sub sect_to_link {
 >     my ($title) = @_;
 >     my $esctitle = "_" . substr($title, 0, 30);
 >     $esctitle =~ s/\W/_/g;
 >     return $esctitle;
 > }
 >
126c149,150
< <pod xmlns="http://axkit.org/ns/2000/pod2xml">
---
 > <!DOCTYPE book SYSTEM "docbookx.dtd">
 > <book>
135c159,163
<         $parser->xml_output("</$head2sect{$parser->{headlevel}}>\n");
---
 >       if ($inappendix == 1) {
 >               $parser->xml_output("</$head2altsect{$parser->{headlevel}}>\n");
 >       } else {
 >               $parser->xml_output("</$head2sect{$parser->{headlevel}}>\n");
 >       }
140c168
< </pod>
---
 > </book>
160c188
<             $parser->xml_output("<head>\n\t<title>");
---
 >             $parser->xml_output("<title>");
165c193
<             $parser->xml_output($paragraph, "</title>\n</head>\n");
---
 >             $parser->xml_output($paragraph, "</title>\n");
171c199,203
<               $parser->xml_output("</$head2sect{$parser->{headlevel}}>\n");
---
 >               if ($inappendix == 1) {
 >                       $parser->xml_output("</$head2altsect{$parser->{headlevel}}>\n");
 >               } else {
 >                       $parser->xml_output("</$head2sect{$parser->{headlevel}}>\n");
 >               }
178c210,214
<               $parser->xml_output("<$head2sect{$parser->{headlevel}}>\n");
---
 >               if ($inappendix == 1) {
 >                       $parser->xml_output("<$head2altsect{$parser->{headlevel}}>\n");
 >               } else {
 >                       $parser->xml_output("<$head2sect{$parser->{headlevel}}>\n");
 >               }
182,183c218,234
<         $parser->xml_output("<$head2sect{$headlevel}>\n",
<                 "<title>", $paragraph, "</title>\n");
---
 >       my $title = $paragraph;
 >         if ($headlevel == 1) {        # DocBook does its own chapter numbering
 >               $title = strip_numbering($title);
 >               $inappendix = 0;
 >               if ($title =~ /Appendix/) {
 >                       $inappendix = 1;
 >                       $title =~ s/Appendix [A-Z|a-z]+[:.]?\s*//;
 >               }
 >       }
 >       my $esctitle = sect_to_link($paragraph);
 >
 >       my $tag = $head2sect{$headlevel};
 >       if ($inappendix == 1) {
 >               $tag = $head2altsect{$headlevel};
 >       }
 >         $parser->xml_output("<$tag id='$esctitle' xreflabel='$title'>\n",
 >                 "<title>", $title, "</title>\n");
187c238
<             $parser->xml_output("</item>\n");
---
 >             $parser->xml_output("</listitem>\n");
190c241
<         $parser->xml_output("<list>\n");
---
 >         $parser->xml_output("<itemizedlist>\n");
194c245
<             $parser->xml_output("</item>\n");
---
 >             $parser->xml_output("</listitem>\n");
197c248
<         $parser->xml_output("</list>\n");
---
 >         $parser->xml_output("</itemizedlist>\n");
201c252
<             $parser->xml_output("</item>\n");
---
 >             $parser->xml_output("</listitem>\n");
204c255
<         $parser->xml_output("<item>");
---
 >         $parser->xml_output("<listitem>");
207c258
<             $parser->xml_output("<itemtext>", $paragraph, "</itemtext>\n");
---
 >             $parser->xml_output("<para>", $paragraph, "</para>\n");
224c275
<         $parser->xml_output("<verbatim><![CDATA[\n", $paragraph, "\n]]></verbatim>\n");
---
 >         $parser->xml_output("<programlisting><![CDATA[\n", $paragraph, "\n]]></programlisting>\n");
243c294
<         $parser->xml_output($text, "</title>\n</head>\n");
---
 >         $parser->xml_output($text, "</title>\n");
247c298
<             $parser->xml_output("<sect1>\n<title>", $parser->{title}, "</title>\n");
---
 >             $parser->xml_output("<chapter>\n<title>", $parser->{title}, "</title>\n");
287c338
<         $new .= "\{tag:xlink uri='$url'\}$url\{/tag:xlink\}";
---
 >         $new .= "\{tag:ulink url='$url'\}$url\{/tag:ulink\}";
306c357
<         return "\{tag:strong\}$seq_argument\{\/tag:strong\}";
---
 >         return "\{tag:emphasis role='bold'\}$seq_argument\{/tag:emphasis\}";
326a378,379
 >       # Ignore any http:// URLs that are inside L<> they will be
 >       # processed later
333c386
<
---
 >
335,336c388,396
<
<         if ($seq_argument =~ /^(.*?)\/(.*)$/) {
---
 >         $text =~ s/\"//g;     # strip quotes
 >         $text =~ s/\&quot\;//g;
 >         $seq_argument =~ s/\"//g;
 >         $seq_argument =~ s/\&quot\;//g;
 >
 >       if ($seq_argument =~ /^.*$urls:.*/) {
 >           return "$text";
 >       }
 >         elsif ($seq_argument =~ /^(.*?)\/(.*)$/) {
338d397
<             my $ident_or_sect = $2;
340,355c399,400
<
<             if ($ident_or_sect =~ /^\"(.*)\"$/) {
<                 my $sect = $1;
<                 $sect = substr($sect, 0, 30);
<                 $sect =~ s/\s/_/g;
<                 $seq_argument .= '#' . $sect;
<             }
<             else {
<                 $seq_argument .= '#' . $ident_or_sect;
<             }
<         }
<         elsif ($seq_argument =~ /^\"(.*)\"$/) {
<             my $sect = $1;
<             $sect = substr($sect, 0, 30);
<             $sect =~ s/\s/_/g;
<             $seq_argument = '#' . $sect;
---
 >             my $ident_or_sect = sect_to_link($2);
 >             return "\{tag:olink targetdoc='$seq_argument' targetptr='$ident_or_sect'\}$text\{\/tag:olink\}";
357,358c402,404
<
<         return "\{tag:link xref='$seq_argument'\}$text\{\/tag:link\}";
---
 >
 >         $seq_argument = sect_to_link($seq_argument);
 >       return "\{tag:xref linkend='$seq_argument' \/\}";
386,391c432,433
< The XML format is not a standardised format - if you wish to generate
< some standard XML format such as docbook, please use a tool such as XSLT
< to convert between this and that format.
<
< The format uses the namespace "http://axkit.org/ns/2000/pod2xml". Do not
< try and request this URI - it is virtual. You will get a 404.
---
 > The XML format is as close to DocBook as we can get from POD.
 > Some minor manual cleanup will be needed.
395,400c437,440
<   <pod xmlns="http://axkit.org/ns/2000/pod2xml">
<     <head>
<       <title>The first =head1 goes in here</title>
<     </head>
<     <sect1>
<     <title>Subsequent =head1's create a sect1</title>
---
 >   <book>
 >   <title>The first =head1 goes in here</title>
 >     <chapter|appendix>
 >     <title>Subsequent =head1's create a chapter</title>
404,405c444,445
<       <verbatim><![CDATA[
<       Indented verbatim sections go in verbatim tags using a CDATA
---
 >       <programlisting><![CDATA[
 >       Indented verbatim sections go in programlisting tags using a CDATA
407,409c447,449
<       ]]></verbatim>
<       <sect2>
<       <title>=head2's go in sect2</title>
---
 >       ]]></programlisting>
 >       <sect1>
 >       <title>=head2's go in sect1</title>
412,413c452,453
<         supported by pod), producing sect3 and
<         sect4 respectively for =head3 and =head4.
---
 >         supported by pod), producing sect2 and
 >         sect3 respectively for =head3 and =head4.
416c456
<         Bold text goes in a <strong>strong</strong> tag.
---
 >         Bold text goes in a <note>note</note> tag.
425,427c465,467
<         Lists (=over, =item, =back) go in list/item/itemtext
<         tags. The itemtext element is only present if the
<         =item text is <strong>not</strong> the "*" character.
---
 >         Lists (=over, =item, =back) go in itemizedlist/listitem/para
 >         tags. The para element is only present if the
 >         =item text is <emphasis>not</emphasis> the "*" character.
429,431c469,471
<       </sect2>
<     </sect1>
<   </pod>
---
 >       </sect1>
 >     </chapter|appendix>
 >   </book>
443a484
 > Eric Lowe, elowe at sun.com

----- >8 snip -----
