Hi,
I have successfully hacked together a tool which converts the Developer
Reference POD source into DocBook XML. The resulting XML->HTML output is
very similar to the stock POD->HTML output.
I started with the Pod::XML v0.93 perl module from CPAN and its included
pod2xml tool. I had to hack up the perl module considerably so that it
would correctly translate links, since the v0.93 version has half-baked
link handling. I also modified the script so that it would produce DocBook
tags instead of the funny "pod" format to avoid XSLT post-processing,
added some code to strip the chapter numbers, and added recognition for
appendices.
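As a rough illustration of the chapter-number stripping and appendix recognition described above, here is a minimal sketch. It reuses the two heading maps from the patch, but the helper name classify_head1 and the sample headings are hypothetical, and the regexes are anchored for simplicity, so this is not the exact code in XML.pm.

```perl
use strict;
use warnings;

# Same mappings as the patch: =head level -> DocBook element.
my %head2sect    = (1 => 'chapter',  2 => 'sect1', 3 => 'sect2', 4 => 'sect3');
my %head2altsect = (1 => 'appendix', 2 => 'sect1', 3 => 'sect2', 4 => 'sect3');

# Hypothetical helper (not part of XML.pm): given the text of a =head1,
# return the DocBook element to use and the cleaned-up title.
sub classify_head1 {
    my ($title) = @_;
    $title =~ s/^[\d. ]+//;    # drop a leading "3 " or "3.1 "
    my $tag = $head2sect{1};
    # Assumption: appendix headings start with "Appendix <letter>".
    if ($title =~ s/^Appendix [A-Za-z]+[:.]?\s*//) {
        $tag = $head2altsect{1};
    }
    return ($tag, $title);
}

# Sample headings, made up for illustration only.
for my $h ('3 The Source Tree', 'Appendix A: Glossary') {
    my ($tag, $title) = classify_head1($h);
    print "<$tag><title>$title</title></$tag>\n";
}
# prints:
#   <chapter><title>The Source Tree</title></chapter>
#   <appendix><title>Glossary</title></appendix>
```

The numbering is dropped because DocBook stylesheets number chapters and appendices themselves.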
I had to make a few minor changes to the POD source (I'm working from
v1.10) to get it to translate correctly; the diff follows. There are also
a few broken reference links, which xsltproc flags, that I haven't tried
to fix.
26c26
< B<OpenSolaris Developer's Reference>
---
> =head1 OpenSolaris Developer's Reference
829c829
< =head4 Example: Installing Studio 10 compilers, installed image
---
> =head3 Example 1: Installing Studio 10 compilers, installed image
857c857
< =head4 Example: Installing Studio 10 compilers, full product
---
> =head3 Example 2: Installing Studio 10 compilers, full product
907c907
< =head4 Example: Installing the ON build tools
---
> =head3 Example 3: Installing the ON build tools
The diffs of my XML.pm versus the stock XML.pm are below. E-mail me
privately if you want a copy of the intact XML.pm file.
Cheers
- Eric
----- >8 snip -----
1a2
> # Modified by elowe at sun.com 2006-05-22 to generate DocBook XML
5c6
< use vars qw(@ISA $VERSION %head2sect %xmlchars %HTML_Escapes);
---
> use vars qw(@ISA $VERSION %head2sect %head2altsect %xmlchars %HTML_Escapes);
10c11,13
< $VERSION = '0.93';
---
> $VERSION = '0.94';
>
> my $inappendix = 0;
13,16c16,26
< 1 => "sect1",
< 2 => "sect2",
< 3 => "sect3",
< 4 => "sect4",
---
> 1 => "chapter",
> 2 => "sect1",
> 3 => "sect2",
> 4 => "sect3",
> );
>
> %head2altsect = (
> 1 => "appendix",
> 2 => "sect1",
> 3 => "sect2",
> 4 => "sect3",
115a126,138
> sub strip_numbering {
> my ($yadda) = @_;
> $yadda =~ /[\d\. ]*(.*)/;
> return $1;
> }
>
> sub sect_to_link {
> my ($title) = @_;
> my $esctitle = "_" . substr($title, 0, 30);
> $esctitle =~ s/\W/_/g;
> return $esctitle;
> }
>
126c149,150
< <pod xmlns="http://axkit.org/ns/2000/pod2xml">
---
> <!DOCTYPE book SYSTEM "docbookx.dtd">
> <book>
135c159,163
< $parser->xml_output("</$head2sect{$parser->{headlevel}}>\n");
---
> if ($inappendix == 1) {
> $parser->xml_output("</$head2altsect{$parser->{headlevel}}>\n");
> } else {
> $parser->xml_output("</$head2sect{$parser->{headlevel}}>\n");
> }
140c168
< </pod>
---
> </book>
160c188
< $parser->xml_output("<head>\n\t<title>");
---
> $parser->xml_output("<title>");
165c193
< $parser->xml_output($paragraph, "</title>\n</head>\n");
---
> $parser->xml_output($paragraph, "</title>\n");
171c199,203
< $parser->xml_output("</$head2sect{$parser->{headlevel}}>\n");
---
> if ($inappendix == 1) {
> $parser->xml_output("</$head2altsect{$parser->{headlevel}}>\n");
> } else {
> $parser->xml_output("</$head2sect{$parser->{headlevel}}>\n");
> }
178c210,214
< $parser->xml_output("<$head2sect{$parser->{headlevel}}>\n");
---
> if ($inappendix == 1) {
> $parser->xml_output("<$head2altsect{$parser->{headlevel}}>\n");
> } else {
> $parser->xml_output("<$head2sect{$parser->{headlevel}}>\n");
> }
182,183c218,234
< $parser->xml_output("<$head2sect{$headlevel}>\n",
< "<title>", $paragraph, "</title>\n");
---
> my $title = $paragraph;
> if ($headlevel == 1) { # DocBook does its own chapter numbering
> $title = strip_numbering($title);
> $inappendix = 0;
> if ($title =~ /Appendix/) {
> $inappendix = 1;
> $title =~ s/Appendix [A-Za-z]+[:.]?\s*//;
> }
> }
> my $esctitle = sect_to_link($paragraph);
>
> my $tag = $head2sect{$headlevel};
> if ($inappendix == 1) {
> $tag = $head2altsect{$headlevel};
> }
> $parser->xml_output("<$tag id='$esctitle' xreflabel='$title'>\n",
> "<title>", $title, "</title>\n");
187c238
< $parser->xml_output("</item>\n");
---
> $parser->xml_output("</listitem>\n");
190c241
< $parser->xml_output("<list>\n");
---
> $parser->xml_output("<itemizedlist>\n");
194c245
< $parser->xml_output("</item>\n");
---
> $parser->xml_output("</listitem>\n");
197c248
< $parser->xml_output("</list>\n");
---
> $parser->xml_output("</itemizedlist>\n");
201c252
< $parser->xml_output("</item>\n");
---
> $parser->xml_output("</listitem>\n");
204c255
< $parser->xml_output("<item>");
---
> $parser->xml_output("<listitem>");
207c258
< $parser->xml_output("<itemtext>", $paragraph, "</itemtext>\n");
---
> $parser->xml_output("<para>", $paragraph, "</para>\n");
224c275
< $parser->xml_output("<verbatim><![CDATA[\n", $paragraph, "\n]]></verbatim>\n");
---
> $parser->xml_output("<programlisting><![CDATA[\n", $paragraph, "\n]]></programlisting>\n");
243c294
< $parser->xml_output($text, "</title>\n</head>\n");
---
> $parser->xml_output($text, "</title>\n");
247c298
< $parser->xml_output("<sect1>\n<title>", $parser->{title}, "</title>\n");
---
> $parser->xml_output("<chapter>\n<title>", $parser->{title}, "</title>\n");
287c338
< $new .= "\{tag:xlink uri='$url'\}$url\{/tag:xlink\}";
---
> $new .= "\{tag:ulink url='$url'\}$url\{/tag:ulink\}";
306c357
< return "\{tag:strong\}$seq_argument\{\/tag:strong\}";
---
> return "\{tag:emphasis role='bold'\}$seq_argument\{/tag:emphasis\}";
326a378,379
> # Ignore any http:// URLs that are inside L<> they will be
> # processed later
333c386
<
---
>
335,336c388,396
<
< if ($seq_argument =~ /^(.*?)\/(.*)$/) {
---
> $text =~ s/\"//g; # strip quotes
> $text =~ s/\&quot\;//g;
> $seq_argument =~ s/\"//g;
> $seq_argument =~ s/\&quot\;//g;
>
> if ($seq_argument =~ /^.*$urls:.*/) {
> return "$text";
> }
> elsif ($seq_argument =~ /^(.*?)\/(.*)$/) {
338d397
< my $ident_or_sect = $2;
340,355c399,400
<
< if ($ident_or_sect =~ /^\"(.*)\"$/) {
< my $sect = $1;
< $sect = substr($sect, 0, 30);
< $sect =~ s/\s/_/g;
< $seq_argument .= '#' . $sect;
< }
< else {
< $seq_argument .= '#' . $ident_or_sect;
< }
< }
< elsif ($seq_argument =~ /^\"(.*)\"$/) {
< my $sect = $1;
< $sect = substr($sect, 0, 30);
< $sect =~ s/\s/_/g;
< $seq_argument = '#' . $sect;
---
> my $ident_or_sect = sect_to_link($2);
> return "\{tag:olink targetdoc='$seq_argument' targetptr='$ident_or_sect'\}$text\{\/tag:olink\}";
357,358c402,404
<
< return "\{tag:link xref='$seq_argument'\}$text\{\/tag:link\}";
---
>
> $seq_argument = sect_to_link($seq_argument);
> return "\{tag:xref linkend='$seq_argument' \/\}";
386,391c432,433
< The XML format is not a standardised format - if you wish to generate
< some standard XML format such as docbook, please use a tool such as XSLT
< to convert between this and that format.
<
< The format uses the namespace "http://axkit.org/ns/2000/pod2xml". Do not
< try and request this URI - it is virtual. You will get a 404.
---
> The XML format is as close to DocBook as we can get from POD.
> Some minor manual cleanup will be needed.
395,400c437,440
< <pod xmlns="http://axkit.org/ns/2000/pod2xml">
< <head>
< <title>The first =head1 goes in here</title>
< </head>
< <sect1>
< <title>Subsequent =head1's create a sect1</title>
---
> <book>
> <title>The first =head1 goes in here</title>
> <chapter|appendix>
> <title>Subsequent =head1's create a chapter</title>
404,405c444,445
< <verbatim><![CDATA[
< Indented verbatim sections go in verbatim tags using a CDATA
---
> <programlisting><![CDATA[
> Indented verbatim sections go in programlisting tags using a CDATA
407,409c447,449
< ]]></verbatim>
< <sect2>
< <title>=head2's go in sect2</title>
---
> ]]></programlisting>
> <sect1>
> <title>=head2's go in sect1</title>
412,413c452,453
< supported by pod), producing sect3 and
< sect4 respectively for =head3 and =head4.
---
> supported by pod), producing sect2 and
> sect3 respectively for =head3 and =head4.
416c456
< Bold text goes in a <strong>strong</strong> tag.
---
> Bold text goes in an <emphasis role="bold">emphasis</emphasis> tag.
425,427c465,467
< Lists (=over, =item, =back) go in list/item/itemtext
< tags. The itemtext element is only present if the
< =item text is <strong>not</strong> the "*" character.
---
> Lists (=over, =item, =back) go in itemizedlist/listitem/para
> tags. The para element is only present if the
> =item text is <emphasis>not</emphasis> the "*" character.
429,431c469,471
< </sect2>
< </sect1>
< </pod>
---
> </sect1>
> </chapter|appendix>
> </book>
443a484
> Eric Lowe, elowe at sun.com
----- >8 snip -----
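To show how the section ids and the L<> cross-references line up, here is a minimal sketch that runs sect_to_link from the patch on a made-up heading. The heading text is hypothetical, and the two print statements only approximate what the patched handlers emit.

```perl
use strict;
use warnings;

# Copied from the patch above.
sub sect_to_link {
    my ($title) = @_;
    my $esctitle = "_" . substr($title, 0, 30);
    $esctitle =~ s/\W/_/g;
    return $esctitle;
}

# Hypothetical heading text, used only for illustration.
my $heading = "2.1 Getting the Source";
my $id      = sect_to_link($heading);

# Roughly what the heading handler emits for this section ...
print "<sect1 id='${id}'>\n";
# ... and roughly what a link to that section turns into.
print "<xref linkend='${id}' />\n";
# $id is "_2_1_Getting_the_Source"
```

Because the id is built from only the first 30 characters of the title, two headings that share the same first 30 characters would end up with the same id.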