Hello,

You could try the HTML to DocBook converter herold (http://www.dbdoclet.org/archives/herold_5.2.2.jar). I ran

    java -jar herold_5.2.2.jar -i input.xhtml -o output.xml

on your example, and the result looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<article version="1.0" xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink">
  <info/>
  <section remap="h1">
    <title>Title1</title>
    <para>bla 1</para>
    <section remap="h2">
      <title>Title2</title>
      <para>bla 2</para>
    </section>
  </section>
</article>

Regards

> Hi there,
>
> I'd like to know if anyone is using the script from the page:
> http://wiki.docbook.org/topic/Html2DocBook
>
> I tried on a very tidy example:
>
> $ cat input.xhtml
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
> <body>
> <h1>Title1</h1>
> <p>bla 1</p>
> <h2>Title2</h2>
> <p>bla 2</p>
> </body>
> </html>
>
>
> Here is what I get as output:
>
> $ cat output.xml
> <?xml version="1.0"?>
> <section>
> <title>Title1</title>
> <para>bla 1</para>
> <para>bla 2</para>
> </section>
>
> The title in <h2> element is lost during the conversion.
>
> Any idea on how to fix that ?
>
> Thanks,
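
P.S. In case it helps to see why the <h2> needs its own nested <section>, here is a minimal Python sketch of the nesting step. It is not how herold or the wiki script works internally. It is only an illustration, assuming a flat <body> that contains nothing but h1..h6 and p elements. The names body_to_docbook, local_name and SAMPLE are made up for this example, and the output carries no DocBook namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
HEADINGS = {f"h{n}": n for n in range(1, 7)}  # "h1" -> 1, ..., "h6" -> 6

# The input.xhtml from the quoted message.
SAMPLE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h1>Title1</h1>
<p>bla 1</p>
<h2>Title2</h2>
<p>bla 2</p>
</body>
</html>"""


def local_name(tag):
    """Strip the XHTML namespace prefix from a tag name, if present."""
    return tag[len(XHTML_NS):] if tag.startswith(XHTML_NS) else tag


def body_to_docbook(xhtml_text):
    """Nest the flat h1..h6 / p children of <body> into DocBook sections."""
    body = ET.fromstring(xhtml_text).find(f"{XHTML_NS}body")
    article = ET.Element("article")
    # Stack of (heading level, open element); the article counts as level 0.
    stack = [(0, article)]
    for child in body:
        name = local_name(child.tag)
        if name in HEADINGS:
            level = HEADINGS[name]
            # A new heading closes every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section", remap=name)
            ET.SubElement(section, "title").text = child.text
            stack.append((level, section))
        elif name == "p":
            # Paragraphs go into the innermost open section.
            ET.SubElement(stack[-1][1], "para").text = child.text
    return article


article = body_to_docbook(SAMPLE)
print(ET.tostring(article, encoding="unicode"))
```

The stack is what keeps Title2 from being dropped: each heading opens a section under the nearest shallower heading, so the h2 ends up as a child section of the h1 rather than being merged into it.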
