Hi all, here is my experience (and a few strong opinions) with toolchains
for single sourcing from docbook to pdf/html/epub/docx.

In the past I set up, used and maintained several docbook/FO toolchain
variations to localize several books from the O'Reilly catalog for the
Italian market and to publish them for print, starting from DB sources.
While the docbook editing part worked like a charm in Oxygen, the pdf
generation was *very* painful to set up and to maintain. In the end it
worked, but it is not something I would recommend for anything but a very
simple book/layout. We started with FOP as the pagination engine, but we
found that it is not suitable for professional print production of complex
books (tables, figures, boxes, sidebars, etc.). In the end we used the
AntennaHouse engine with its proprietary extensions. Note that today even
O'Reilly has abandoned the FO route to produce pdf from DB.

My conclusion is that while DB is *very* good and well supported as a
structuring and archiving format, FOP and friends are not a suitable
solution for producing professional PDF. Moreover, FOP is at a dead end, as
it is being replaced by css pagination (see the AntennaHouse and Prince
product lines).

I then started using a DB/latex/pdf toolchain, but I usually find this
solution not flexible enough (having to edit some code just to move a
figure is not something that scales up that well) and I think that the
batch pagination paradigm used by tex/fop is not suitable for complex books.

I now routinely use, with great satisfaction and efficiency, a workflow
based on xslt pipelines that transform docbook to idml (the xml format used
by Adobe indesign). The typographically perfect PDF is then produced
interactively from the automatically generated indesign files.

I initially developed the xslt pipelines for transforming to indesign
myself, but a few months ago I discovered this nugget:

http://www.le-tex.de/en/transpect.html

It is a game-changer library/framework made by a brilliant German software
house to *roundtrip* from/to any word/indesign/xml using as a pivot a
format named hubxml, which is simplified docbook + css attributes.
Everything has been open sourced.

As an example, these are the out of the box possibilities:
docx/idml to hubxml;
hubxml to html/idml/epub/docx;
interactive proofreading/copyfitting/image refining/pdf production directly
in indesign; export from indesign to idml and then conversion back to
hubxml for archiving (i.e., true roundtripping).
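
To make the pivot idea concrete, here is a small Python sketch that treats
the conversions listed above as a directed graph and looks for a route
between two formats. The format names come from the list; the CONVERSIONS
table and the route() function are purely illustrative and are not part of
transpect.

```python
from collections import deque

# Directed conversions taken from the list above (source -> targets).
# This table is an illustration only, not transpect configuration.
CONVERSIONS = {
    "docx": ["hubxml"],
    "idml": ["hubxml"],
    "hubxml": ["html", "idml", "epub", "docx"],
}


def route(source, target):
    """Return the shortest chain of formats from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(route("docx", "epub"))  # ['docx', 'hubxml', 'epub']
    print(route("html", "docx"))  # None: html is an output-only format here
```

Every route between two different formats goes through hubxml, which is
why a single pivot format keeps the number of converters small.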

For going from xml to indesign (idml) all you need is an indesign template
with the layout and the typography (note that the indesign template could
be created and maintained by a graphic designer who knows absolutely
nothing about tags or xml/html/idml) and an xml mapping configuration file.
Tables, images and math formulae are supported almost out of the box.
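
As a rough idea of what such a mapping expresses, here is a Python sketch
of a lookup from xml elements to style names defined in the indesign
template. The real configuration is an xml file whose format I am not
reproducing here; STYLE_MAP, style_for() and all the style names below are
hypothetical.

```python
# Hypothetical mapping from (element name, role) to a paragraph style name
# defined in the indesign template. A role of None is the element default.
STYLE_MAP = {
    ("title", None): "Chapter Title",
    ("para", None): "Body",
    ("para", "note"): "Note Text",
}


def style_for(element, role=None):
    """Look up the template style, falling back to the element default."""
    if (element, role) in STYLE_MAP:
        return STYLE_MAP[(element, role)]
    # Unknown role: use the element default, or "Body" as a last resort.
    return STYLE_MAP.get((element, None), "Body")


if __name__ == "__main__":
    print(style_for("para", "note"))     # Note Text
    print(style_for("para", "unknown"))  # Body
```

The point is that the designer owns the style names in the template, and
only this small table needs to know about both sides.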

The technology used is standard xslt/xproc/xsd/schematron. There is even a
terrific module to check xml files (generated from word processing) against
business rules expressed in schematron and then annotate an html version of
the sources with warning messages (e.g. to check that only the styles from
a controlled vocabulary of styles have been used). The runtime environment
is java (saxon and calabash). The software is very robust and very well
designed and written. See the above link for all the details.
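
To illustrate the kind of business rule I mean, here is a plain Python
analogue of the controlled-vocabulary check. The real module expresses such
rules in schematron; this sketch only shows the logic. The style names, the
sample document and the use of a "role" attribute to carry the style name
are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary of allowed paragraph styles.
ALLOWED_STYLES = {"Heading1", "Body", "Caption"}

# Tiny stand-in for a converted document. Using the `role` attribute as the
# carrier of the style name is an assumption made for this sketch.
SAMPLE = """
<chapter>
  <para role="Heading1">Title</para>
  <para role="Body">Some text.</para>
  <para role="MyCustomStyle">Ad hoc formatting.</para>
</chapter>
"""


def check_styles(xml_text, allowed):
    """Return one warning message per element that uses an unlisted style."""
    warnings = []
    for elem in ET.fromstring(xml_text).iter():
        style = elem.get("role")
        if style is not None and style not in allowed:
            warnings.append(
                f"<{elem.tag}>: style '{style}' is not in the controlled vocabulary"
            )
    return warnings


if __name__ == "__main__":
    for message in check_styles(SAMPLE, ALLOWED_STYLES):
        print(message)
```

In the real workflow the messages are attached to an html rendering of the
sources, so that an editor sees each warning next to the offending
paragraph.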

I now have extensive experience with this library/framework and I already
use it in production for a couple of clients. I am standardizing
everything on this.

I'll soon have a hosted web environment in public beta. If anyone is
interested, please drop me a private message.

Kind regards,
__peppo
