Hi Alex,

On Saturday, February 4, 2012 1:08:41 AM UTC, Alex Buckley wrote:
>
> DocBook is good for "real" books where you want to/have to produce a 
> PDF, thanks to the DocBook-XSL stylesheet package. How do you produce 
> PDF from HTML5 source?


I have been experimenting with DocBook, DITA and XML-safe HTML5. My 
requirements are:

- Produce PDF output with clickable TOC pages and an outline/bookmarks
- Produce vanilla HTML5 output files that can be assembled by some very 
basic PHP scripts
- Produce a JSON TOC that the same PHP scripts can use for a 
context-sensitive sidebar

My initial thought was to use DocBook or a custom XML schema. On the plus 
side, this would let me take advantage of DocBook's powerful feature set. 
On the minus side, the schema has a steep learning curve given my limited 
time, and WYSIWYG editing seems like a no-go. Certainly the XML tools that 
I tried have terrible WYSIWYG support for XML formats.

My next thought was to take advantage of the new semantic elements of 
HTML5, using classes and data-* attributes where additional semantics are 
useful. I had a number of concerns with this approach, including those 
mentioned in this thread: an inconsistent WYSIWYG experience with excess 
browser-generated junk, and the inability to produce a quality PDF file. 
The only CSS3 processor I can find that fully supports paged media 
(Prince) is way too expensive.

After a LOT of research here are my findings:

- wkhtmltopdf is absolutely fantastic at converting HTML5 to PDF. Whilst 
its support for paged media is limited to what WebKit offers, the command 
line interface lets you add custom cover page(s), generate a TOC 
automatically (using the HTML5 outline), and specify custom headers and 
footers as HTML files (with JavaScript access to wkhtmltopdf properties). 
Plus, all links (including the TOC) are clickable and the PDF outline is 
generated beautifully.

*PDF Output:* Tick
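
To make the wkhtmltopdf point concrete, here is a minimal sketch in Python that only assembles the command line (it does not run the tool). It assumes the `cover` and `toc` objects and the `--header-html` / `--footer-html` options of the wkhtmltopdf command line; the function name and every file name are hypothetical placeholders, not something from my actual setup.

```python
import shlex


def build_wkhtmltopdf_command(pages, output_pdf, cover=None,
                              header_html=None, footer_html=None,
                              include_toc=True):
    """Return the wkhtmltopdf argument list for a multi-page document.

    All file names passed in are hypothetical; nothing is executed here.
    """
    cmd = ["wkhtmltopdf"]
    if cover:
        # "cover" object: a page placed before the table of contents
        cmd += ["cover", cover]
    if include_toc:
        # "toc" object: an automatically generated table of contents
        cmd.append("toc")
    for page in pages:
        cmd.append(page)
        # header/footer options are given per page object
        if header_html:
            cmd += ["--header-html", header_html]
        if footer_html:
            cmd += ["--footer-html", footer_html]
    # the output file always comes last
    cmd.append(output_pdf)
    return cmd


if __name__ == "__main__":
    command = build_wkhtmltopdf_command(
        ["intro.html", "usage.html"], "manual.pdf",
        cover="cover.html", header_html="header.html")
    # Print a copy-pasteable command; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```

Building the argument list separately from running it keeps the glue code easy to inspect before handing it to `subprocess.run`.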

- WYSIWYG editing that is consistent across browsers and produces clean, 
compliant HTML5 is possible thanks to the Aloha Editor: 
http://aloha-editor.org/. With the addition of a very simple "static" 
content management system, creating, managing and editing static pages is 
very easy.

*WYSIWYG:* Tick

- A JSON TOC to support navigation on the PHP-powered website is easily 
generated with custom XSLT2 stylesheets: first concatenate the contents of 
all HTML pages in order, then scan the H1-H6 tags (whilst respecting HTML5 
section/article/aside/etc).

*Easy to use website:* Tick
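
The JSON TOC step can be sketched roughly as follows. This is an illustration in Python rather than the XSLT2 stylesheets mentioned above, and it is deliberately simplified: it nests entries purely by heading rank (H1-H6) and does not implement the HTML5 outline rules for section/article/aside. The function and class names, page names and JSON field names are all hypothetical.

```python
import json
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, id, title) for every H1-H6 element in a page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = {"level": int(tag[1]),
                             "id": dict(attrs).get("id"),
                             "text": []}

    def handle_data(self, data):
        # Only keep text that appears inside an open heading
        if self._current is not None:
            self._current["text"].append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._current is not None:
            # Collapse runs of whitespace in the heading text
            title = " ".join("".join(self._current["text"]).split())
            self.headings.append(
                (self._current["level"], self._current["id"], title))
            self._current = None


def build_toc(pages):
    """Build a nested TOC from (url, html) pairs given in reading order."""
    root = {"level": 0, "children": []}
    stack = [root]
    for url, html in pages:
        parser = HeadingCollector()
        parser.feed(html)
        parser.close()
        for level, anchor, title in parser.headings:
            # Climb back up until the top of the stack outranks this heading
            while stack[-1]["level"] >= level:
                stack.pop()
            node = {"level": level, "title": title,
                    "href": f"{url}#{anchor}" if anchor else url,
                    "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
    return root["children"]


if __name__ == "__main__":
    sample = [
        ("intro.html", "<h1 id='intro'>Intro</h1><section><h2>Setup</h2></section>"),
        ("usage.html", "<h1>Usage</h1>"),
    ]
    print(json.dumps(build_toc(sample), indent=2))
```

The resulting JSON could then be read by the PHP scripts to decide which entries to show in the sidebar for the current page.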

For me the final part of the puzzle has been bringing all of these things 
together in a way that is easy to manage. I am considering using 
*chromiumembedded* within a C# Forms application, with a simple embedded 
HTTP server to glue all of the above together. Note: I am not writing the 
CMS in PHP; I am using C# for easier access to Saxon (XSLT2 processing).

After serious consideration this seems to be the easiest approach overall 
(although a little extra initial preparation is required). At this stage I 
am not committed to it, though; I am still very much in the experimental 
phase. I am looking for something that offers flexibility over visual 
styles (which DocBook seems to lack) whilst maintaining good semantics, 
with HTML and PDF output that are consistent in style and easy to use, and 
that is hopefully far easier to edit using WYSIWYG. Whilst I do not mind 
manually typing XML elements around my text when writing XML comments, I 
can see this becoming very tedious when writing large amounts of technical 
documentation.
