Flying Saucer (http://xhtmlrenderer.dev.java.net/) includes code to render XHTML documents to PDF, using CSS stylesheets to precisely control the document's appearance. For me, this is a big improvement over either a manual-layout word processor or TeX.
This is a quick Jython script I wrote to integrate Flying Saucer with some fonts I wanted to use in my document, and with HTML Tidy, so that I didn't have to write the document in XHTML directly. I'm using it with Flying Saucer R8pre2; I haven't yet tried the final R8 release, but I imagine it's compatible.

#!/usr/bin/jython
# -*- coding: utf-8 -*-
"""Render an HTML file to PDF using Flying Saucer.

This program has the following advantages over
`org.xhtmlrenderer.simple.PDFRenderer`, the example that comes with
Flying Saucer:

- it is written in Jython rather than Java, so it should be simpler to
  read, understand, and reuse;
- courtesy of shelling out to HTML Tidy, it takes HTML input rather
  than XHTML;
- it imports some fonts from `fontdir`.

You still have to set `CLASSPATH` to include Flying Saucer and IText
though.  I invoke it from the `xhtmlrenderer` (Flying Saucer)
top-level directory as follows:

    time CLASSPATH='build/classes:lib/itext-paulo-155.jar' \
        wherever/pdfwithfonts.py input.html > output.pdf

Or from somewhere with the jar files:

    time CLASSPATH=core-renderer.jar:itext-paulo-155.jar \
        ./pdfwithfonts.py input.html > output.pdf

One weird problem is that it doesn’t automatically make a PDF index
using the headers of the HTML file.  Flying Saucer expects some
invalid HTML like this in `<head>` to tell it what you want to put in
your PDF index:

    <bookmarks>
      <bookmark name='1. Foo bar baz' href='#1'>
        <bookmark name='1.1 Baz quux' href='#1.2'>
        </bookmark>
      </bookmark>
      <bookmark name='2. Foo bar baz' href='#2'>
        <bookmark name='2.1 Baz quux' href='#2.2'>
        </bookmark>
      </bookmark>
    </bookmarks>

I may look into building these bookmarks automatically.

I’d also like to be able to compile this program with `jythonc` so I
can run it without Jython installed, but I can’t figure out how.

"""

import org.xhtmlrenderer.pdf, com.lowagie.text.pdf, java.io, os, sys

# I’m developing this with Jython 2.1.
try: True
except NameError: True, False = 1, 0

# BROKEN AND NOT USED; see comment.
def add_font_directory(resolver, dirname, embedded=True):
    """Add a directory of .afm and corresponding .pfb files.

    This doesn’t work, because the ITextFontResolver isn’t
    discriminating enough, so if you add all of these fonts, your
    Nimbus Sans L bold text ends up as Nimbus Sans L 'italic' (really
    oblique), your Nimbus Sans L regular text ends up as Nimbus Sans L
    condensed, your URW Palladio L regular text ends up as URW
    Palladio L bold, and your URW Palladio L bold text ends up as URW
    Palladio L italic (which is really quite a nice font, but not what
    you asked for).

    If you just add the four files containing the specific fonts you
    need, things work better.  See `add_fonts` for that.

    """
    encoding = com.lowagie.text.pdf.BaseFont.CP1252  # Is this right?
    for fileobj in java.io.File(dirname).listFiles():
        if fileobj.name.lower().endswith('.afm'):
            path = fileobj.absolutePath
            resolver.addFont(path, encoding, embedded, path[:-4] + '.pfb')

class TidyFailed(Exception): pass

def tidy_file(infile, outfile):
    "Invoke HTML Tidy to generate XHTML. Not suitable for malicious input."
    tidy_rv = os.system('tidy -utf8 -asxhtml "%s" > "%s"' % (infile, outfile))
    success = 0
    tidy_warnings = 1  # see tidy(1) man page, section "EXIT STATUS"
    if tidy_rv not in [success, tidy_warnings]:
        raise TidyFailed(tidy_rv)

fontdir = "/usr/share/fonts/type1/gsfonts/"

def add_fonts(font_resolver):
    """Imports a couple of specific fonts I like to use from `fontdir`.

    Although the documentation for Flying Saucer says that only
    TrueType fonts can be added to the font resolver, it actually
    supports `.afm` files with associated `.pfb` files as well.

    """
    # This version doesn’t work (see comments on add_font_directory):
    # add_font_directory(font_resolver, fontdir)
    encoding = com.lowagie.text.pdf.BaseFont.CP1252  # is this right?
    fontnames = ["n019003l",  # Nimbus Sans L regular
                 "n019004l",  # Nimbus Sans L bold
                 "p052003l",  # URW Palladio L regular
                 "p052004l",  # URW Palladio L bold
                 ]
    for fontname in fontnames:
        font = fontdir + fontname
        # True here is embedded=True, i.e. embed the font in the .pdf
        font_resolver.addFont(font + ".afm", encoding, True, font + ".pfb")

def main(infile, outfile):
    """Render HTML from filename `infile` to a PDF on the file obj `outfile`.

    Interpolates `infile` into a command line and then writes a
    temporary file `tmp.xhtml` in the current directory, so don’t use
    it where an attacker could supply the input filename or control
    the current working directory.

    """
    tmpname = 'tmp.xhtml'
    tidy_file(infile, tmpname)

    r = org.xhtmlrenderer.pdf.ITextRenderer()
    add_fonts(r.fontResolver)
    # XXX can’t just `r.document =` because that calls setDocument(String)
    r.setDocument(java.io.File(tmpname))
    r.layout()
    r.createPDF(outfile)

if __name__ == '__main__':
    main(sys.argv[1], sys.stdout)
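
The docstring above mentions building the `<bookmarks>` element automatically from the document's headers. Here is a rough sketch of how that might look; it is not part of the script above and I haven't wired it into it. It rests on several assumptions: it is written for a modern Python 3 rather than Jython 2.1, it only picks up `<h1>` through `<h6>` elements that already carry an `id` attribute, and the output format is simply copied from the example in the docstring. The names `HeadingCollector` and `bookmarks_xml` are made up for illustration.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for each <h1>..<h6> that has an id."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.headings = []
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == 'h' and tag[1] in '123456':
            anchor = dict(attrs).get('id')
            if anchor:  # headings without an id can't be link targets
                self._current = [int(tag[1]), anchor, []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == 'h%d' % self._current[0]:
            level, anchor, parts = self._current
            text = ' '.join(''.join(parts).split())  # collapse whitespace
            self.headings.append((level, anchor, text))
            self._current = None


def bookmarks_xml(html):
    """Return a nested <bookmarks> element for the headings in `html`."""
    collector = HeadingCollector()
    collector.feed(html)
    lines = ['<bookmarks>']
    open_levels = []  # heading levels whose <bookmark> is still open

    def close_one():
        open_levels.pop()
        lines.append('  ' * (len(open_levels) + 1) + '</bookmark>')

    for level, anchor, text in collector.headings:
        # A heading closes every open bookmark at the same or a deeper level.
        while open_levels and open_levels[-1] >= level:
            close_one()
        indent = '  ' * (len(open_levels) + 1)
        lines.append('%s<bookmark name=%s href=%s>'
                     % (indent, quoteattr(text), quoteattr('#' + anchor)))
        open_levels.append(level)
    while open_levels:
        close_one()
    lines.append('</bookmarks>')
    return '\n'.join(lines)
```

Calling `bookmarks_xml(open('input.html').read())` would give a block of text to paste into the `<head>` of the document. Whether HTML Tidy leaves that nonstandard element alone is something I haven't checked, so it may need to be inserted into `tmp.xhtml` after Tidy has run.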