Flying Saucer includes code to
render XHTML documents to PDF, using CSS stylesheets to precisely
control the document's appearance. For me, this is a big improvement
over either a manual-layout word processor or TeX.

This is a quick Jython script I wrote to integrate Flying Saucer with
some fonts I wanted to use in my document, and HTML Tidy, so that I
didn't have to write the document in XHTML directly.

I'm using it with Flying Saucer R8pre2. I haven't yet tried the final
R8 release, but I imagine it's compatible.

# -*- coding: utf-8 -*-
"""Render an HTML file to PDF using Flying Saucer.

This program has the following advantages over
`org.xhtmlrenderer.simple.PDFRenderer`, the example that comes with
Flying Saucer:

- it is written in Jython rather than Java, so it should be simpler to
  read, understand, and reuse;
- courtesy of shelling out to HTML Tidy, it takes HTML input rather
  than XHTML;
- it imports some fonts from `fontdir`.

You still have to set `CLASSPATH` to include Flying Saucer and iText,
though.  I invoke it from the `xhtmlrenderer` (Flying Saucer)
top-level directory as follows:

    time CLASSPATH='build/classes:lib/itext-paulo-155.jar' \
        wherever/ input.html > output.pdf

Or from somewhere with the jar files:

    time CLASSPATH=core-renderer.jar:itext-paulo-155.jar \
        ./ input.html > output.pdf

One weird problem is that it doesn’t automatically make a PDF index
using the headers of the HTML file.  Flying Saucer expects some
invalid HTML like this in `<head>` to tell it what you want to put in
your PDF index:

      <bookmarks>
        <bookmark name='1. Foo bar baz' href='#1'>
          <bookmark name='1.1 Baz quux' href='#1.2'/>
        </bookmark>
        <bookmark name='2. Foo bar baz' href='#2'>
          <bookmark name='2.1 Baz quux' href='#2.2'/>
        </bookmark>
      </bookmarks>

I may look into building these bookmarks automatically.

I’d also like to be able to compile this program with `jythonc` so I
can run it without Jython installed, but I can’t figure out how.
"""


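import org.xhtmlrenderer.pdf, com.lowagie.text.pdf, java.io, os, sys

# I’m developing this with Jython 2.1, which may not define True and
# False, so fall back to 1 and 0 when the names are missing.
try:
    True, False
except NameError:
    True, False = 1, 0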

# BROKEN AND NOT USED; see comment.
def add_font_directory(resolver, dirname, embedded=True):
    """Add a directory of .afm and corresponding .pfb files.

    This doesn’t work, because the ITextFontResolver isn’t
    discriminating enough, so if you add all of these fonts, your
    Nimbus Sans L bold text ends up as Nimbus Sans L 'italic' (really
    oblique), your Nimbus Sans L regular text ends up as Nimbus Sans L
    condensed, your URW Palladio L regular text ends up as URW
    Palladio L bold, and your URW Palladio L bold text ends up as URW
    Palladio L italic (which is really quite a nice font, but not what
    you asked for).

    If you just add the four files containing the specific fonts you
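    need, things work better.  See `add_fonts` for that.
    """

    encoding = com.lowagie.text.pdf.BaseFont.CP1252 # Is this right?

    # Assumption: the directory is listed with java.io.File (needs
    # `java.io` imported at the top), which yields File objects with
    # the `absolutePath` property used below.
    for fileobj in java.io.File(dirname).listFiles():
        if fileobj.name.endswith('.afm'):
            path = fileobj.absolutePath
            resolver.addFont(path, encoding, embedded, path[:-4] + '.pfb')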

class TidyFailed(Exception): pass

def tidy_file(infile, outfile):
    "Invoke HTML Tidy to generate XHTML.  Not suitable for malicious input."

    tidy_rv = os.system('tidy -utf8 -asxhtml "%s" > "%s"' % (infile, outfile))

    success       = 0
    tidy_warnings = 1    # see tidy(1) man page, section "EXIT STATUS"

    if tidy_rv not in [success, tidy_warnings]:
        raise TidyFailed(tidy_rv)

fontdir = "/usr/share/fonts/type1/gsfonts/"

def add_fonts(font_resolver):
    """Imports a couple of specific fonts I like to use from `fontdir`.

    Although the documentation for Flying Saucer says that only
    TrueType fonts can be added to the font resolver, it actually
    supports `.afm` files with associated `.pfb` files as well.
    """

    # This version doesn’t work (see comments on add_font_directory):
    # add_font_directory(font_resolver, fontdir)

    encoding = com.lowagie.text.pdf.BaseFont.CP1252 # is this right?

    fontnames = ["n019003l",        # Nimbus Sans L regular
                 "n019004l",        # Nimbus Sans L bold
                 "p052003l",        # URW Palladio L regular
                 "p052004l",        # URW Palladio L bold
                ]

    for fontname in fontnames:
        font = fontdir + fontname

        # True here is embedded=True, i.e. embed the font in the .pdf
        font_resolver.addFont(font + ".afm", encoding, True, font + ".pfb")

def main(infile, outfile):
    """Render HTML from filename `infile` to a PDF on the file obj `outfile`.

    Interpolates `infile` into a command line and then writes a
    temporary file `tmp.xhtml` in the current directory, so don’t use
    it where an attacker could supply the input filename or control
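    the current working directory.
    """

    tmpname = 'tmp.xhtml'
    tidy_file(infile, tmpname)

    r = org.xhtmlrenderer.pdf.ITextRenderer()
    add_fonts(r.fontResolver)

    # XXX can’t just `r.document =` because that calls setDocument(String)
    # Assumption: the remaining steps are Flying Saucer's usual
    # setDocument/layout/createPDF sequence (needs `java.io` imported at
    # the top), and Jython passes the Python file object `outfile` as a
    # java.io.OutputStream.
    r.setDocument(java.io.File(tmpname))
    r.layout()
    r.createPDF(outfile)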

if __name__ == '__main__':
    main(sys.argv[1], sys.stdout)
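
The docstring above mentions wanting to build the PDF bookmarks
automatically from the headers of the HTML file.  Here is a minimal
sketch of one way that might work.  It is plain modern Python, not
Jython 2.1, so it would need porting before it could be dropped into
the script.  It assumes that every heading to be indexed carries an
`id` attribute, and that Flying Saucer accepts the
`<bookmark name=... href=...>` form from the docstring wrapped in a
`<bookmarks>` element.  The names `HeadingCollector`,
`build_bookmarks`, and `bookmarks_for` are made up for illustration.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every heading that has an id."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.headings = []      # finished (level, id, text) tuples
        self._current = None    # [level, id, text parts] while in a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS and self._current is None:
            anchor = dict(attrs).get("id")
            if anchor:          # a heading without an id can't be linked to
                self._current = [int(tag[1]), anchor, []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == "h%d" % self._current[0]:
            level, anchor, parts = self._current
            text = " ".join("".join(parts).split())   # collapse whitespace
            self.headings.append((level, anchor, text))
            self._current = None


def build_bookmarks(headings):
    """Return nested <bookmark> markup for a list of (level, id, text)."""
    lines = ["<bookmarks>"]
    open_levels = []            # heading levels of the bookmarks still open
    for level, anchor, text in headings:
        # Close every open bookmark that is a sibling of, or deeper
        # than, the heading we are about to add.
        while open_levels and open_levels[-1] >= level:
            open_levels.pop()
            lines.append("  " * (len(open_levels) + 1) + "</bookmark>")
        lines.append("  " * (len(open_levels) + 1)
                     + "<bookmark name=%s href=%s>"
                     % (quoteattr(text), quoteattr("#" + anchor)))
        open_levels.append(level)
    while open_levels:
        open_levels.pop()
        lines.append("  " * (len(open_levels) + 1) + "</bookmark>")
    lines.append("</bookmarks>")
    return "\n".join(lines)


def bookmarks_for(html_text):
    """Parse `html_text` and return bookmark markup for its headings."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    return build_bookmarks(collector.headings)
```

Calling `bookmarks_for(open('input.html').read())` returns a string
that could be pasted into `<head>` before rendering.  Nesting follows
the heading levels, so an `<h2>` after an `<h1>` becomes a child
bookmark.  Headings without an `id` are skipped rather than guessed at.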