Flying Saucer (http://xhtmlrenderer.dev.java.net/) includes code to
render XHTML documents to PDF, using CSS stylesheets to precisely
control the document's appearance. For me, this is a big improvement
over either a manual-layout word processor or TeX.

This is a quick Jython script I wrote to integrate Flying Saucer with
some fonts I wanted to use in my document, and with HTML Tidy, so that
I didn't have to write the document in XHTML directly.

I'm using it with Flying Saucer R8pre2; I haven't yet tried the final
R8 release, but I imagine it's compatible.

#!/usr/bin/jython
# -*- coding: utf-8 -*-
"""Render an HTML file to PDF using Flying Saucer.

This program has the following advantages over
`org.xhtmlrenderer.simple.PDFRenderer`, the example that comes with
Flying Saucer:

- it is written in Jython rather than Java, so it should be simpler to
  read, understand, and reuse;
- courtesy of shelling out to HTML Tidy, it takes HTML input rather
  than XHTML;
- it imports some fonts from `fontdir`.

You still have to set `CLASSPATH` to include Flying Saucer and iText
though. I invoke it from the `xhtmlrenderer` (Flying Saucer)
top-level directory as follows:

    time CLASSPATH='build/classes:lib/itext-paulo-155.jar' \
        wherever/pdfwithfonts.py input.html > output.pdf

Or from somewhere with the jar files:

    time CLASSPATH=core-renderer.jar:itext-paulo-155.jar \
        ./pdfwithfonts.py input.html > output.pdf

One weird problem is that it doesn’t automatically make a PDF index
using the headings of the HTML file. Flying Saucer expects some
invalid HTML like this in `<head>` to tell it what you want to put in
your PDF index:

    <bookmarks>
      <bookmark name='1. Foo bar baz' href='#1'>
        <bookmark name='1.1 Baz quux' href='#1.2'>
        </bookmark>
      </bookmark>
      <bookmark name='2. Foo bar baz' href='#2'>
        <bookmark name='2.1 Baz quux' href='#2.2'>
        </bookmark>
      </bookmark>
    </bookmarks>

I may look into building these bookmarks automatically.

I’d also like to be able to compile this program with `jythonc` so I
can run it without Jython installed, but I can’t figure out how.
"""

import org.xhtmlrenderer.pdf, com.lowagie.text.pdf, java.io, os, sys

# I’m developing this with Jython 2.1.
try:
    True
except NameError:
    True, False = 1, 0

# BROKEN AND NOT USED; see comment.
def add_font_directory(resolver, dirname, embedded=True):
    """Add a directory of .afm and corresponding .pfb files.

    This doesn’t work, because the ITextFontResolver isn’t
    discriminating enough, so if you add all of these fonts, your
    Nimbus Sans L bold text ends up as Nimbus Sans L 'italic' (really
    oblique), your Nimbus Sans L regular text ends up as Nimbus Sans L
    condensed, your URW Palladio L regular text ends up as URW
    Palladio L bold, and your URW Palladio L bold text ends up as URW
    Palladio L italic (which is really quite a nice font, but not what
    you asked for).

    If you just add the four files containing the specific fonts you
    need, things work better.  See `add_fonts` for that.

    """
    encoding = com.lowagie.text.pdf.BaseFont.CP1252 # Is this right?
    for fileobj in java.io.File(dirname).listFiles():
        if fileobj.name.lower().endswith('.afm'):
            path = fileobj.absolutePath
            resolver.addFont(path, encoding, embedded, path[:-4] + '.pfb')

class TidyFailed(Exception): pass

def tidy_file(infile, outfile):
    "Invoke HTML Tidy to generate XHTML.  Not suitable for malicious input."
    tidy_rv = os.system('tidy -utf8 -asxhtml "%s" > "%s"' % (infile, outfile))
    success = 0
    tidy_warnings = 1 # see tidy(1) man page, section "EXIT STATUS"
    if tidy_rv not in [success, tidy_warnings]:
        raise TidyFailed(tidy_rv)

fontdir = "/usr/share/fonts/type1/gsfonts/"

def add_fonts(font_resolver):
    """Imports a couple of specific fonts I like to use from `fontdir`.

    Although the documentation for Flying Saucer says that only
    TrueType fonts can be added to the font resolver, it actually
    supports `.afm` files with associated `.pfb` files as well.

    """
    # This version doesn’t work (see comments on add_font_directory):
    # add_font_directory(font_resolver, fontdir)

    encoding = com.lowagie.text.pdf.BaseFont.CP1252 # is this right?
    fontnames = ["n019003l", # Nimbus Sans L regular
                 "n019004l", # Nimbus Sans L bold
                 "p052003l", # URW Palladio L regular
                 "p052004l", # URW Palladio L bold
                 ]
    for fontname in fontnames:
        font = fontdir + fontname
        # True here is embedded=True, i.e. embed the font in the .pdf
        font_resolver.addFont(font + ".afm", encoding, True, font + ".pfb")

def main(infile, outfile):
    """Render HTML from filename `infile` to a PDF on the file obj `outfile`.

    Interpolates `infile` into a command line and then writes a
    temporary file `tmp.xhtml` in the current directory, so don’t use
    it where an attacker could supply the input filename or control
    the current working directory.

    """
    tmpname = 'tmp.xhtml'
    tidy_file(infile, tmpname)
    r = org.xhtmlrenderer.pdf.ITextRenderer()
    add_fonts(r.fontResolver)
    # XXX can’t just `r.document =` because that calls setDocument(String)
    r.setDocument(java.io.File(tmpname))
    r.layout()
    r.createPDF(outfile)

if __name__ == '__main__':
    main(sys.argv[1], sys.stdout)
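
On building the bookmarks automatically: here is a rough sketch of one
way it might be done, as a separate preprocessing step on Tidy's XHTML
output. It has not been tried against Flying Saucer. It is written for
a current CPython 3 rather than Jython 2.1 (it leans on `html.parser`),
and the names `HeadingCollector` and `build_bookmarks` are made up for
this illustration. It assumes every heading you want in the index
already has an `id` attribute; headings without one are skipped. The
output follows the `<bookmarks>` layout shown in the docstring above.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every h1..h6 that has an id attribute."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.headings = []
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == 'h' and tag[1] in '123456':
            anchor = dict(attrs).get('id')
            if anchor:
                self._current = [int(tag[1]), anchor, []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == 'h%d' % self._current[0]:
            level, anchor, parts = self._current
            # Collapse runs of whitespace in the heading text.
            text = ' '.join(''.join(parts).split())
            self.headings.append((level, anchor, text))
            self._current = None


def build_bookmarks(headings):
    """Turn a list of (level, id, text) into nested <bookmarks> markup."""
    lines = ['<bookmarks>']
    open_levels = []  # heading levels of the <bookmark> elements still open
    for level, anchor, text in headings:
        # Close every open bookmark at the same or a deeper level.
        while open_levels and open_levels[-1] >= level:
            open_levels.pop()
            lines.append('  ' * (len(open_levels) + 1) + '</bookmark>')
        lines.append('  ' * (len(open_levels) + 1) +
                     '<bookmark name=%s href=%s>'
                     % (quoteattr(text), quoteattr('#' + anchor)))
        open_levels.append(level)
    while open_levels:
        open_levels.pop()
        lines.append('  ' * (len(open_levels) + 1) + '</bookmark>')
    lines.append('</bookmarks>')
    return '\n'.join(lines)
```

The resulting string would still have to be spliced into the `<head>`
of `tmp.xhtml` before `r.setDocument` is called; that step is not shown
here.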