Implementing More Elaborate PDF Tagging

Vincent Hennebert Thu, 07 Jan 2010 04:40:13 -0800

Hi,

Some time ago basic PDF accessibility was implemented in FOP [1]. Part
of that work is storing the document’s logical structure in the PDF
output [2], i.e. recording information such as “this content was in
a block”, “that content comes from a table-cell”, etc.


PDF defines a set of standard structure elements and FOP implements
a default mapping of FOs to those structure elements. For example,
fo:root is mapped to Document, fo:block to P (Paragraph), fo:table to
Table, etc.

There is a need for a finer-grained mapping, so that certain fo:block
elements can be tagged as headings (H1 to H6) instead of plain
Paragraphs. That way the structure of the source document would be more
accurately represented in the PDF.

The role property [3] was defined pretty much for that purpose: its
value should be the name of the element from which the FO comes or, if
that is not enough, the URI of an RDF resource describing some structure
type.

Nothing is enforced, though (‘should’, not ‘must’), and I think we can
get away with putting a PDF standard structure type (Document, Part, P,
H1, Table, etc.) directly in the role property. If a non-standard type
is specified, we would fall back to the default mapping and possibly
issue a warning.
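
To make the proposed lookup concrete, here is a rough sketch in Java.
It is not FOP's actual code: the class name StructureTypeResolver, the
resolve() method and the use of java.util.logging are all assumptions
made for illustration, and both tables hold only the types named in
this mail rather than the full PDF list.

```java
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper; names and structure are illustrative only.
final class StructureTypeResolver {

    private static final Logger LOG =
            Logger.getLogger(StructureTypeResolver.class.getName());

    // Default FO-to-structure-type mapping (only the pairs mentioned above).
    private static final Map<String, String> DEFAULTS = Map.of(
            "fo:root", "Document",
            "fo:block", "P",
            "fo:table", "Table");

    // Partial set of PDF standard structure types accepted as role values.
    private static final Set<String> STANDARD_TYPES = Set.of(
            "Document", "Part", "P", "Table",
            "H1", "H2", "H3", "H4", "H5", "H6");

    private StructureTypeResolver() {
    }

    /**
     * Returns the structure type for an FO. A role naming a standard type
     * wins; any other non-empty role triggers a warning and the default
     * mapping is used. Returns null if the FO has no default mapping here.
     */
    static String resolve(String foName, String role) {
        if (role != null && !role.isEmpty()) {
            if (STANDARD_TYPES.contains(role)) {
                return role;
            }
            LOG.warning("Non-standard role '" + role + "' on " + foName
                    + ", falling back to the default mapping");
        }
        return DEFAULTS.get(foName);
    }
}
```

With this sketch, resolve("fo:block", "H1") yields H1, while
resolve("fo:block", "chapter-title") logs a warning and yields P.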

Since PDF is the only output format that supports logical structure at
the moment, that should be enough for now.

I’m going to implement this enhancement in the next few days. Any
comments or suggestions are welcome.

[1] http://markmail.org/thread/mjskmien2ha6agzb
[2] http://wiki.apache.org/xmlgraphics-fop/LogicalStructure
[3] http://www.w3.org/TR/xsl11/#role


Thanks,
Vincent
