Hi Holger,
No, you didn't miss anything. The DocBook XSL stylesheets currently do
not provide support for tagged PDFs.
From a brief investigation, implementing such support looks
nontrivial. Keep in mind that the DocBook stylesheets don't actually
create a PDF. The stylesheets generate an FO version of the document,
and then an XSL-FO processor converts that to a PDF. So DocBook XSL has
to generate additional markup in the FO output that an XSL-FO processor
can convert to PDF tags.
It looks like each of the XSL-FO processors commonly used with DocBook
(FOP, XEP, and Antenna House) has its own extensions for expressing
the PDF accessibility tags in the FO. For example:
  - FOP has fox:alt-text
  - XEP has rx:pdf-structure-tag
  - AH expects axf:pdftag
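To make the divergence concrete, here is a minimal Python sketch (standard library only) that attaches the same structure tag to an fo:block in each processor's dialect. The namespace URIs are the vendors' extension namespaces. The assumptions that XEP and Antenna House take a bare tag name such as H1 in rx:pdf-structure-tag and axf:pdftag, and that FOP reads it from the plain role attribute, are mine and should be checked against each processor's documentation. fox:alt-text is left out because it carries alternative text for graphics rather than a structure tag. The helper tagged_block is hypothetical.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
RX_NS = "http://www.renderx.com/XSL/Extensions"
AXF_NS = "http://www.antennahouse.com/names/XSL/Extensions"

# ASSUMPTION (illustration only): which attribute each processor reads
# for a structure tag. None means the plain XSL 1.1 "role" attribute.
TAG_ATTRIBUTE = {
    "fop": None,
    "xep": (RX_NS, "pdf-structure-tag"),
    "ah": (AXF_NS, "pdftag"),
}

# Serialize with the familiar prefixes instead of ns0, ns1, ...
ET.register_namespace("fo", FO_NS)
ET.register_namespace("rx", RX_NS)
ET.register_namespace("axf", AXF_NS)


def tagged_block(processor: str, text: str, tag: str) -> ET.Element:
    """Hypothetical helper: build an fo:block carrying `tag` for `processor`."""
    block = ET.Element(f"{{{FO_NS}}}block")
    block.text = text
    ext = TAG_ATTRIBUTE[processor]
    if ext is None:
        block.set("role", tag)
    else:
        ns, local = ext
        block.set(f"{{{ns}}}{local}", tag)
    return block


for proc in ("fop", "xep", "ah"):
    print(proc, ET.tostring(tagged_block(proc, "A Level 1 Heading", "H1"),
                            encoding="unicode"))
```

The stylesheets would have to emit a different attribute per processor for the same logical tag.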
This situation is similar to when PDF bookmarks were first implemented.
Each XSL-FO processor had its own extensions to implement that
feature, and DocBook XSL had to support all three extensions. When XSL
1.1 standardized the markup for bookmarks, all the XSL-FO processors
eventually implemented that standard, and so did DocBook XSL.
For accessibility, XSL 1.1 suggests outputting a role attribute,
which the spec describes as follows:
"To aid alternate renderers, the <string> value should be the qualified
name (QName [XML Names] or [XML Names 1.1]) of the element from which
this formatting object is constructed. If a QName does not provide
sufficient context, the <uri-specification> can be used to identify an
RDF resource that describes the role in more detail. This RDF resource
may be embedded in the result tree and referenced with a relative URI
or fragment identifier, or the RDF resource may be external to the
result tree. This specification does not define any standard QName or
RDF vocabularies; these are frequently application area dependent.
Other groups, for example the Dublin Core, have defined such
vocabularies."
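Taken literally, that suggestion yields role values that simply echo the source element names. A minimal Python sketch of that reading, assuming DocBook 5 input (namespace http://docbook.org/ns/docbook) and an arbitrarily chosen "d" prefix; source_qname is a hypothetical helper, not part of DocBook XSL:

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
FO_NS = "http://www.w3.org/1999/XSL/Format"


def source_qname(elem: ET.Element, prefixes: dict) -> str:
    """Hypothetical helper: return the element's name as a QName, e.g. 'd:para'.

    ElementTree stores namespaced names as '{uri}local', so the URI is
    swapped for a prefix. Real FO output would also need a matching
    xmlns declaration for that prefix (not shown here)."""
    if elem.tag.startswith("{"):
        uri, local = elem.tag[1:].split("}", 1)
        return f"{prefixes[uri]}:{local}"
    return elem.tag  # element in no namespace: the local name is the QName


doc = ET.fromstring(
    f'<section xmlns="{DOCBOOK_NS}"><title>Intro</title><para>Hello</para></section>'
)
prefixes = {DOCBOOK_NS: "d"}  # "d" is an arbitrary, illustrative prefix

# One fo:block per child, with role set to the DocBook source element's QName.
blocks = []
for child in doc:
    block = ET.Element(f"{{{FO_NS}}}block", role=source_qname(child, prefixes))
    block.text = child.text
    blocks.append(block)

print([b.get("role") for b in blocks])
```

Values like d:para identify the source element, but they are not PDF structure types.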
If we used the "name of the element from which this formatting object
is constructed", the role values would be DocBook element names, which
none of the XSL-FO processors would recognize. As far as I can tell,
they would not recognize an RDF description of how those element names
map to PDF tags either.
You suggested that FOP expects HTML element names in the role attribute,
but I wonder whether the other XSL-FO processors do the same.
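What the role attribute would actually need is a mapping from DocBook element names to the tag names FOP reads, such as the H1 in the FOP example quoted below (H1, P, L, LI, Table, Figure, and BlockQuote are also PDF standard structure types). A rough Python sketch of such a mapping, assuming DocBook 5 input. The table is a hypothetical subset, assign_roles is an illustrative helper, and whether XEP and Antenna House accept the same names is exactly the open question above.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"

# HYPOTHETICAL mapping from DocBook 5 element names to PDF standard
# structure types; a real mapping would need many more entries.
STRUCTURE_TYPE = {
    "para": "P",
    "itemizedlist": "L",
    "orderedlist": "L",
    "listitem": "LI",
    "table": "Table",
    "informaltable": "Table",
    "figure": "Figure",
    "blockquote": "BlockQuote",
}
SECTIONS = {"chapter", "appendix", "section", "sect1", "sect2", "sect3"}


def local_name(elem: ET.Element) -> str:
    """Strip the DocBook namespace from an ElementTree tag."""
    return elem.tag.replace(DB, "", 1)


def assign_roles(elem: ET.Element, depth: int = 0, out=None) -> list:
    """Return (element name, role) pairs in document order.

    A section's own title becomes H1..H6 by nesting depth (capped at H6);
    other elements use STRUCTURE_TYPE, and unmapped ones get no role."""
    if out is None:
        out = []
    name = local_name(elem)
    if name in SECTIONS:
        depth += 1
    elif name in STRUCTURE_TYPE:
        out.append((name, STRUCTURE_TYPE[name]))
    for child in elem:
        if name in SECTIONS and local_name(child) == "title":
            out.append(("title", f"H{min(depth, 6)}"))
        else:
            # Titles of figures, tables, etc. fall through here and get no role.
            assign_roles(child, depth, out)
    return out


doc = ET.fromstring(
    '<chapter xmlns="http://docbook.org/ns/docbook"><title>Intro</title>'
    "<para>Text</para><section><title>Details</title>"
    "<itemizedlist><listitem><para>One</para></listitem></itemizedlist>"
    "</section></chapter>"
)
print(assign_roles(doc))
```

Section titles are the interesting case: their tag depends on nesting depth, not on the element name alone, so a flat name-to-tag table is not enough.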
I would be interested in adding PDF tagging to DocBook XSL. It would
help if there were a clear spec for how to do so. If I have to figure
it out for each of the three XSL-FO processors, that's going to take
some time.
Bob Stayton
Sagehill Enterprises
[email protected]
On 4/3/2017 6:25 AM, Holger Bast wrote:
Dear all,
I'm trying to generate "tagged" and "accessible" PDF documents via
docbook5 -> xsl:fo -> pdf (with Apache FOP and docbook-xsl v1.79.1).
I tried both FOP parameters (accessibility/PDF-UA) and received a
tagged PDF file. But I found out that there is no structural information
inside the PDF; 'everything' is tagged as p(aragraph). The xsl:fo also
lacks this kind of information. The Apache FOP Accessibility help
recommends using the role attribute for tagging information inside the
document:
<fo:block role="H1" font-weight="bold">I. A Level 1 Heading</fo:block>
Did I miss something (parameter, wrong stylesheet)?
Has anyone already generated accessible PDF documents based on DocBook?
Thanks, Holger
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]