On 8/29/2025 12:03 PM, Pawel Urbanski wrote:
> Hi Everyone,
> I'm using ConTeXt to typeset my book. I chose it because the PDF tagging
> support allowed me to review things without sighted assistance.
> A few months back, I noticed the following things, which may be interesting:
> 1. Reading PDFs in Chrome/Brave gives the best results with the NVDA screen
> reader, which I use.
> 2. Adobe Reader renders some characters that are likely different types
> of spacing characters, or glues some words together without placing spaces
> in between.
> 3. I had some issues navigating by headings, namely skipping headings at
> some lower levels, but maybe it is fixed now.
We had some discussions about how to proceed with tagging so here's a
bit of a summary.
- basically pdf tagging is conceptually crap and not getting any better;
there is no way we can influence it, so we have to live with it (as we do
with fonts, unicode, whatever comes from outside the text world that
doesn't always suit us well)
- it looks like the audience for tagging is kind of ignored; it has
become the domain of bad standards (that are no standards as they get
adapted; 'iso' has little meaning here), applications that have to
comply (and when it's hard the standard gets adapted), a small number of
viewers (that behave differently), commercial and whatever interests
(leverage).
- validation and government demands don't help much and the former to
some extent even makes it worse because .. well .. gov demands it (of
course without much detail apart from it being / becoming requirements:
chickens and eggs)
From the point of view of context (!) we prefer to look at it from the
user's point of view because in the end it's the user that matters, so we
have to make choices.
(1) For various reasons tagging the content stream has been part of pdf
for a long time already. Basically one adds structure to the page content
stream. Let's call this tagging level zero. We had quite some discussion
(and demonstration / testing) of how current AI tooling deals with
content, including pdf, and given what we observed the impression is that
a clean pdf file with such structural tags is in the end most reliable.
Apart from actually recognizing structure without tagging, clear
directives can be enough. It might be that within a decade all of (2)
and (3) mentioned next can be ignored and is obsolete, maybe even
confusing and pointing in the wrong direction.
(2) When one starts adding some rolemapping, say around version 1.7,
something that context has supported for quite a while, additional
indicators to some (still undefined) machinery could be of help. That
version one mapping permits a sort of okay structure -> pdf/html (or
whatever one considers this mix) and math (which only serves a
sub-audience) basically is a text + mathml like tree / representation.
(3) Then came pdf 2 with ua2 so we call that version 2. The role mapping
actually became more limited and even keeps deteriorating as we speak.
This relates to validation. When the demand is 'it has to validate wrt
pdf's idea of structure' then one has to work around this and in the end
gets a pretty inconsistent mapping, especially when one talks of more
complex structure (beyond simple heads, a few lists and maybe tables).
It looks like tagging / validation has become an aim of its own (not
surprisingly, because that's where money is made). There are also some
expectations wrt reflow but when that is intended, just go html (which
is why we spend time on the export and have picked up on that, more later.)
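As a concrete reference point for the above, here is a minimal sketch of a
context source with tagging switched on and the export enabled. It assumes a
recent context (lmtx or mkiv); the chapter title and the text are
placeholders, and which tags and role mappings end up in the pdf depends on
the version and the backend settings.

```tex
% minimal sketch; the title and content are placeholders
\setuptagging[state=start]   % add structure tags to the pdf
\setupbackend[export=yes]    % also write the xml/html export

\starttext

\startchapter[title={A chapter}]

\startparagraph
    Some text in a tagged paragraph.
\stopparagraph

\stopchapter

\stoptext
```

The point is that the source already carries rich structure (chapter,
paragraph); what gets mapped onto pdf tags is decided afterwards.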
So, we have level zero structure and tagging in all cases, but in level
two we crappify it by rolemapping to suit the validation. However, we
can decide to have a middle way: we just map onto the pdf tags that we
think make most sense. That means that future AI driven backends can behave
well, and likely current less clever ones also can handle it as it is
unlikely that they do much crying over things: just read the stuff and
play safe. An occasional pdf rolemap might help but as much looks
ignored anyway why bother. It means that we don't validate but are
otherwise quite okay. Unfortunately the generic NonStruct became sort of
useless and the kind of exclusive inline/block modes lead to strange
mappings. One even has to rely on Artifact for non artifacts.
One reason for thinking that way is that we notice that the target
audience knows very well how to deal with matters. There is no need to
underestimate already available tools and usage patterns. We should
concentrate on content. Also, we can stimulate rendering two versions:
the intended one as well as a more accessible version with additional
(typeset) clues, maybe reordered, maybe with improved access. So, we like to
know what is really needed, what would really help a visually impaired
reader. We don't care about other documents (let them go rolemapping,
validate and pretend to be accessible) but about how to make our
documents accessible and useful. Does the reader care more about (say)
verapdf being happy (with a suboptimally rolemapped or crippled document)
or does she/he want a useful document?
Currently we can validate a context document with a rolemapping that is as
reliable as possible (keep in mind: we start from rich structure, not
from limited and weird rolemaps).
> I'm very eager and happy to help with real-world and pragmatic
> experience. Unfortunately, I could not make it to the conference in
> Poland, where I'm based, due to some minor health chaos.
That is indeed a pity but health is more important. Anyway it's users
like you who should come up with the demands, not standards, committees,
companies and whatever. So just push us for solutions. As always with
(context) demands: it's users who drive it. And in the end it is the
document and content that matter to them. We'd rather spend time on that
than on something 'standard'.
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------