On 8/29/2025 12:03 PM, Pawel Urbanski wrote:
Hi Everyone,
I'm using ConTeXt to typeset my book. I chose it because the PDF tagging support allowed me to review things without sighted assistance.
A few months back, I noticed the following things, which may be of interest:
1. Reading PDFs in Chrome/Brave gives the best results with the NVDA screen reader, which I use.
2. Adobe Reader renders some characters that are likely different types of spacing characters, or glues some words together without placing spaces in between.
3. I had some issues navigating by headings, namely headings being skipped at some lower levels, but maybe it is fixed now.

We had some discussions about how to proceed with tagging so here's a bit of a summary.

- basically pdf tagging is conceptually crap and not getting any better; there is no way we can influence it, so we have to live with it (as we do with fonts, unicode, whatever comes from outside the text world that doesn't always suit well)

- it looks like the audience for tagging is more or less ignored; tagging has become the domain of bad standards (that are no real standards as they keep getting adapted; 'iso' has little meaning here), applications that have to comply (and when that is hard the standard gets adapted), a small number of viewers (that behave differently), and commercial and whatever other interests (leverage).

- validation and government demands don't help much, and the former to some extent even makes it worse because .. well .. government demands it (of course without much detail apart from it being / becoming a requirement: chickens and eggs)

From the point of view of context (!) we prefer to look at it from the user's point of view, because in the end it's the user that matters, so we have to make choices.

(1) For various reasons tagging the content stream has been part of pdf for a long time already. Basically one adds structure to the page content stream. Let's call this tagging level zero. We had quite some discussion (and demonstration / testing) of how current AI tooling deals with content, including pdf, and given what we observed the impression is that a clean pdf file with such structural tags is in the end the most reliable. Apart from actually recognizing structure without tagging, clear directives can be enough. It might be that within a decade all of (2) and (3) mentioned next can be ignored and is obsolete, maybe even confusing and pointing in the wrong direction.

(2) When one starts adding some rolemapping, say around version 1.7, something that context has supported for quite a while, additional indicators to some (still undefined) machinery could be of help. That version one mapping permits a sort of okay structure -> pdf/html (or whatever one considers this mix to be) and math (which only serves a sub-audience) basically is a text + mathml like tree / representation.

(3) Then came pdf 2 with ua2, so we call that version 2. The role mapping actually became more limited and even keeps deteriorating as we speak. This relates to validation. When the demand is 'it has to validate wrt pdf's idea of structure' then one has to work around this and in the end gets a pretty inconsistent mapping, especially when one talks of more complex structure (beyond simple heads, a few lists and maybe tables). It looks like tagging / validation has become an aim of its own (not surprisingly, because that's where money is made). There are also some expectations wrt reflow, but when that is intended, just go html (which is why we spend time on the export and have picked up on that, more later).

So, we have level zero structure and tagging in all cases, but in level two we crappify it by rolemapping to suit the validation. However, we can decide to have a middle way: we just map onto whatever pdf structure we think makes most sense. That means that future AI driven backends can behave well, and likely current less clever ones can also handle it, as it is unlikely that they do much crying over things: just read the stuff and play safe. An occasional pdf rolemap might help, but as much of it looks ignored anyway, why bother. It means that we don't validate but are otherwise quite okay. Unfortunately the generic NonStruct became sort of useless and the kind of exclusive inline/block modes lead to strange mappings. One even has to rely on Artifact for non artifacts.
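
To make the rolemapping trade-off a bit more concrete, here is a minimal Python sketch. It is only an illustration: the rich tag names on the left and the mapping itself are hypothetical and are not what context actually emits; only the targets (H1, P, L, LI, Span, NonStruct) are standard pdf structure types.

```python
from collections import defaultdict

# Hypothetical rich tags mapped onto standard pdf structure types.
# The left-hand names are made up for illustration only.
ROLE_MAP = {
    "sectiontitle": "H1",
    "paragraph": "P",
    "itemgroup": "L",
    "item": "LI",
    "highlight": "Span",
    "construct": "Span",
}


def resolve(tag, role_map, fallback="NonStruct"):
    """Return the standard type for a rich tag, or the fallback if unmapped."""
    return role_map.get(tag, fallback)


def collisions(role_map):
    """Group rich tags by target type and keep targets that are hit more than once."""
    groups = defaultdict(list)
    for rich, standard in role_map.items():
        groups[standard].append(rich)
    return {std: sorted(tags) for std, tags in groups.items() if len(tags) > 1}


if __name__ == "__main__":
    for tag in ("sectiontitle", "construct", "marginnote"):
        print(tag, "->", resolve(tag, ROLE_MAP))
    print(collisions(ROLE_MAP))
```

In this sketch highlight and construct both end up as Span, so whatever reads only the mapped roles can no longer tell them apart, and an unmapped tag falls back to NonStruct. The richer the source structure, the more of these collisions one gets.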

One reason for thinking that way is that we notice that the target audience knows very well how to deal with matters. There is no need to underestimate already available tools and usage patterns. We should concentrate on content. Also, we can stimulate rendering two versions: the intended one as well as a more accessible version with additional (typeset) clues, maybe reordered, maybe with improved access. So, we would like to know what is really needed, what would really help a visually impaired reader. We don't care about other documents (let them go rolemapping, validate and pretend to be accessible) but about how to make our documents accessible and useful. Does the reader care more about (say) verapdf being happy (with a suboptimally rolemapped or crippled document) or does she/he want a useful document?

Currently we can validate a context document with rolemapping that is as reliable as possible (keep in mind: we start from rich structure, not from limited and weird rolemaps).

> I'm very eager and happy to help with real-world and pragmatic
> experience. Unfortunately, I could not make it to the conference in
> Poland, where I'm based, due to some minor health chaos.

That is indeed a pity but health is more important. Anyway, it's users like you who should come up with the demands, not standards, committees, companies and whatever. So just push us for solutions. As always with (context) demands: it's users who drive it. And in the end it is the document and content that matter to them. We would rather spend time on that than on something 'standard'.

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / 
https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage  : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive  : https://github.com/contextgarden/context
wiki     : https://wiki.contextgarden.net
___________________________________________________________________________________