On 4/9/25 11:41, Matthias Weber wrote:
> Hi Pablo (and Hans)
>
> No, Acrobat reports an empty page when reading the text, and Preview
> (on the Mac) doesn’t read anything either.
> I can’t find the text when looking at the pdf in an editor.
Hi Matthias,

the attached PDF document should work now.

Unfortunately, Acrobat Reader doesn’t work for me at all (for reading text aloud). But I was able to see the value of /Alt (alternative text) with the editor from https://www.ngpdf.com/.

There were two issues here:

First, the image was originally tagged as /NonStruct instead of /Figure (which I changed in lpdf-tag-imp-testing.lmt).

Second, /Alt wasn’t properly placed. veraPDF rightly warns about a missing /Alt or /ActualText (once the image is tagged as a figure). /Alt is an entry in the /StructElem dictionary itself (the parent of /K; ConTeXt currently places /Alt inside /K) [https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#page=567].

After manually placing /Alt in the parent dictionary of /K, both veraPDF and https://www.ngpdf.com/ work fine.

BTW, when adding a formula (source plus mapping below), veraPDF warns about no /Alt or /ActualText for the formula (which is a requirement for PDF/UA-1, no MathML in this version [at least, there is no mention of MathML in the veraPDF validation rules]).

Hans, I’m afraid that this hasn’t been visible because images weren’t tagged as figures. Could it be fixed?

Many thanks for your help,

Pablo
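PS: to make the /Alt placement concrete, here is a minimal Python sketch of the check that veraPDF reports for /Figure and /Formula. It does not read a real PDF: plain dictionaries stand in for PDF objects, the key names only mirror the PDF names (S, K, Alt, ActualText), and the function name and the exact shape of the "misplaced" tree are my own assumptions for illustration, not what ConTeXt literally writes.

```python
# Plain dictionaries stand in for PDF objects; no PDF library is involved.
# Keys mirror PDF names: "S" (structure type), "K" (kids), "Alt", "ActualText".

NEEDS_ALTERNATIVE = {"Figure", "Formula"}


def missing_alternatives(elem, path=""):
    """Return paths of Figure/Formula elements that carry neither Alt nor ActualText."""
    here = f"{path}/{elem['S']}"
    problems = []
    if elem["S"] in NEEDS_ALTERNATIVE and "Alt" not in elem and "ActualText" not in elem:
        problems.append(here)
    for kid in elem.get("K", []):
        # Only nested structure elements (dicts with an "S" key) are walked;
        # content references are modelled as dicts without "S" and are skipped,
        # so an "Alt" stored on them does not count.
        if isinstance(kid, dict) and "S" in kid:
            problems.extend(missing_alternatives(kid, here))
    return problems


# Hypothetical model of the problem: Alt sits inside the K entry.
misplaced = {"S": "Document", "K": [
    {"S": "Figure", "K": [{"MCID": 0, "Alt": "this is a cow"}]},
]}

# Hypothetical model after the manual fix: Alt sits in the parent of K.
fixed = {"S": "Document", "K": [
    {"S": "Figure", "Alt": "this is a cow", "K": [{"MCID": 0}]},
]}

print(missing_alternatives(misplaced))  # ['/Document/Figure']
print(missing_alternatives(fixed))      # []
```

The same check flags a /Formula element without /Alt or /ActualText, which corresponds to the second veraPDF warning mentioned above.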
b.pdf
Description: Adobe PDF document
\enabledirectives[backend.usetags=testing]
\setuptagging[state=start]
\setupstructure[state=start]
\setupbackend
  [format=PDF/A-3a,
   intent=sRGB IEC61966-2.1,
   level=0]
\setupbackend[format=pdf/ua-1]
\setupexternalfigures[location=default]
\starttext
\externalfigure[cow][label={this is a cow}]
\startformula
a = b + c
\stopformula
\stoptext
if not modules then modules = { } end modules ['lpdf-tag-imp-testing'] = {
    version   = 1.001,
    comment   = "companion to lpdf-tag.mkiv",
    author    = "Hans Hagen & Mikael Sundqvist",
    copyright = "PRAGMA ADE / ConTeXt Development Team",
    license   = "see context related readme files"
}

-- You can enable this at your own risk by saying:
--
--   \enabledirectives[backend.usetags=testing]
--
-- which will try to satisfy the sloppy mapping in pdf. You can also use this file to
-- roll out your own variant. We also use this file for testing so there could be subtle
-- changes over time.

return {
    name      = "old school pdf tagging",
    version   = "1.00",
    comment   = "This is our crappy level two mapping, sort of an example.",
    author    = "Hans Hagen",
    copyright = "ConTeXt development team",
    mapping   = {

        -- Part      : no structure
        -- Sect      : structure
        -- NonStruct : skip level
        -- Sub       : line with lbl numbers

        document = { pua = "ua1", pdf = "Document" },
        documentpart = { pua = "ua1", pdf = "NonStruct" },

        division = { pua = "ua1", pdf = "Part" },

        paragraph = { pua = "ua1", pdf = "P" },
        subparagraph = { pua = "ua1", pdf = "P" },
        p = { pua = "ua1", pdf = "P" },
        highlight = { pua = "ua1", pdf = "Span" },
        ornament = { pua = "ua1", pdf = "Span" },
        textdisplay = { pua = "ua1", pdf = "Div" },
        placeholder = { pua = "ua1", pdf = "Span" },
        ["break"] = { pua = "ua1", pdf = "Div" },

        construct = { pua = "ua1", pdf = "Span" },
        constructleft = { pua = "ua1", pdf = "Span" },
        constructright = { pua = "ua1", pdf = "Span" },
        constructcontent = { pua = "ua1", pdf = "NonStruct" },

        sectionblock = { pua = "ua1", pdf = "Part" },

        section = { pua = "ua1", pdf = "Sect" }, -- Part
        sectioncaption = { pua = "ua1", pdf = "NonStruct" },
        sectiontitle = { pua = "ua1", pdf = "H" },
        sectionnumber = { pua = "ua1", pdf = "Lbl" },
        sectioncontent = { pua = "ua1", pdf = "NonStruct" },

        itemgroup = { pua = "ua1", pdf = "L" },
        item = { pua = "ua1", pdf = "LI" },
        itemtag = { pua = "ua1", pdf = "Lbl" },
        itemcontent = { pua = "ua1", pdf = "LBody" },
        itemhead = { pua = "ua1", pdf = "NonStruct" },
        itembody = { pua = "ua1", pdf = "NonStruct" },

        items = { pua = "ua1", pdf = "Div" },
        itemsymbols = { pua = "ua1", pdf = "Div" },
        itemsymbol = { pua = "ua1", pdf = "Span" },
        itemtexts = { pua = "ua1", pdf = "Div" },
        itemtext = { pua = "ua1", pdf = "Span" },

        description = { pua = "ua1", pdf = "Sect" },
        descriptiontag = { pua = "ua1", pdf = "Lbl" },
        descriptioncontent = { pua = "ua1", pdf = "NonStruct" },
        descriptionsymbol = { pua = "ua1", pdf = "Lbl" },

        verbatimblock = { pua = "ua1", pdf = "Part" },
        verbatimlines = { pua = "ua1", pdf = "Part" },
        verbatimline = { pua = "ua1", pdf = "Code" },
        verbatim = { pua = "ua1", pdf = "Code" },

        lines = { pua = "ua1", pdf = "Part" },
        line = { pua = "ua1", pdf = "Code" },
        linenumber = { pua = "ua1", pdf = "Span" },

        synonym = { pua = "ua1", pdf = "Span" },
        sorting = { pua = "ua1", pdf = "Span" },

        register = { pua = "ua1", pdf = "Part" },
        registerlocation = { pua = "ua1", pdf = "Span" },
        registersection = { pua = "ua1", pdf = "Part" },
        registertag = { pua = "ua1", pdf = "Span" },
        registerentries = { pua = "ua1", pdf = "Part" },
        registerentry = { pua = "ua1", pdf = "Part" },
        registercontent = { pua = "ua1", pdf = "Span" },
        registersee = { pua = "ua1", pdf = "Span" },
        registerpages = { pua = "ua1", pdf = "Span" },
        registerpage = { pua = "ua1", pdf = "Span" },
        registerseparator = { pua = "ua1", pdf = "Span" },
        registerpagerange = { pua = "ua1", pdf = "Span" },

        table = { pua = "ua1", pdf = "Table" },
        tablerow = { pua = "ua1", pdf = "TR" },
        tablecell = { pua = "ua1", pdf = "TD" },
        tableheadcell = { pua = "ua1", pdf = "TH" },
        tablehead = { pua = "ua1", pdf = "THEAD" },
        tablebody = { pua = "ua1", pdf = "TBODY" },
        tablefoot = { pua = "ua1", pdf = "TFOOT" },

        tabulate = { pua = "ua1", pdf = "Table" },
        tabulaterow = { pua = "ua1", pdf = "TR" },
        tabulatecell = { pua = "ua1", pdf = "TD" },
        tabulateheadcell = { pua = "ua1", pdf = "TH" },
        tabulatehead = { pua = "ua1", pdf = "THEAD" },
        tabulatebody = { pua = "ua1", pdf = "TBODY" },
        tabulatefoot = { pua = "ua1", pdf = "TFOOT" },

        list = { pua = "ua1", pdf = "TOC" },
        listitem = { pua = "ua1", pdf = "TOCI" },
        listtag = { pua = "ua1", pdf = "Lbl" },
        listcontent = { pua = "ua1", pdf = "NonStruct" },
        listdata = { pua = "ua1", pdf = "NonStruct" },
        listpage = { pua = "ua1", pdf = "Lbl" },
        listtext = { pua = "ua1", pdf = "Span" },

        delimitedblock = { pua = "ua1", pdf = "BlockQuote" },
        delimited = { pua = "ua1", pdf = "Quote" },
        delimitedcontent = { pua = "ua1", pdf = "NonStruct" },
        delimitedsymbol = { pua = "ua1", pdf = "Span" },

        subsentence = { pua = "ua1", pdf = "Span" },
        subsentencecontent = { pua = "ua1", pdf = "Span" },
        subsentencesymbol = { pua = "ua1", pdf = "Span" },

        label = { pua = "ua1", pdf = "Span" },
        number = { pua = "ua1", pdf = "Span" },

        float = { pua = "ua1", pdf = "Part" },
        floatcaption = { pua = "ua1", pdf = "Caption" },
        floatlabel = { pua = "ua1", pdf = "Span" },
        floatnumber = { pua = "ua1", pdf = "Span" },
        floattext = { pua = "ua1", pdf = "Span" },
        floatcontent = { pua = "ua1", pdf = "NonStruct" },

        image = { pua = "ua1", pdf = "Figure" },
        mpgraphic = { pua = "ua1", pdf = "Figure" },

        formulaset = { pua = "ua1", pdf = "Part" },
        formula = { pua = "ua1", pdf = "Formula" },
        formulacaption = { pua = "ua1", pdf = "Span" },
        formulalabel = { pua = "ua1", pdf = "Span" },
        formulanumber = { pua = "ua1", pdf = "Span" },
        formulacontent = { pua = "ua1", pdf = "NonStruct" },
        subformula = { pua = "ua1", pdf = "Part" },

        link = { pua = "ua1", pdf = "Link" },
        reference = { pua = "ua1", pdf = "NonStruct" },

        navigation = { pua = "ua1", pdf = "NonStruct" },
        navigationbutton = { pua = "ua1", pdf = "NonStruct" },
        navigationmenu = { pua = "ua1", pdf = "NonStruct" },
        navigationmenuitem = { pua = "ua1", pdf = "NonStruct" },
        navigationaction = { pua = "ua1", pdf = "NonStruct" },
        navigationpage = { pua = "ua1", pdf = "NonStruct" },

        margintextblock = { pua = "ua1", pdf = "Aside" },
        margintext = { pua = "ua1", pdf = "NonStruct" },
        marginanchor = { pua = "ua1", pdf = "Span" },

        linetext = { pua = "ua1", pdf = "NonStruct" },

        -- no math here

        ignore = { pua = "ua1", pdf = "NonStruct" },
        private = { pua = "ua1", pdf = "NonStruct" },
        metadata = { pua = "ua1", pdf = "Part" },
        metavariable = { pua = "ua1", pdf = "Span" },

        mid = { pua = "ua1", pdf = "Span" },
        sub = { pua = "ua1", pdf = "Span" },
        sup = { pua = "ua1", pdf = "Span" },
        subsup = { pua = "ua1", pdf = "Span" },

        combination = { pua = "ua1", pdf = "Table" },
        combinationpair = { pua = "ua1", pdf = "TR" },
        combinationcontent = { pua = "ua1", pdf = "TD" },
        combinationcaption = { pua = "ua1", pdf = "TD" },

        publications = { pua = "ua1", pdf = "Part" },
        publication = { pua = "ua1", pdf = "NonStruct" },
        pubfld = { pua = "ua1", pdf = "Span" },

        citation = { pua = "ua1", pdf = "Span" },
        cite = { pua = "ua1", pdf = "Span" },

        narrower = { pua = "ua1", pdf = "Part" },

        block = { pua = "ua1", pdf = "Part" },

        userdata = { pua = "ua1", pdf = "Part" },

        quantity = { pua = "ua1", pdf = "Span" },
        unit = { pua = "ua1", pdf = "Span" },

        verse = { pua = "ua1", pdf = "Part" },
        versetag = { pua = "ua1", pdf = "Lbl" },
        verseseparator = { pua = "ua1", pdf = "Span" },
        versecontent = { pua = "ua1", pdf = "NonStruct" },

    },

    -- This is a hack to get around the specifications "We can't expect an
    -- application to keep track of nested H's (but otherwise expect very complex
    -- things to be properly dealt with)". A typical example of bugs
    -- becoming features, standards being not really standards as they get
    -- adapted, etc. That said: it is up to the user to decide what to do but
    -- don't blame us for the resulting less optimal structure.

    -- Because we don't want to spoil the otherwise rather clean structure in
    -- ConTeXt, this kicks in very late in the backend. We might extend this
    -- hackery but there are limits to what is desired. After all, in over a
    -- decade of pdf tagging nothing significant happened (we're speaking 2024)
    -- nor proper viewer support showed up and we can anyway expect LLM's to deal
    -- with proper tags anyway some day.

    overloads = {

        -- criterium : parent    : use parent "detail"
        --             parents   : use first in parent "parents" list
        --             otherwise : look at self "detail"

        -- We need to violate the proper structure by getting a Hn on the title so we
        -- have to backtrack to what we're in, thereby also denying the proper
        -- section instance. Don't ask. The plural "parents" will make sure we
        -- consult the first in the chain and not the instance that is encoded in
        -- "detail".

        sectioncaption = {
            criterium = "parents",
            mapping = {
                part = { pua = "ua1", tag = "section_title_1", pdf = "H1" },
                chapter = { pua = "ua1", tag = "section_title_2", pdf = "H2" },
                title = { pua = "ua1", tag = "subject_title_2", pdf = "H2" },
                section = { pua = "ua1", tag = "section_title_3", pdf = "H3" },
                subject = { pua = "ua1", tag = "subject_title_3", pdf = "H3" },
                subsection = { pua = "ua1", tag = "section_title_4", pdf = "H4" },
                subsubject = { pua = "ua1", tag = "subject_title_4", pdf = "H4" },
                subsubsection = { pua = "ua1", tag = "section_title_5", pdf = "H5" },
                subsubsubject = { pua = "ua1", tag = "subject_title_5", pdf = "H5" },
                subsubsubsection = { pua = "ua1", tag = "section_title_6", pdf = "H6" },
                subsubsubsubject = { pua = "ua1", tag = "subject_title_6", pdf = "H6" },
                subsubsubsubsection = { pua = "ua1", tag = "section_title_7", pdf = "H7" },
                subsubsubsubsubject = { pua = "ua1", tag = "subject_title_7", pdf = "H7" },
                subsubsubsubsubsection = { pua = "ua1", tag = "section_title_8", pdf = "H8" },
                subsubsubsubsubsubject = { pua = "ua1", tag = "subject_title_8", pdf = "H8" },
                subsubsubsubsubsubsection = { pua = "ua1", tag = "section_title_9", pdf = "H9" },
                subsubsubsubsubsubsubject = { pua = "ua1", tag = "subject_title_9", pdf = "H9" },
                subsubsubsubsubsubsubsection = { pua = "ua1", tag = "section_title_10", pdf = "H10" },
                subsubsubsubsubsubsubsubject = { pua = "ua1", tag = "subject_title_10", pdf = "H10" },
            },
        },

    },

}