[NTG-context] index page ranges where start page == end page

Sanjoy Mahajan Tue, 09 Jul 2019 22:19:56 -0700

The following (probably well-known) minimal example

  \starttext
  \startregister[index][key1]{an entry}\input knuth\stopregister[index][key1]


  \placeindex

  \stoptext

produces the index entry "an entry 1--1", where the page range should be
just a single page.

A question (and then two workarounds): Could ConTeXt automatically
change such page ranges to the single page ("an entry 1")?

Meanwhile, a couple of workarounds:

In the MkII (pre-lua) days, I wrote a shell/sed/awk/perl script that
converted the .pdf file to text, grepped for (strings that are likely to
be) page ranges, and spat out lines where the page range should be a
page.  (Those days were so long ago that I've forgotten the script's
language.)  Then I would manually find and change the corresponding
\startregister[index] in the source to a plain \index (and delete the
corresponding \stopregister).
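
That old script is lost, but its core check might look roughly like the
following sketch, written for Python 3.  It assumes that the PDF has
already been converted to plain text by some external tool, and that a
page range appears as two numbers joined by one or more hyphens or by an
en dash.  The name silly_ranges and the sample text are invented for
illustration.

```python
import re

# Two page numbers joined by hyphens or an en dash, e.g. "1--1", "12-15".
RANGE_RE = re.compile(r'\b(\d+)\s*(?:-+|\u2013)\s*(\d+)\b')

def silly_ranges(text):
    """Return (line number, line) pairs whose range starts and ends on the same page."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in RANGE_RE.finditer(line):
            if match.group(1) == match.group(2):
                hits.append((number, line.strip()))
                break  # one report per line is enough
    return hits

sample = """an entry 1--1
another entry 3--5
third entry 12, 14-14
"""

for number, line in silly_ranges(sample):
    print('%d: %s' % (number, line))
```

Like the original, this can only flag strings that look like page ranges,
so the hits still need to be checked by hand against the source.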

In these MkIV days, with the index data in a lua table, I've been
playing with the following python2 script that parses the .tuc file to
spit out the same information slightly more reliably.  In the next
step, coming soon, the script will check that each \seeindex entry
points to an actual entry and also that each \seeindex{also ...} entry
originates from an actual entry (otherwise it shouldn't be "also").

The script requires the slpp package:

  pip install git+https://github.com/SirAnthony/slpp

The parser in the script has a bug in that it doesn't handle minus signs
in the lua table, maybe everywhere or maybe only some of them, so the
script replaces them with "_" before sending the data to the parser (a
terrible hack).
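
One side effect of the blanket replacement is that hyphens inside entry
text are changed too.  A slightly narrower variant, sketched below for
Python 3, masks only the minus signs that sit outside double-quoted
strings.  It assumes that all strings in the table are double-quoted
with backslash escapes; the function name is made up, and whether the
result then parses cleanly with slpp has not been verified here.

```python
import re

# Matches either a whole double-quoted string (with backslash escapes)
# or a single minus sign outside any such string.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|-')

def mask_minus_outside_strings(lua_text):
    """Replace "-" with "_" everywhere except inside double-quoted strings."""
    def repl(match):
        token = match.group(0)
        return '_' if token == '-' else token
    return TOKEN.sub(repl, lua_text)

sample = '{ ["text"]="well-known", ["n"]=-1 }'
print(mask_minus_outside_strings(sample))
```

With this variant, an entry such as "well-known" would keep its hyphen in
the reported output.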

To run the script:

  python check-index.py < file.tuc

Here is check-index.py:

# parse lua-format index-entry table extracted from book.tuc to check
# that each xref points to an actual entry, and that, further, "see
# also" src entries have at least one page ref (perhaps in subtree)

# due to bug in slpp parser: replace "-" (minus signs) in data with "_"

from slpp import slpp as lua
from sys import stdin,stderr
import re

def tuclist2entry(l):
    return '+'.join(l)

data = []
intro_re = r'utilitydata.structures.registers.collected\s*='
in_register_data = False
for l in stdin:
    if re.match(intro_re, l):
        in_register_data = True
        data.append(re.sub(intro_re,'',l))
    elif in_register_data:
        data.append(l)
        if re.match(r'}$', l):
            in_register_data = False
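# workaround for the slpp bug: mask every "-" before parsing
data = re.sub("-", "_", ''.join(data))
registers = lua.decode(data)   # named to avoid shadowing the built-in dict
index = registers['index']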

entries = []
xrefs   = []
for entry in index['entries']:
    src = [x[0] for x in entry['list']]
    refs = entry['references']
    if 'seeword' in entry:
        dest = entry['seeword']['text']
        xrefs.append((src,dest))
    else:
        page = refs['realpage']
        if refs.get('lastrealpage',None) == page:
            # single-argument print: same output under python2 and python3
            print("silly range: " + tuclist2entry(src))
        entries.append(src)

# list every cross-reference as "source -> destination"
for xref in xrefs:
    print(tuclist2entry(xref[0]) + ' -> ' + xref[1])
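
The planned next step (checking that each \seeindex entry points to an
actual entry, and that each "also" entry originates from one) could be
sketched as follows, for Python 3.  This is only an illustration under
several assumptions: that a destination is written the same way as the
'+'-joined key list of an entry, that a "see also" destination starts
with the literal text "also ", and that the entries and xrefs lists have
the shape built by the script above.  The function name and the sample
data are hypothetical.

```python
def check_xrefs(entries, xrefs):
    """Return a list of human-readable problems found in the cross-references.

    entries: key lists that carry page references, e.g. [["tree"], ["tree", "oak"]]
    xrefs:   (source key list, destination text) pairs
    """
    known = {'+'.join(keys) for keys in entries}
    problems = []
    for src, dest in xrefs:
        src_name = '+'.join(src)
        is_also = dest.startswith('also ')
        target = dest[len('also '):] if is_also else dest
        if target not in known:
            problems.append('dangling: %s -> %s' % (src_name, dest))
        # "see also" needs page references at the source or in its subtree
        if is_also and not any(e[:len(src)] == src for e in entries):
            problems.append('"also" without pages: %s -> %s' % (src_name, dest))
    return problems

entries = [['tree'], ['tree', 'oak'], ['forest']]
xrefs = [
    (['woods'], 'forest'),       # plain "see", target exists
    (['tree'], 'also forest'),   # source has pages, target exists
    (['shrub'], 'also bush'),    # no such target, and no pages at the source
]
for problem in check_xrefs(entries, xrefs):
    print(problem)
```

Only the last sample cross-reference is reported, once for each of its
two problems.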