I would like to submit three recent scripts that I wrote to deal with
ebook difficulties on my Victor Reader (a "talking book" reader with
text-to-speech capability).

heading-numbers.awk makes an HTML page more readable on my Victor Reader.
An HTML page works as well as an ePub or DAISY book, but often does not
have heading numbers. Without heading numbers it is difficult to navigate
a very long page by ear, because you don't know where you are in the
file.  This awk program puts multilevel heading numbers at the start of
every heading.  The Victor Reader can be set to jump from one heading to
another at any level from 1 to 6, allowing me to quickly zero in on the
part of the page I want.
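
To illustrate the idea of multilevel heading numbers, here is a minimal
sketch. It is not the attached script, only one way the numbering could be
done. The function name number_headings is made up for this example, and
the sketch assumes well-formed tags with no ">" inside attribute values.
A document that starts below h1 gets a 0 for the missing upper levels.

```shell
# Hypothetical helper, NOT the attached awk program: a sketch of
# multilevel heading numbering using "<" as the record separator.
number_headings() {
    awk '
        BEGIN { RS = "<" }                  # every record starts right after a "<"
        /^[hH][1-6][ \t\r\n>]/ {            # opening tag h1 .. h6
            level = substr($0, 2, 1) + 0
            count[level]++                  # next number at this level
            for (i = level + 1; i <= 6; i++)
                count[i] = 0                # restart numbering of deeper levels
            for (i = 1; i <= level; i++) {
                n = count[i] + 0
                num = (i == 1) ? n : num "." n
            }
            pos = index($0, ">")            # end of the opening tag
            if (pos > 0)
                $0 = substr($0, 1, pos) num " " substr($0, pos + 1)
        }
        # Put back the "<" that RS removed (there is none before record 1).
        { printf "%s%s", (NR > 1 ? "<" : ""), $0 }
    ' "$@"
}
```

It would be used as "number_headings page.html > numbered.html", where
page.html stands for any input file.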

I wrote the other two scripts to troubleshoot a problem where my Victor
Reader would not jump to some of the headings in an ePub.

epubcheck.sh runs the W3C ePub Validator.  The validator is easy to run by
hand, but the script lets me park the package in a subdirectory of the bin
directory and run the validator with a simple command that is on my PATH.
The validator showed that my problem ebook was a valid ePub 2.0.1 file.

html-headings.awk extracts headings from an HTML or ePub file.  I unzipped
the ePub file into an empty directory and ran the command

    html-headings.awk *.html > all-headings.txt

The command produced a list of 272 headings, one to a line.

I looked at the headings and the CSS file, could not see a reason for the
behaviour, and submitted a report to the manufacturer.

Both AWK programs use a feature of AWK to make the job easy: the record
separators (RS and ORS, both newlines by default) can be set to arbitrary
values.  Changing the record separators to '<' places every tag at the
beginning of a record (AWK's equivalent of a line), making it easy to
identify the tags you want to work with.
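
The record-separator trick can be sketched as follows. This is an
illustration only, not the attached html-headings.awk: the function name
list_headings and the "hN: text" output format are made up for the
example, and it assumes well-formed tags with no ">" inside attribute
values.

```shell
# Hypothetical helper, NOT the attached script: list the headings in
# HTML input, one per line, using "<" as the record separator.
list_headings() {
    awk '
        BEGIN { RS = "<" }                     # each record begins with a tag name
        inhead && /^\/[hH][1-6][ \t\r\n]*>/ {  # closing tag ends the heading
            gsub(/[ \t\r\n]+/, " ", text)      # collapse runs of whitespace
            sub(/^ /, "", text)
            sub(/ $/, "", text)
            print "h" level ": " text
            inhead = 0
            next
        }
        /^[hH][1-6][ \t\r\n>]/ {               # opening tag h1 .. h6
            inhead = 1
            level = substr($0, 2, 1)
            text = ""
        }
        inhead {                               # collect text, skipping inline tags
            part = $0
            sub(/^[^>]*>/, "", part)           # drop the tag, keep what follows it
            text = text part
        }
    ' "$@"
}
```

Run as "list_headings *.html", it prints one heading per line. Inline
markup such as an em element inside a heading is dropped and only its
text is kept.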




On Tue, Dec 5, 2023 at 11:45 PM tug <tug.willi...@gmail.com> wrote:

> Meeting Announcement
>
> Linux-Ottawa December 2023 Meeting.
> Date/Time:
>
> Thursday December 7th at 7pm
> Format:
>
> 1. Online over Jitsi https://meet.jit.si/oclug_2023-12-07
>
> 2. Check mailing list for in-person clusters being hosted by other
> members.
> Program:
>
> *Topics*
>
> 1. Script Night: If you have a useful script you want to share, or a
> problem script you want to rescue, bring it along.
>
> Current offerings include
>
> John Nash - scripts to move data to and from cloud storage.
>
> Tug Williams - using bash to download and compress feeds for use over low
> bandwidth connections.
>
> Anyone else! - anything anyone wants to bring up.
>
>

-- 
______________________________________________________________________________
Ian Earl Gorman | //www.gorman.ca/ | //web.ncf.ca/iegorman/
//github.com/iegorman/ | //www.linkedin.com/in/iegorman/

Attachment: html-heading-numbers.awk

Attachment: html-headings.awk

#!/usr/bin/bash
# Run the DAISY Consortium (W3C) ePub validator
# Check that input file is compliant with either ePUB2 or ePUB3
# needs a Java runtime (1.7 or above)

# DAISY Consortium ePub Validator was downloaded from
#       https://github.com/w3c/epubcheck/releases/tag/v5.1.0

set     -e      # exit host (#!) shell after any script error
set     -u      # reference to unset parameter is an error

SCRIPT=${0##*/}                         # script filename
SCRIPTDIR=$(dirname -- "$0")            # script location ("." if $0 has no slash)
EPUBCHECKDIR="${SCRIPTDIR}/epub/epubcheck-5.1.0"    # software location
EPUBCHECKJAR="epubcheck.jar"           # main Java archive

if [[ "$#" -lt 1 ]]
then
    echo "Usage: ${SCRIPT} filename.epub"
    echo "       Check that file is in a valid EPUB2 or EPUB3 format"
    echo '    Extension "epub" can be any case: "EPUB", "ePub", ...'
    echo ""
    echo "Help:  ${SCRIPT} --help"
    exit 1
fi

/usr/bin/java -jar "${EPUBCHECKDIR}/${EPUBCHECKJAR}" "$@"
