Re: Bookshelves under BookMangler

Anne & Lynn Wheeler Tue, 12 Jan 2010 16:35:32 -0800

Edward Jaffe writes:
> The story, as told by John Ehrman, is that the POO got so big, it
> broke the book build software and nobody at IBM has the time,
> inclination, or knowledge to fix it. :-(


it would be fun to get a look at it to fix generation of html ...  POO
has been a subset of the architecture book ... which has been twice as
large ... it started out as a cms script file with conditionals ... so
command line arguments to the cms script command would format either the
full document or just the POO subset.

last spring I had done a lot with the transcripts of the pecora hearings
(senate banking hearings in the wake of '29 crash ... leading up to
glass-steagall) ... with a whole lot of cross-indexing and generated
loads of hrefs.  the original scanned transcripts were six volumes with
2345 pgs total and 20 volumes with 9296 pgs total.

the original document wasn't the best ... so the scan wasn't outstanding
and in several places the OCR of the scanned pages is very low quality ...
so the individual HTML'ed pages from the OCR periodically have a lot of
garbage; as a result I put in each HTML'ed page a HREF back to the
corresponding page in the scan'ed document (whole thing is under two
gbytes, most of which is the original scanned files).
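
for illustration only, a minimal python sketch of the idea of giving each
HTML'ed OCR page a HREF back to its scanned page ... this is not the code
that was actually used; the function name render_page and the
scans/page-NNNN.png file layout are assumptions made up for the example.

```python
from html import escape

# Hypothetical layout: one image file per scanned page, e.g. scans/page-0012.png
SCAN_URL_TEMPLATE = "scans/page-{n:04d}.png"

def render_page(ocr_text, page_no, scan_url_template=SCAN_URL_TEMPLATE):
    """Wrap the OCR text of one page in HTML, with a link back to its scan."""
    scan_href = scan_url_template.format(n=page_no)
    return (
        "<html><body>\n"
        # Link first, so a reader hitting OCR garbage can jump to the original.
        f'<p><a href="{escape(scan_href, quote=True)}">'
        f"scanned original, page {page_no}</a></p>\n"
        # Escape the OCR output so stray <, > and & do not break the markup.
        f"<pre>{escape(ocr_text)}</pre>\n"
        "</body></html>\n"
    )
```

the OCR text is escaped before it goes into the page, since garbage
characters in it could otherwise be taken as markup.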

by comparison, Z -07 POO PDF file says 1344 pages ... for the heck of it
I just started a "save as text" ... which is going quite slow ... a lot
of the formatting & figures are lost in "save as text"

-- 
40+yrs virtualization experience (since Jan68), online at home since Mar1970

