The "pdftotext" command-line tool from the XpdfReader team does a pretty 
credible job of extracting good, readable (and, more importantly, 
program-readable) text from IBM PDFs.  I use it very successfully to extract 
data from the z/Architecture PoOP for a side project of mine.

https://www.xpdfreader.com/download.html

Using an appropriate utility tool you should be able to extract the form number 
you need from a text extract of the PDF title page (yes, this version of 
"pdftotext" can correctly extract specific pages by absolute page number [not 
the printed page number, just the sequential page-count number]).
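
As a rough sketch of the "utility tool" step, the following Python searches 
the extracted title-page text for something shaped like a form number. The 
function name find_form_number and the pattern (one or two letters, two 
digits, then -nnnn-nn, as in SA22-7801-14) are my own assumptions, not 
anything IBM documents, so adjust the pattern to the manuals you actually have.

```python
import re

# Assumed shape of a form number, e.g. SA22-7801-14:
# 1-2 letters, 2 digits, hyphen, 4 digits, hyphen, 2 digits.
FORM_NUMBER = re.compile(r"\b[A-Z]{1,2}\d{2}-\d{4}-\d{2}\b")


def find_form_number(text):
    """Return the first form-number-like string in text, or None."""
    match = FORM_NUMBER.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical title-page text, standing in for real pdftotext output.
    sample = "z/OS UNIX System Services User's Guide\nSA22-7801-14\n"
    print(find_form_number(sample))
```

Feed it the text file produced by extracting only page 1 of each PDF.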

Example usage of "pdftotext" to extract all of Appendix B by page number from a 
specific edition of the PoOP to a UTF-8 text file with CRLF line endings:

pdftotext -f 1573 -l 1642 -eol dos -cfg sample-xpdfrc -table -nopgbrk -enc UTF-8 ..\zArchPDF\a22-7832-10-zArch-PoOP-2015.pdf a22-7832-10-zArch-PoOP-AppB.txt
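
If you need to run this over many PDFs, a small wrapper can build the same 
argument list for each file. This is only a sketch: the helper names 
pdftotext_args and extract_pages are made up for illustration, and it assumes 
"pdftotext" is on your PATH and accepts the options shown in the command above.

```python
import subprocess


def pdftotext_args(pdf_path, txt_path, first_page, last_page, cfg=None):
    """Build a pdftotext argument list mirroring the command above."""
    args = ["pdftotext",
            "-f", str(first_page),   # first page (sequential page count)
            "-l", str(last_page),    # last page (sequential page count)
            "-eol", "dos"]           # CRLF line endings
    if cfg:
        args += ["-cfg", cfg]        # optional xpdfrc configuration file
    args += ["-table", "-nopgbrk", "-enc", "UTF-8", pdf_path, txt_path]
    return args


def extract_pages(pdf_path, txt_path, first_page, last_page, cfg=None):
    """Run pdftotext; raises CalledProcessError if the tool fails."""
    subprocess.run(
        pdftotext_args(pdf_path, txt_path, first_page, last_page, cfg),
        check=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with paths that 
contain spaces or backslashes.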

The "-table" option is very good at extracting readable text data from the PoOP 
manual; I believe it will work as well on other IBM manual PDFs.

HTH

Peter

-----Original Message-----
From: IBM Mainframe Discussion List <[email protected]> On Behalf Of 
Seymour J Metz
Sent: Wednesday, October 26, 2022 10:55 AM
To: [email protected]
Subject: Re: Location of forms code in z/OS manuals

By form code I mean the xxxx-xxxx-xxxx-xx number that identifies the manual and 
used to be (without the -xx) the number used for ordering the manual. E.g., has 
the form code SA22-7801-14 and the title z/OS UNIX System Services User's Guide.

Downloading and parsing multiple index.html files should work for some but not 
all of the manuals that I've downloaded.

Thanks.

