The "pdftotext" command-line tool from the XPDFReader team does a pretty credible job of extracting good, readable (and, more importantly, utility-program-readable) text from IBM PDFs. I use it very successfully to extract data from the z/Architecture PoOP for a side project of mine.
https://www.xpdfreader.com/download.html

Using an appropriate utility tool, you should be able to extract the form number you need from a text extract of the PDF title page (yes, this version of "pdftotext" can correctly extract specific pages by absolute page number [not the printed page number, just the sequential page-count number]).

Example usage of "pdftotext" to extract all of Appendix B by page number from a specific edition of the PoOP to a UTF-8 text file with CRLF line endings:

pdftotext -f 1573 -l 1642 -eol dos -cfg sample-xpdfrc -table -nopgbrk -enc UTF-8 ..\zArchPDF\a22-7832-10-zArch-PoOP-2015.pdf a22-7832-10-zArch-PoOP-AppB.txt

The "-table" option is very good at extracting readable text data from the PoOP manual; I believe it will work as well on other IBM manual PDFs.

HTH

Peter

-----Original Message-----
From: IBM Mainframe Discussion List <[email protected]> On Behalf Of Seymour J Metz
Sent: Wednesday, October 26, 2022 10:55 AM
To: [email protected]
Subject: Re: Location of forms code in z/OS manuals

By form code I mean the xxxx-xxxx-xxxx-xx number that identifies the manual and used to be (without the -xx) the number used for ordering the manual. E.g., has the form code SA22-7801-14 and the title z/OS UNIX System Services User's Guide.

Downloading and parsing multiple index.html files should work for some but not all of the manuals that I've downloaded.

Thanks.
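
For what it's worth, the title-page idea above could be sketched roughly as follows in Python. This is only an illustration, not a tested recipe: it assumes "pdftotext" is on the PATH, that the form number appears on sequential page 1 (on some manuals it may be a page or two later), and that form numbers look like SA22-7801-14 (one or two letters, two digits, four digits, two-digit suffix). The function names and the regular expression are my own inventions for the example.

```python
import re
import subprocess
import sys
from typing import Optional

# ASSUMPTION: form numbers look like "SA22-7801-14", i.e. 1-2 uppercase letters,
# 2 digits, 4 digits, and a 2-digit suffix. Adjust if your manuals differ.
FORM_NUMBER = re.compile(r"\b[A-Z]{1,2}\d{2}-\d{4}-\d{2}\b")


def title_page_text(pdf_path: str, first_page: int = 1, last_page: int = 1) -> str:
    """Run pdftotext on a page range and return the extracted text.

    ASSUMPTION: the title page is sequential page 1 unless told otherwise.
    The trailing "-" asks pdftotext to write the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-f", str(first_page), "-l", str(last_page),
         "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_form_number(text: str) -> Optional[str]:
    """Return the first string in text that matches the assumed pattern, or None."""
    match = FORM_NUMBER.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Usage: python formnum.py manual1.pdf manual2.pdf ...
    for path in sys.argv[1:]:
        print(path, find_form_number(title_page_text(path)))
```

If the first page turns out to be a cover image with no text, widening the range (say last_page=3) and taking the first match would be the obvious next thing to try.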
