On Mon, 16 Oct 2017 20:17:01 +0000, Pew, Curtis G wrote:

>On Oct 16, 2017, at 1:39 PM, John McKown wrote:
>>
>> The above is why I love *IX symlinks. What would be _really_ nice, IMO,
>> would be if your PDF generation process placed the actual document number
>> & title in the PDF information title section of the pdf file. So that I
>> could get it out of the "pdfinfo" program. It would make generation of the
>> symlinks simple. Something like:
>>
>> for i in *.pdf ; do pdfinfo "${i}" | ln -s "${i}" "$(awk '/^Title: / {print
>> substr($0,17);} | sed -r 's/ *.*? *//')" ; done
>
>Yes, please. I tried doing something like this once, but the PDF information
>sections are inconsistent or incomplete. Everything’s easier to manage when
>you have consistent, reliable metadata.
>
Of course. In the interim, here's a script that creates the symlinks
by scraping the *index.htm file:

#! /bin/sh -x
# Make symbolic links for anchors in ../*index.htm
# Wholeheartedly empirical. I tweaked it until it mostly worked.
# Run this in an expendable subdirectory of the doc archive.

ln -s ../pdf .    # Make a symlink to the real PDF directory.

awk '
/td.*href=.pdf.*\.pdf/ {
    Target = $0
    sub( /.*href=./, "ln -s ", Target )
    sub( /. target=.*/, " \\", Target )
    print( Target )
    next }

/td/ && Target {
    L = $0
    sub( /.*<td *>/, " \"", L )
    sub( /<\/td>.*/, "\"", L )
    gsub( /\//, "-", L )
    print( L )
    Target = ""
    next }
' ../*index.htm

-- gil
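
A note on what the awk program emits: as posted, it writes `ln -s` commands to standard output instead of running them, so the output presumably needs to be piped to a shell to actually create the links. Below is a minimal sketch of that behaviour. The `gen_links` wrapper and the two sample table rows are hypothetical, made up for illustration; the real *index.htm markup may differ, which is likely why the original is described as "wholeheartedly empirical".

```shell
#!/bin/sh
# Hypothetical wrapper around the awk program from the post:
# reads HTML on stdin, prints one two-line "ln -s" command per PDF anchor.
gen_links() {
    awk '
    /td.*href=.pdf.*\.pdf/ {
        Target = $0
        sub( /.*href=./, "ln -s ", Target )    # keep only the href value
        sub( /. target=.*/, " \\", Target )    # end line with a continuation
        print( Target )
        next }

    /td/ && Target {
        L = $0
        sub( /.*<td *>/, " \"", L )            # open quote before the title
        sub( /<\/td>.*/, "\"", L )             # close quote after the title
        gsub( /\//, "-", L )                   # "/" is not legal in a file name
        print( L )
        Target = ""
        next }
    '
}

# Hypothetical sample rows: an anchor cell followed by a title cell.
sample='<td><a href="pdf/abc123.pdf" target="_blank">ABC123</a></td>
<td>Example Title A/B</td>'

# Print the generated command without running it.
printf '%s\n' "$sample" | gen_links

# To create the links for real, pipe the same output to a shell:
#   printf '%s\n' "$sample" | gen_links | sh
```

For the sample rows this prints `ln -s pdf/abc123.pdf \` followed by ` "Example Title A-B"` on the next line, which a shell reads as a single command creating a link named after the title. Reviewing the printed commands before piping them to `sh` is a cheap safeguard, since the titles come straight from scraped HTML.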