On Mon, 16 Oct 2017 20:17:01 +0000, Pew, Curtis G wrote:
>On Oct 16, 2017, at 1:39 PM, John McKown wrote:
>>
>> The above is why I love *IX symlinks. What would be _really_ nice, IMO,
>> would be if your PDF generation process placed the actual document number
>> & title in the PDF information title section of the pdf file. So that I
>> could get it out of the "pdfinfo" program. It would make generation of the
>> symlinks simple. Something like:
>>
>> for i in *.pdf ; do ln -s "${i}" "$(pdfinfo "${i}" |
>> awk '/^Title: / {print substr($0,17);}' | sed 's/^ *//; s/ *$//')" ; done
>
>Yes, please. I tried doing something like this once, but the PDF information
>sections are inconsistent or incomplete. Everything’s easier to manage when
>you have consistent, reliable metadata.
>
Of course. In the interim, here's a script that creates the symlinks by
scraping the *index.htm file:
#! /bin/sh -x
# Make symbolic links for anchors in ../*index.htm
# Wholeheartedly empirical. I tweaked it until it mostly worked.
# Run this in an expendable subdirectory of the doc archive.
ln -s ../pdf . # Make a symlink to the real PDF directory.
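# As written, awk only prints the ln commands on stdout;
# pipe its output to sh to execute them.
awk '
# A table cell that links to a PDF: start an "ln -s" command.
/td.*href=.pdf.*\.pdf/ {
    Target = $0
    sub( /.*href=./, "ln -s ", Target )  # drop everything through href= and the next character
    sub( /. target=.*/, " \\", Target )  # drop the rest; end with a line continuation
    print( Target )
    next }
# The next table cell holds the title, which becomes the link name.
/td/ && Target {
    L = $0
    sub( /.*<td *>/, " \"", L )
    sub( /<\/td>.*/, "\"", L )
    gsub( /\//, "-", L )                 # "/" cannot appear in a file name
    print( L )
    Target = ""
    next }
' ../*index.htm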
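
For anyone who wants to try the awk program before pointing it at a real
archive, here is a minimal sketch that runs it against a made-up index. The
scratch directory layout, the file name abc123.pdf, the document number, and
the title are all hypothetical, and the markup is only my assumption about the
row shape the patterns expect: the link cell and the title cell on separate
lines. The awk output is piped to sh so the links are actually created.

```shell
#!/bin/sh
# Hypothetical layout: a scratch archive with an index and one empty PDF.
work=$(mktemp -d) || exit 1
mkdir "$work/pdf" "$work/links"
: > "$work/pdf/abc123.pdf"

# Assumed markup: link cell on one line, title cell on the next.
cat > "$work/index.htm" <<'EOF'
<tr>
<td><a href="pdf/abc123.pdf" target="_blank">SA00-0000-00</a></td>
<td>z/OS Example Guide</td>
</tr>
EOF

cd "$work/links" || exit 1
ln -s ../pdf .          # symlink to the (scratch) PDF directory

# Same awk program as in the script; output is piped to sh to run it.
awk '
/td.*href=.pdf.*\.pdf/ {
    Target = $0
    sub( /.*href=./, "ln -s ", Target )
    sub( /. target=.*/, " \\", Target )
    print( Target )
    next }
/td/ && Target {
    L = $0
    sub( /.*<td *>/, " \"", L )
    sub( /<\/td>.*/, "\"", L )
    gsub( /\//, "-", L )
    print( L )
    Target = ""
    next }
' ../*index.htm | sh

ls -l
```

With that sample row, the listing should show a link named
"z-OS Example Guide" pointing at pdf/abc123.pdf; the slash in the title is
turned into a hyphen by the gsub. Real index files may format their rows
differently, so treat the sample as a starting point for tweaking.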
-- gil