>>> On 7/13/2009 at 10:21 AM, Scott Rohling <[email protected]> wrote:
-snip-
> It would be nifty to have something that checked the web pages for newer
> copies of the pdf files and did 'wget' or something on them - and was smart
> enough to suck the title out of the web page as well and use it to name the
> file locally.
-snip-
> p.s. Hmmm.. I bet I can use wget with the right incantation and get the
> whole website to my laptop along with PDFs.. but not sure it handles
> checking for changes?

If you use the -N switch for wget, it will only download newer versions, or
versions with a different file size.  Not all web servers are helpful about
providing the correct file size.

For a starter script, try this:

for file in *
do
    # Keep just the order number: "sg" plus six characters, or "redp" plus four
    base=$(echo $file | sed -e 's/\(sg......\).*$/\1/' -e 's/\(redp....\).*$/\1/')
    echo $file $base
    # Pull the <title> line out of the abstract page and clean it up
    wget -O - http://www.redbooks.ibm.com/abstracts/$base.html 2>&1 |
        grep -i ^'<title>' |
        sed -e 's/<title>IBM Redbooks . //' -e 's/<\/title>//' |
        tr '/' '-' | tr -s " "
    echo --------------------
done

Run it in a directory that has a bunch of Redbooks in it, and pipe the output
to a file.  I did find one Redbook that lots of others reference, and it
still exists, but the abstract no longer exists for some reason: SG24-6344.

Mark Post
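
P.S.  If you want to go one step further and actually rename the local files
from the scraped title, here is an untested sketch along the same lines.  It
assumes the PDFs are named after the order number (sg......pdf or
redp....pdf) and that the abstract pages still carry the title in the same
<title> format as above.  The leading 'echo' makes it a dry run.

for file in sg*.pdf redp*.pdf
do
    base=$(echo $file | sed -e 's/\(sg......\).*$/\1/' -e 's/\(redp....\).*$/\1/')
    # Same title-scraping pipeline as before, but with wget kept quiet
    title=$(wget -q -O - http://www.redbooks.ibm.com/abstracts/$base.html |
        grep -i ^'<title>' |
        sed -e 's/<title>IBM Redbooks . //' -e 's/<\/title>.*$//' |
        tr '/' '-' | tr -s " ")
    # Abstract page gone (like SG24-6344)?  Leave the file alone.
    if [ -z "$title" ]
    then
        echo "no abstract for $base, skipping"
        continue
    fi
    # Drop the 'echo' once the new names look right
    echo mv "$file" "$base - $title.pdf"
done

Each pass hits the web site once per file, so a 'sleep 1' inside the loop
might be polite if you point it at a big directory.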
