>>> On 7/13/2009 at 10:21 AM, Scott Rohling <[email protected]> wrote: 
-snip-
> It would be
> nifty to have something that checked the web pages for newer copies of the
> pdf files and did 'wget' or something on them - and was smart enough to suck
> the title out of the web page as well and use it to name the file locally.

-snip-
> p.s.  Hmmm..  I bet I can use wget with the right incantation and get the
> whole website to my laptop along with PDFs..   but not sure it handles
> checking for changes?

If you use the -N switch for wget, it will only download newer versions, or 
versions with a different file size.  Not all web servers are helpful about 
providing the correct file size.  For a starter script, try this:
for file in *
do
  # Reduce the file name to its base form number (sgNNNNNN or redpNNNN)
  base=$(echo "$file" | sed -e 's/\(sg......\).*$/\1/' -e 's/\(redp....\).*$/\1/')
  echo "$file" "$base"
  # Fetch the abstract page and pull the book title out of the <title> tag.
  # 2>&1 folds wget's status chatter into the pipe, and the grep throws it away.
  # tr turns slashes into dashes and squeezes runs of blanks so the result is
  # usable as a local file name.
  wget -O - "http://www.redbooks.ibm.com/abstracts/$base.html" 2>&1 |
    grep -i '^<title>' |
    sed -e 's/<title>IBM Redbooks . //' -e 's/<\/title>//' |
    tr '/' '-' | tr -s ' '
  echo --------------------
done
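
To pull down the PDFs themselves with -N, something along these lines should work.
I'm assuming the PDFs sit under /redbooks/pdfs/ on the same server -- verify that
against a URL you already have before turning it loose:

for file in *
do
  base=$(echo "$file" | sed -e 's/\(sg......\).*$/\1/' -e 's/\(redp....\).*$/\1/')
  # -N only downloads if the server copy is newer (or a different size)
  wget -N "http://www.redbooks.ibm.com/redbooks/pdfs/$base.pdf"
done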

Run it in a directory that has a bunch of Redbooks in it, and pipe the output 
to a file.  I did find one Redbook that lots of others reference; the book itself 
still exists, but for some reason its abstract page does not: SG24-6344.
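
In other words, something like this (the script name is just whatever you saved 
it under -- redbook-titles.sh here is only an example):

cd /path/to/your/redbooks
sh redbook-titles.sh > titles.txt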


Mark Post
