Here's my script to download IBM javadocs:

(
    rm -rf wget-test
    mkdir wget-test
    cd wget-test

    starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"
    wget -d -r -R robots.txt --page-requisites -nH --cut-dirs=5 --no-parent \
        "$starturl" 2>&1 | tee wget.log
)

Despite the '-R robots.txt' option, wget still downloads robots.txt and
refuses to follow links starting with "/support/knowledgecenter/api/".

Workaround:

    touch robots.txt
    chmod 400 robots.txt
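
For comparison, here is a rough sketch of another approach that is not part of the original report: wget documents a `robots` wgetrc setting, which can be passed on the command line as `-e robots=off`. Whether it fully avoids the behaviour described above on this wget version is an assumption and has not been checked here. The `DRY_RUN` variable is a hypothetical name used only in this sketch, so that by default the command is printed rather than run.

```shell
#!/bin/sh
# Sketch only: same download as in the script above, but asking wget to
# skip robots.txt handling via the documented wgetrc command "robots".
# Assumption: "-e robots=off" has the documented effect on the installed wget.

starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"

# Build the command as positional parameters so it can be printed or run.
set -- wget -r -e robots=off --page-requisites -nH --cut-dirs=5 --no-parent "$starturl"

# DRY_RUN is a hypothetical switch for this sketch; it defaults to 1 (print only).
if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
else
    "$@" 2>&1 | tee wget.log
fi
```

Running it with `DRY_RUN=0` performs the actual download and writes wget.log in the current directory.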

GNU Wget 1.15 built on linux-gnu

