Wow, thanks a million, it now works like a charm! I tried something similar after some random googling before, but it didn't work; maybe I used the wrong "spoofing" agent? Alas, I have no idea what is going on; if you by any chance have a useful link to an explanation of how it works, I'd be glad to learn. Anyhow, this saved me lots of effort. Time to google again for some basic info about spoofing!
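
For the archives, a minimal sketch of the original recursive fetch combined with the user-agent string from the quoted reply. The variable names and the fetch_pdfs function are made up for illustration, and this thread does not record the exact command that finally worked, so treat it as an assumption rather than the confirmed fix. The only idea it relies on is that wget identifies itself as Wget by default, and the two byte counts in the quoted reply suggest this server returns a much larger page to a browser-like agent.

```shell
#!/bin/sh
# Hypothetical names: UA, URL and fetch_pdfs are illustrative only.
# The user-agent string and URL are copied from the quoted message.
UA='Mozilla/5.0 (rv:1.9.2.8) Gecko/20100803 Foo/3.6.8'
URL='https://www.ukap1000application.com/doc_pdf_library.aspx'

fetch_pdfs() {
    # -r       recurse into linked pages
    # -H       allow following links to other hosts
    # -l 3     limit recursion depth to 3
    # -A pdf   keep only files ending in "pdf"
    # --user-agent  send a browser-like User-Agent header instead of wget's default
    wget -r -H -l 3 -A pdf --user-agent="$UA" "$URL"
}
```

Calling fetch_pdfs after sourcing the snippet starts the download. If some files are still missing, comparing page sizes with and without the header (as in the quoted reply) is a quick way to check whether the server is still treating the request differently.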

Thanks again,
Johnny

On Thu, 2010-08-05 at 13:51 +0200, Giuseppe Scrivano wrote:
> Johnny <[email protected]> writes:
>
> > I am trying to fetch a complete set of pdf docs, whereof some are
> > "hidden" in a collapsible list; if you visit the site you must expand
> > the list to get the docs. Using wget, I cannot get all the files (the
> > top level files downloads, but not the rest).
> >
> > This is what I tried:
> > wget -r -H -l 3 -A pdf
> > https://www.ukap1000application.com/doc_pdf_library.aspx
>
> I get a different page if I spoof the user-agent.
>
> $ wget -O- -q https://www.ukap1000application.com/doc_pdf_library.aspx \
>   | wc -c
> 36152
>
> $ wget -q -O- \
>   --user-agent "Mozilla/5.0 (rv:1.9.2.8) Gecko/20100803 Foo/3.6.8" \
>   https://www.ukap1000application.com/doc_pdf_library.aspx | wc -c
> 174706
>
> Try to do the same with your command.
>
> Cheers,
> Giuseppe
