Hello,

Cherise Haywood <[email protected]> writes:

> I am trying to download specific .zip files from this website:
> https://www2.census.gov/geo/tiger/TIGER2012/ROADS/
>
> I have used several iterations of wget to yield only the folders
> (directories) being formed, but absolutely no data being downloaded.
>
> Here are copies of the code I have used:
>
> OPTION 1: wget --no-parent --relative --recursive --level=2
> --accept=zip --mirror -A .zip
> https://www2.census.gov/geo/tiger/TIGER2012/ROADS/
>
> Can you assist?

It seems that wget has problems parsing the /robots.txt correctly: the
empty record for “User-Agent: *” appears to cause it to consider all
paths disallowed.  To work around the issue you may disable honouring
the /robots.txt by adding “--execute robots=off” to your command line.

> OPTION 2: wget --no-parent --relative --recursive --level=2
> --accept=zip --mirror -A *_72*.zip --time-stamps
> https://www2.census.gov/geo/tiger/TIGER2012/ROADS/

--time-stamps should probably have been --timestamping.

--mirror sets an infinite recursion depth (--level=inf).  You may limit
the depth when using --mirror by specifying --level after --mirror (I
believe).

> OPTION 3: wget --no-parent --relative --recursive --level=2
> --accept=zip --mirror -A _72
> https://www2.census.gov/geo/tiger/TIGER2012/ROADS

When multiple patterns are specified with -A, --accept, either as
separate arguments or as comma-separated patterns, a file is accepted
if *any one* of the patterns matches.

> I only want the files with *_72*.zip to be downloaded to a copy of the
> directories it comes from on my system.

This is the invocation I have come up with (backslash used as line
continuation marker):

    wget --execute robots=off --timestamping \
         --no-parent --recursive --level=1 \
         --accept '*_72*.zip' \
         'https://www2.census.gov/geo/tiger/TIGER2012/ROADS/'

Make sure to quote strings containing characters with special meaning
to your shell (like the ‘*’ often used for globbing).
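
To illustrate the quoting advice, here is a minimal sketch for a POSIX-style shell (it will not work as-is in the Windows command prompt).  The file name tl_2012_72001_roads.zip is made up for the demonstration, and nothing is downloaded; the sketch only shows what a program would receive as its argument with and without quotes.

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical file that happens to match the pattern.
touch tl_2012_72001_roads.zip

# Unquoted: the shell expands the glob before the program sees it.
unquoted=$(printf '%s' *_72*.zip)

# Quoted: the pattern is passed through literally, which is what
# wget's --accept option needs.
quoted=$(printf '%s' '*_72*.zip')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

If no file in the current directory matches, most shells pass the unquoted pattern through unchanged, which is why an unquoted pattern can appear to work until a matching file shows up.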

--level=1 seems to be enough to get the .zip files: they are all in the
directory your URL points to – but you should check that.

> I have attached error imgs, I captured!

It would have been better had you provided a log in text form.  Wget
can be instructed to write its output to a log file using --output-file
or --append-output; if you still want to see the progress bar, also add
--show-progress.  You may also use the Windows command prompt's
redirection operator “2> /path/to/file” to write wget's output to a
file (wget reports its messages on standard error, so a plain “>”
would not capture them).

Happy data analysing, I presume.

-- 
Felix Dietrich
