Hi, I am creating a scientific archive of problem sets and want to post wget instructions for downloading them.

1. wget -r -nd -erobots=off http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/unweighted -A 'zip'

   Works: it descends into the subdirectories under unweighted and retrieves the zip files contained in each subdirectory.

2. wget -r -nd -erobots=off http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/ -A 'zip'

   Does not work: it stops after rejecting the index.html file in master-set.

3. wget -r -nd -erobots=off http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/

   Kind of works: it gets all of the files, but does not restrict itself to the zip files.

Maybe I don't understand the options? But it looks like a bug in the interaction between the -A flag and descending into subdirectories.

thanks
Fahiem Bacchus

Here is the site: http://www.cs.toronto.edu/maxsat-lib/

With directory structure:

master-instances
    master-set
        unweighted
            CircuitDebuggingProblems
                CircuitDebuggingProblems.zip
            .... many other subdirs each containing a zip
        weighted
            many subdirs each containing a zip
    ms-evals
    original

I also tried a -l 10 flag; it did not help.

Version info:
============
GNU Wget 1.20.3 built on darwin18.6.0.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
+ntlm +opie -psl +ssl/openssl

Wgetrc:
    /usr/local/etc/wgetrc (system)
Locale:
    /usr/local/Cellar/wget/1.20.3_1/share/locale
Compile:
    clang -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/usr/local/etc/wgetrc"
    -DLOCALEDIR="/usr/local/Cellar/wget/1.20.3_1/share/locale" -I.
    -I../lib -I../lib -I/usr/local/opt/[email protected]/include -DNDEBUG
    -g -O2
Link:
    clang -DNDEBUG -g -O2 -lidn2 -L/usr/local/opt/[email protected]/lib
    -lssl -lcrypto -ldl -lz ftp-opie.o openssl.o http-ntlm.o
    ../lib/libgnu.a -liconv -lintl -Wl,-framework -Wl,CoreFoundation
    -lunistring

Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic <[email protected]>.
Please send bug reports and questions to <[email protected]>.

=========
/usr/local/etc/wgetrc
--------------------------
###
### Sample Wget initialization file .wgetrc
###

## You can use this file to change the default behaviour of wget or to
## avoid having to type many many command-line options. This file does
## not contain a comprehensive list of commands -- look at the manual
## to find out what you can put into this file. You can find this here:
## $ info wget.info 'Startup File'
## Or online here:
## https://www.gnu.org/software/wget/manual/wget.html#Startup-File
##
## Wget initialization file can reside in /usr/local/etc/wgetrc
## (global, for all users) or $HOME/.wgetrc (for a single user).
##
## To use the settings in this file, you will have to uncomment them,
## as well as change them, in most cases, as the values on the
## commented-out lines are the default values (e.g. "off").
##
## Command are case-, underscore- and minus-insensitive.
## For example ftp_proxy, ftp-proxy and ftpproxy are the same.

##
## Global settings (useful for setting up in /usr/local/etc/wgetrc).
## Think well before you change them, since they may reduce wget's
## functionality, and make it behave contrary to the documentation:
##

# You can set retrieve quota for beginners by specifying a value
# optionally followed by 'K' (kilobytes) or 'M' (megabytes). The
# default quota is unlimited.
#quota = inf

# You can lower (or raise) the default number of retries when
# downloading a file (default is 20).
#tries = 20

# Lowering the maximum depth of the recursive retrieval is handy to
# prevent newbies from going too "deep" when they unwittingly start
# the recursive retrieval. The default is 5.
#reclevel = 5

# By default Wget uses "passive FTP" transfer where the client
# initiates the data connection to the server rather than the other
# way around. That is required on systems behind NAT where the client
# computer cannot be easily reached from the Internet. However, some
# firewalls software explicitly supports active FTP and in fact has
# problems supporting passive transfer. If you are in such
# environment, use "passive_ftp = off" to revert to active FTP.
#passive_ftp = off

# The "wait" command below makes Wget wait between every connection.
# If, instead, you want Wget to wait only between retries of failed
# downloads, set waitretry to maximum number of seconds to wait (Wget
# will use "linear backoff", waiting 1 second after the first failure
# on a file, 2 seconds after the second failure, etc. up to this max).
#waitretry = 10

##
## Local settings (for a user to set in his $HOME/.wgetrc). It is
## *highly* undesirable to put these settings in the global file, since
## they are potentially dangerous to "normal" users.
##
## Even when setting up your own ~/.wgetrc, you should know what you
## are doing before doing so.
##

# Set this to on to use timestamping by default:
#timestamping = off

# It is a good idea to make Wget send your email address in a `From:'
# header with your request (so that server administrators can contact
# you in case of errors). Wget does *not* send `From:' by default.
#header = From: Your Name <[email protected]>

# You can set up other headers, like Accept-Language. Accept-Language
# is *not* sent by default.
#header = Accept-Language: en

# You can set the default proxies for Wget to use for http, https, and ftp.
# They will override the value in the environment.
#https_proxy = http://proxy.yoyodyne.com:18023/
#http_proxy = http://proxy.yoyodyne.com:18023/
#ftp_proxy = http://proxy.yoyodyne.com:18023/

# If you do not want to use proxy at all, set this to off.
#use_proxy = on

# You can customize the retrieval outlook. Valid options are default,
# binary, mega and micro.
#dot_style = default

# Setting this to off makes Wget not download /robots.txt. Be sure to
# know *exactly* what /robots.txt is and how it is used before changing
# the default!
#robots = on

# It can be useful to make Wget wait between connections. Set this to
# the number of seconds you want Wget to wait.
#wait = 0

# You can force creating directory structure, even if a single is being
# retrieved, by setting this to on.
#dirstruct = off

# You can turn on recursive retrieving by default (don't do this if
# you are not sure you know what it means) by setting this to on.
#recursive = off

# To always back up file X as X.orig before converting its links (due
# to -k / --convert-links / convert_links = on having been specified),
# set this variable to on:
#backup_converted = off

# To have Wget follow FTP links from HTML files by default, set this
# to on:
#follow_ftp = off

# To try ipv6 addresses first:
#prefer-family = IPv6

# Set default IRI support state
#iri = off

# Force the default system encoding
#localencoding = UTF-8

# Force the default remote server encoding
#remoteencoding = UTF-8

# Turn on to prevent following non-HTTPS links when in recursive mode
#httpsonly = off

# Tune HTTPS security (auto, SSLv2, SSLv3, TLSv1, PFS)
#secureprotocol = auto

--
Fahiem Bacchus
Professor of Computer Science
University of Toronto
