Re: How to change the name of the output file
On Wed, 05 Dec 2007 21:41:14 -0800 Micah Cowan [EMAIL PROTECTED] wrote:

> Mauro Tortonesi wrote:
> > On Tuesday 20 November 2007 20:38:13 Micah Cowan wrote:
> > > Be advised, though, that -O doesn't simply mean "make the name of the
> > > downloaded result `filename'"; it means "act as if you're redirecting
> > > output to a file named `filename'". In particular, this means that such
> > > things as timestamping, and multiple URLs, may not work as you expect.
> >
> > micah, i believe this text is a good candidate for inclusion in the man page.
>
> Heh, I turned your suggestion into a bug... and then, now that I've had a
> chance to take a closer look, I've discovered I already added some
> clarifying text to the -O option's description. How's this look?
>
> http://hg.addictivecode.org/wget/1.11/rev/5e5eae3f8d9f
>
>     Use of `-O' is _not_ intended to mean simply "use the name FILE
>     instead of the one in the URL"; rather, it is analogous to shell
>     redirection: `wget -O file http://foo' is intended to work like
>     `wget -O - http://foo > file'; `file' will be truncated immediately,
>     and _all_ downloaded content will be written there. Note that a
>     combination with `-k' is only permitted when downloading a single
>     document, and combination with any of `-r', `-p', or `-N' is not
>     allowed.

looks perfect to me. hopefully, now we'll have fewer complaints from users who try -O for multiple downloads ;-)

-- Mauro Tortonesi [EMAIL PROTECTED]
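[For readers who want to see the redirection semantics described above in practice, here is a minimal shell sketch; the example.com URLs are placeholders:]

    # -O truncates the target once, then writes ALL downloaded content to it,
    # exactly like a shell redirection:
    wget -O combined.html http://example.com/a.html http://example.com/b.html
    # ...which is intended to behave like:
    wget -O - http://example.com/a.html http://example.com/b.html > combined.html

    # to merely rename a single download, fetch one URL per invocation instead:
    wget -O copy-of-a.html http://example.com/a.html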
Re: wget2
On Friday 30 November 2007 14:48:07 David Ginger wrote:
> > what do you think?
>
> Python.

i was asking what you guys think of my "write a prototype using a dynamic language, then incrementally rewrite everything in C" proposal, and not trying to start yet another programming language flame war ;-)

i believe that for sheer application prototyping purposes, ruby and python are equally good. in addition, i know and like both of them. so, in case micah is actually evaluating ruby and python, i don't really care which one of them he will finally choose to adopt.

-- Mauro Tortonesi
Re: wget2
On Friday 30 November 2007 11:59:45 Hrvoje Niksic wrote:
> Mauro Tortonesi [EMAIL PROTECTED] writes:
> > > I vote we stick with C. Java is slower and more prone to environmental
> > > problems.
> >
> > not really. because of its JIT compiler, Java is often as fast as C/C++,
> > and sometimes even significantly faster.
>
> Not if you count startup time, which is crucial for a program like Wget.
> Memory use is also incomparable.

right. i was not suggesting to implement wget2 in Java, anyway ;-) but we could definitely make good use of dynamic languages such as Ruby (my personal favorite) or Python, at least for rapid prototyping purposes.

both Ruby and Python support event-driven I/O (http://rubyeventmachine.com for Ruby, and http://code.google.com/p/pyevent/ for Python) and async DNS (http://cares.rubyforge.org/ for Ruby, and http://code.google.com/p/adns-python/ for Python), and both are relatively easy to interface with C code. writing a small prototype for wget2 in Ruby or Python at first, and then incrementally rewriting it in C, would save us a lot of development time, IMVHO.

what do you think?

-- Mauro Tortonesi
Re: wget2
On Friday 30 November 2007 03:29:05 Josh Williams wrote:
> On 11/29/07, Alan Thomas [EMAIL PROTECTED] wrote:
> > Sorry for the misunderstanding. Honestly, Java would be a great language
> > for what wget does. Lots of built-in support for web stuff. However, I
> > was kidding about that. wget has a ton of great functionality, and I am
> > a reformed C/C++ programmer (or a recent Java convert). But I love using
> > wget!
>
> I vote we stick with C. Java is slower and more prone to environmental
> problems.

not really. because of its JIT compiler, Java is often as fast as C/C++, and sometimes even significantly faster.

> Wget needs to be as independent as we can possibly make it. A lot of the
> systems that wget is used on (including mine) do not even have Java
> installed. That would be a HUGE requirement for many people.

i totally agree.

-- Mauro Tortonesi
Re: How to change the name of the output file
On Tuesday 20 November 2007 20:38:13 Micah Cowan wrote:
> Be advised, though, that -O doesn't simply mean "make the name of the
> downloaded result `filename'"; it means "act as if you're redirecting
> output to a file named `filename'". In particular, this means that such
> things as timestamping, and multiple URLs, may not work as you expect.

micah, i believe this text is a good candidate for inclusion in the man page.

-- Mauro Tortonesi
Re: .1, .2 before suffix rather than after
On Sunday 04 November 2007 22:54:24 Hrvoje Niksic wrote:
> Micah Cowan [EMAIL PROTECTED] writes:
> > Christian Roche has submitted a revised version of a patch to modify the
> > unique-name-finding algorithm to generate names in the pattern foo-n.html
> > rather than foo.html.n. The patch looks good, and will likely go in very
> > soon.
>
> foo.html.n has the advantage of simplicity: you can tell at a glance that
> foo.n is a duplicate of foo. Also, it is trivial to remove the unwanted
> files by removing foo.*. Why change what worked so well in the past?

i totally agree with hrvoje here. also note that changing wget's unique-name-finding algorithm can potentially break lots of wget-based scripts out there. i think we should leave this kind of change for wget2 - or wget-on-steroids, or whatever you want to call it ;-)

-- Mauro Tortonesi
HEAD request logic summary
hi to everybody,

here are some tables summarizing the behaviour of the current wget version (soon to be 1.11) and of wget 1.10.2 regarding HTTP HEAD requests. i hope the tables will be useful to determine whether the currently implemented logic is correct. please notice that micah recently changed the behaviour of the --no-content-disposition option, turning it on by default. that is, by default wget will not consider the Content-Disposition header in HTTP resource retrieval.

 -N  | --no-content- | Content-Disposition | Preliminary HEAD | Preliminary HEAD  | Test name
     | disposition   | header present      | request in 1.11  | request in 1.10.2 |
-----+---------------+---------------------+------------------+-------------------+----------------------------------------
 no  | no            | no                  | yes              | no                | Test-noop
 no  | no            | yes                 | yes              | no                | Test-HTTP-Content-Disposition
 no  | yes           | no                  | no               | N/A               | Test--no-content-disposition-trivial
 no  | yes           | yes                 | no               | N/A               | Test--no-content-disposition
 yes | no            | no                  | yes              | no                | Test-N
 yes | no            | yes                 | yes              | no                | Test-N-HTTP-Content-Disposition
 yes | yes           | no                  | no               | N/A               | Test-N--no-content-disposition-trivial
 yes | yes           | yes                 | no               | N/A               | Test-N--no-content-disposition

 -O  | --no-content- | Content-Disposition | Preliminary HEAD | Preliminary HEAD  | Test name
     | disposition   | header present      | request in 1.11  | request in 1.10.2 |
-----+---------------+---------------------+------------------+-------------------+----------------------------------------
 no  | no            | no                  | yes              | no                | Test-noop
 no  | no            | yes                 | yes              | no                | Test-HTTP-Content-Disposition
 no  | yes           | no                  | no               | N/A               | Test--no-content-disposition-trivial
 no  | yes           | yes                 | no               | N/A               | Test--no-content-disposition
 yes | no            | no                  | yes              | no                | Test-O
 yes | no            | yes                 | yes              | no                | Test-O-HTTP-Content-Disposition
 yes | yes           | no                  | no               | N/A               | Test-O--no-content-disposition-trivial
 yes | yes           | yes                 | no               | N/A               | Test-O--no-content-disposition

 --spider | -r  | --no-content- | Content-Disposition | Preliminary HEAD | Preliminary HEAD  | Test name
          |     | disposition   | header present      | request in 1.11  | request in 1.10.2 |
----------+-----+---------------+---------------------+------------------+-------------------+------------------------------------------------
 yes      | no  | no            | no                  | yes              | yes               | Test--spider
 yes      | no  | no            | yes                 | yes              | yes               | Test--spider-HTTP-Content-Disposition
 yes      | no  | yes           | no                  | yes              | N/A               | Test--spider--no-content-disposition-trivial
 yes      | no  | yes           | yes                 | yes              | N/A               | Test--spider--no-content-disposition
 yes      | yes | no            | no                  | yes              | N/A*              | Test--spider-r
 yes      | yes | no            | yes                 | yes              | N/A*              | Test--spider-r-HTTP-Content-Disposition
 yes      | yes | yes           | no                  | yes              | N/A*              | Test--spider-r--no-content-disposition-trivial
 yes      | yes | yes           | yes                 | yes              | N/A*              | Test--spider-r--no-content-disposition

*) recursive spider mode is broken in 1.10.2

-- Mauro Tortonesi [EMAIL PROTECTED]
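[For anyone who wants to check a row of the tables above by hand, the preliminary HEAD request (or its absence) is visible in debug output. A sketch, assuming a 1.11 development build and a placeholder URL:]

    # the raw request line in -d output shows whether a HEAD is sent first:
    wget -d -N http://example.com/file 2>&1 | grep -E '^(HEAD|GET) '
    # with --no-content-disposition, no preliminary HEAD should appear:
    wget -d -N --no-content-disposition http://example.com/file 2>&1 | grep -E '^(HEAD|GET) '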
Re: wget bug?
On Mon, 9 Jul 2007 15:06:52 +1200 [EMAIL PROTECTED] wrote:
> With wget under win2000/win XP I get "No such file or directory" error
> messages when using the following command line:
>
> wget -s --save-headers "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"
>
> where %1 = 212BI. Any ideas?

hi nikolaus, in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. try using %1% instead of %1.

-- Mauro Tortonesi [EMAIL PROTECTED]
Re: New wget maintainer
On Tue, 26 Jun 2007 13:33:35 -0700 Micah Cowan [EMAIL PROTECTED] wrote:

hi micah,

> The GNU Project has appointed me as the new maintainer for wget, to fill
> the shoes that Mauro Tortonesi is leaving. I am very excited to be able to
> take part in the development of such a terrific and useful tool. I've
> certainly found it very helpful on many occasions.

congratulations on your appointment as the new wget maintainer. i hope you'll have more time to dedicate to wget than i did so far, and i am sure you'll bring a lot of enthusiasm and new energy to the wget community.

> I have had the opportunity to go over most of the wget source code, and
> the last couple of years' worth of mailing list archives. This has given
> me a fairly good sense of where the project is, and where it could be
> going. I already have some ideas of some of the things I would like to see
> happen; many of them are already in the current TODO file. I've also
> assigned rough priorities (my own) to things I've seen in the TODO file,
> or bugs that have been reported on-list. Ideally, I'd like to start using
> a bug tracker to handle these; reading from the list, I know that this was
> Mauro's desire as well. Has consideration been given to using Savannah for
> this purpose?

yes, we definitely need a bug tracker.

> Being that we seem to be very close to a release, I do not want to make a
> bunch of sudden changes, either to current processes or to the current
> plans for the imminent release. However, there are a couple of small items
> that I feel should absolutely be resolved before 1.11 is released
> officially:
>
> - Wget should not be attempting basic authentication before it receives a
>   challenge (which could be digest or what have you). This is a security
>   issue.

i am not so sure this is a critical point. as hrvoje pointed out, basic authentication is definitely the most used authentication mechanism on the web, so changing the current policy to perform digest authentication first and use basic authentication as a failover might result in a performance penalty. in addition, both basic and digest authentication are meant to be used over https only. in fact, while digest authentication does not send the password in clear text over the wire, it certainly does not protect from MitM attacks.

wrt digest authentication, it would be nice to have it work for proxy connections as well. so far, wget supports only basic authentication for HTTP proxies (no NTLM authentication either).

> - There was a report to the mailing list that user:pass information was
>   being sent in the Referer header. I didn't see any further activity on
>   that thread, and haven't yet had the opportunity to confirm this; it may
>   be an old, fixed issue. However, if it's true, I would consider this to
>   be a show-stopper.

yes, we need to check that.

> I expect that both of these issues would require very small effort to
> resolve.

don't be so sure about it ;-)

> Also, GNU maintainers have been asked to move all packages to version 3 of
> the GPL, which will be released on Friday the 29th. Ideally, maintainers
> have been asked to make the license-update releases coincide with the
> release of GPLv3; I don't think this is feasible in our case. Barring
> that, we have been asked to get such a release out by end-of-July. I'm not
> certain whether 1.11 will be ready in time; in that case, we could
> probably issue a 1.10.3 with only the licensing change.

IMVHO, the code in the trunk is ready to be released.

-- Mauro Tortonesi [EMAIL PROTECTED]
Re: Automate ul/dl using wget
On Mon, 18 Jun 2007 10:13:48 -0700 (PDT) Joe Kopra [EMAIL PROTECTED] wrote:
> Please forgive my ignorance if this question is misdirected; if you know a
> better tool to do what I am attempting, please tell me. I am trying to
> upload a file from a unix script to a website (that is interactive) and
> get the resulting .html back to the unix box. I am including a sample .mup
> file for use with the website, and of course the site itself. I believe
> IBM has found a vulnerability in their programmatic utility and has thus
> shut it down, so I am trying this as a workaround; please see below:
>
> http://www14.software.ibm.com/webapp/set2/mds/fetch?page=mds.html
>
> See "Upload a data file" and "About programmatic upload of survey files".
> A .mup file is included for testing purposes. If there is no solution with
> wget, please recommend anything you think might help.

hi joe, wget does not natively support multipart uploads at the moment. but you might be able to do what you need using this shell script:

#!/bin/bash
BOUNDARY=AaB03x
# write the opening multipart boundary and part headers
printf -- "--$BOUNDARY\r\nContent-disposition: form-data; name=\"mdsData\"\r\nContent-Type: text/plain\r\n\r\n" > tmpfile
# append the file to upload, then the closing boundary
cat "$1" >> tmpfile
printf -- "--$BOUNDARY--\r\n" >> tmpfile
wget --header="Content-type: multipart/form-data; boundary=$BOUNDARY" \
     --post-file=tmpfile \
     http://www14.software.ibm.com/webapp/set2/mds/mds
rm -f tmpfile
# end of script

the usage, of course, is:

sh scriptname filetoupload

let me know if this solved your problem. but, please, let's continue this conversation on the wget ml.

-- Mauro Tortonesi [EMAIL PROTECTED]
website updated
hi to everybody,

i have just updated wget's website on sunsite.dk: http://wget.sunsite.dk/

please take a look at it and tell me what you think about it. i suck at css, so the site graphics are very lean-and-mean (i would say practically nonexistent). if any of you guys wants to work on a more attractive layout for the website, you're more than welcome to do it.

i am planning to rewrite the development page; the current one is just a placeholder. in particular, i have some ideas about a new "feature wishlist" section which i think could be very interesting for our users. i am also open to any kind of suggestion about changing the current web pages or adding some new content to the website. from now on, i'll do my best to keep the new website up-to-date.

in addition, i have recently installed a new bugzilla bug tracker for wget: https://ds.ing.unife.it/bugzilla/

because of several technical problems, the previous bug tracker installation was never actually used. however, i expect the new bug tracker to be very useful in the bug fixing process. please notice that the adoption of the new bug tracker will not change the current bug reporting procedure from the users' perspective. wget users will keep sending bug reports via email to the address bug-wget_AT_gnu.org. the bug tracker will only be used by wget developers to keep track of reported bugs and of their status. since the information in the bug tracker will be accessible by everyone, our users will be able to better monitor the status of their bug reports as well.

as a bottom line, i know i have not been very active recently. i have had my share of problems with my job and, on top of that, i just moved to a new house. i expect to have more time to work on wget from now on. however, i realize i'll never have enough time to dedicate to wget in order to do a good job as a maintainer. for this reason, i intend to step down from my wget maintainer position. don't worry, i'll keep working on wget as a normal developer, so this won't hurt the development of wget at all. on the contrary, i expect that my decision will significantly help in making the development of wget more agile. i've just informed the FSF about this. i am sure they will be able to find a skilled developer who has more time than i have to work on wget and is eager to face the challenge of being the new maintainer.

-- Mauro Tortonesi
Re: problem with no-parent option
Piotr Stankiewicz wrote:
> Hello! I'm using wget for windows version 1.10.2. I'm trying to download
> the contents of my photography site. For doing that I created the
> following command:
>
> wget --wait 2 --random-wait -r -l7 -H -p --convert-links --html-extension -Dpbase.com --exclude-domains forum.pbase.com,search.pbase.com --no-parent -e robots=off http://www.pbase.com/piotrstankiewicz
>
> (I had to use the -H option as the photos are placed at other servers than
> www.pbase.com.) Unfortunately wget seems to ignore the --no-parent option,
> as it also starts to download the www.pbase.com/index.html and
> www.pbase.com/help.html documents and others placed in the main directory.
> I have the impression it's some kind of bug, although I'm definitely not a
> wget expert. Could you try to verify it please?

hi piotr, both the url you specified:

http://www.pbase.com/piotrstankiewicz

and the urls you don't want to retrieve:

http://www.pbase.com/help.html
http://www.pbase.com/index.html

reside in the same directory, so the --no-parent option can't help you. you should probably try to append '/' to the first url:

wget --wait 2 --random-wait -r -l7 -H -p --convert-links --html-extension -Dpbase.com --exclude-domains forum.pbase.com,search.pbase.com --no-parent -e robots=off http://www.pbase.com/piotrstankiewicz/

this command should work.

> Additionally I tried to use the option -R to exclude those files. In such
> a case wget downloads those files and deletes them afterwards, but it
> follows the links from those files (which is unwanted by me). I found the
> information that it's by design.

correct. in recursive mode wget retrieves undesired html files to parse them for other urls to download, and deletes them after parsing.

> But what about introducing another option specifying whether the links
> from the unwanted documents (specified with -R) should be followed or not
> (in some cases it's not welcome)?

i agree. users should be able to tell wget not to retrieve undesired html files at all.

-- Mauro Tortonesi
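[A short sketch of why the trailing slash matters here, with the directory layout inferred from the URLs above:]

    # without the trailing slash, the "parent" of the start URL is the site
    # root, so --no-parent excludes nothing:
    wget -r --no-parent http://www.pbase.com/piotrstankiewicz
    # with it, the parent directory is /piotrstankiewicz/, and anything
    # outside that directory (e.g. /index.html, /help.html) is skipped:
    wget -r --no-parent http://www.pbase.com/piotrstankiewicz/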
Re: Stripping unnecessary ../../ in relative links
Sylvain wrote:
> I forgot to add I was using:
>
> # wget --version
> GNU Wget 1.10.2
>
> # uname -sr
> Linux 2.6.19.1
>
> Wget has been compiled with ssl and nls support, that's all.

hi sylvain, could you please try the current version of wget from our subversion repository? http://www.gnu.org/software/wget/wgetdev.html#development - this bug should be fixed in the new code.

-- Mauro Tortonesi
Re: wget problem
[EMAIL PROTECTED] wrote:
> Dear Sir, I have installed wget 1.10.2 on HP-UX 11.23 from
> http://hpux.cs.utah.edu/hppd/hpux/Gnu/wget-1.10.2/. I have also installed
> the runtime dependency packages like libgcc, gettext, libiconv and
> openssl. However, when I run it to get some test web content, the
> following error is prompted:
>
> # wget http://10.1.1.15
> --12:46:00-- http://10.1.1.15/
>            => `index.html'
> Connecting to 10.1.1.15:80... connected.
> HTTP request sent, awaiting response... 200 OK
> /usr/lib/hpux32/dld.so: Unsatisfied code symbol '__umodsi3' in load module '/usr/local/bin/wget'.
> Killed
>
> Could you please help to tell me what's wrong on the issue? Thanks.

hi cheng, i am not an expert on HP-UX, but it seems you have a broken installation. are you sure you correctly installed all the required dependencies (libgcc, gettext, libiconv, openssl - in particular libgcc)?

-- Mauro Tortonesi
Re: Wget - no files retained when 401/403 is received
Chris Dunkle wrote:
> Wget developers, this may not be considered a bug, but it is unexpected
> behavior for me, and thus I'm reporting it here. I'm using GNU Wget 1.10.2
> with the following options:
>
> wget -r -x -p --save-headers 192.168.0.1
>
> The web server requires a username and password for the default page, and
> thus I receive a 401 Unauthorized response. I would expect that the HTML
> data that was returned, including the HTTP headers, would be saved in the
> file 192.168.0.1/index.html. But instead, there are no files written, and
> the directory isn't even created. In most cases, it would make sense that
> nobody would want to save this data, so I can understand this behavior.
> But I would like to save whatever data is returned to me, even if it may
> not be what I'm expecting. I'm receiving the same results for a 403
> Forbidden response, and this is probably the case for other ones as well.
> The 401 outputs "Authorization failed." and the 403 outputs "xx:xx:xx
> ERROR 403: Forbidden." after execution. Would this be considered a bug, or
> is this just an undocumented feature? If it's not considered a bug, could
> a command line option be added that saves the data from 4xx error code
> responses rather than just quitting? Basically, if the connection is
> successful and something is returned, I want to keep it, no matter what
> it is.

hi chris, wget currently does not save error messages. i am not sure such a feature would actually be useful for our users, and i am not very keen on adding another very-rarely-used feature to wget.

-- Mauro Tortonesi
Re: Bug in 1.10.2 vs 1.9.1
Juhana Sadeharju wrote:
> Hello. Wget 1.10.2 has the following bug compared to version 1.9.1.
> First, bin/wgetdir is defined as:
>
> wget -p -E -k --proxy=off -e robots=off --passive-ftp -o zlogwget`date +%Y%m%d%H%M%S` -r -l 0 -np -U Mozilla --tries=50 --waitretry=10 "$@"
>
> The download command is: wgetdir http://udn.epicgames.com
>
> Version 1.9.1 result: download ok.
> Version 1.10.2 result: only udn.epicgames.com/Main/WebHome is downloaded,
> and other converted urls are of the form http://udn.epicgames.com/../Two/WebHome

hi juhana, could you please try the current version of wget from our subversion repository? http://www.gnu.org/software/wget/wgetdev.html#development - this bug should be fixed in the new code.

-- Mauro Tortonesi
Re: wget problem
[EMAIL PROTECTED] wrote:
> Dear Mauro, yes, we have installed those prerequisite packages, but it
> still failed. We have tried the PA-RISC depot and it works, although we
> are using the Itanium platform. We have tried another development machine
> and the result is the same. So I suspect the depot information is
> incorrect.

hi cheng, do you have a compiler on your machine? maybe you should just try to install wget from sources.

-- Mauro Tortonesi
Re: Wget in 1.11 beta 1 found
[EMAIL PROTECTED] wrote:
> Hi, dear developers! When using the -P or --directory-prefix command-line
> switches in v1.11 Beta 1, and later in v1.11 Beta 1 (with spider patch),
> wget pays attention to neither of them: it saves files in the current
> directory. Such incorrect behaviour appears only if the server's http
> answer contains a Content-disposition tag. Wget v1.10.2 worked right.
> Hope this bug won't live long :).

hi denis, could you please try the current wget version by downloading the sources from our svn repository? http://www.gnu.org/software/wget/wgetdev.html#development

i've just committed a patch that should fix this problem: http://article.gmane.org/gmane.comp.web.wget.patches/1925

thanks, mauro

-- Mauro Tortonesi
Re: read file name from HTTP Header
Rares Vernica wrote:
> Hi, is it possible for wget to read from the HTTP headers the name of the
> file in which to write the output? For example:
>
> wget -d http://something.com/A
> ...
> ---response begin---
> HTTP/1.0 200 OK
> ...
> Content-Type: something/something; name=B
> Content-disposition: attachment; filename=B
> ...
> ---response end---
> 200 OK
>
> It will save the downloaded content into file A. I would prefer that the
> content is saved in file B, as specified in Content-Type or
> Content-disposition.

the current development version of wget has Content-disposition support. you might want to try it: http://www.gnu.org/software/wget/wgetdev.html#development

-- Mauro Tortonesi
Re: -P ignored by parse_content_disposition
Ashley Bone wrote:
> When wget determines the local filename from Content-Disposition, the -P
> (--directory-prefix) option is ignored. The file is always downloaded to
> the current directory. Looking at parse_content_disposition(), I think
> this may be by design. Does anyone know for sure?

no, it's clearly a bug.

> If not, I can submit a patch.

yes, please do it if you can.

-- Mauro Tortonesi
Re: Feature suggestion: change detection for wget -c
John McCabe-Dansted wrote:
> > Wget has no way of verifying that the local file is really a valid
> > prefix of the remote file.
>
> Couldn't wget redownload the last 4 bytes (or so) of the file? For a few
> bytes per file we could detect changes to almost all compressed files and
> the majority of uncompressed files.

reliable detection of changes in the resource to be downloaded would be a very interesting feature. but do you really think that checking the last X (< 100) bytes would be enough to be reasonably sure the resource was (not) modified? what about resources which are updated by appending information, such as log files?

-- Mauro Tortonesi
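[The proposed check is easy to prototype outside wget with an HTTP range request. A minimal sketch, assuming the server honors Range requests; curl is used for the range fetch since wget has no byte-range option, `stat -c %s` is GNU coreutils syntax, and the URL and filenames are placeholders:]

    # compare the last 4 bytes of the local copy against the same byte range
    # of the remote file
    size=$(stat -c %s local.dat)
    tail -c 4 local.dat > local.tail
    curl -s -r $((size-4))-$((size-1)) http://example.com/remote.dat -o remote.tail
    if cmp -s local.tail remote.tail; then
        echo "tails match -- local file still looks like a valid prefix"
    else
        echo "tails differ -- remote file has changed"
    fi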
Re: timestamp and backup
Olav Mørkrid wrote:
> hi. let's say i fetch 10 files from a server with wget. then i want to
> download any modifications to these files. HOWEVER, if a new version of a
> file is downloaded, i want a backup of the old file (e.g. write to
> filename.bak, or possibly filename.001 and .002, to keep a record of all
> versions of a file). can wget do this?

yes. if file X is already present in your filesystem, by default wget downloads the new file and saves it as X.1.

> i tried to combine -N with -nc, which would seem logical (do timestamp
> checking, and prevent overwriting), but wget protests that they are
> mutually exclusive. and if i use no options, then wget fetches a new file
> even though it's not updated.

you should not use -nc, just -N.

-- Mauro Tortonesi
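[A quick illustration of the two behaviours discussed above, with a placeholder URL:]

    wget http://example.com/file      # first fetch: saved as "file"
    wget http://example.com/file      # fetched again unconditionally, saved as "file.1"
    wget -N http://example.com/file   # re-downloaded (overwriting "file") only if
                                      # the remote copy is newer than the local one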
Re: wget 1.11 beta 1 released
Oliver Schulze L. wrote:
> Does this version have the connection cache code?

no, not yet. i have some preliminary code for connection caching, but i am not going to finish it and merge it into the trunk before wget 1.11 is released.

-- Mauro Tortonesi
Re: REST - error for files bigger than 4GB
Steven M. Schweda wrote:
> Are you certain that the FTP _server_ can handle file offsets greater
> than 4GB in the REST command?

i agree with steven here. it's very likely to be a server-side problem.

-- Mauro Tortonesi
Re: one more thing.
Tate Mitchell wrote:
> If anyone could show me how to do this on the wget gui, that would be
> appreciated, too. http://www.jensroesner.de/wgetgui/

wget and wgetgui are related programs, but they are developed by two different teams. you should ask this question to the wgetgui authors.

-- Mauro Tortonesi
Re: help downloading site
Tate Mitchell wrote:
> Would it be possible to download each lesson individually, so that as
> lessons are added or finished, I can download them without re-downloading
> the whole site? Could someone tell me how please? Or would it be possible
> to download the whole thing and just re-download parts that have been
> added since the previous download?

why don't you try something like:

wget -m -k -np http://www.ncsu.edu/project/hindi_lessons/Hindi.Less.01/index.html

-- Mauro Tortonesi
Re: wget: ignores Content-Disposition header
Jochen Roderburg wrote:
> Noèl Köthe wrote:
> > Hello, I can reproduce the following with 1.10.2 and 1.11.beta1: wget
> > ignores the Content-Disposition header described in RFC 2616, 19.5.1
> > Content-Disposition. An example URL is:
> >
> > http://bugs.debian.org/cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715;msg=5;att=1
>
> Sorry, I don't see any Content-Disposition header in this example URL ;-)
> Result of a HEAD request:
>
> 200 OK
> Connection: close
> Date: Fri, 15 Sep 2006 12:58:14 GMT
> Server: Apache/1.3.33 (Debian GNU/Linux)
> Content-Type: text/html; charset=utf-8
> Last-Modified: Mon, 04 Aug 2003 21:18:10 GMT
> Client-Date: Fri, 15 Sep 2006 12:58:14 GMT
> Client-Response-Num: 1
>
> My own experience is that the 1.11 alpha/beta versions (where this feature
> was introduced) worked fine with the examples I encountered.

Jochen is right:

[EMAIL PROTECTED]:~/tmp$ LANG=C ~/code/svn/wget/src/wget -S -d http://bugs.debian.org/cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715;msg=5;att=1
DEBUG output created by Wget 1.10+devel on linux-gnu.

--16:58:52-- http://bugs.debian.org/cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715
Resolving bugs.debian.org... 140.211.166.43
Caching bugs.debian.org => 140.211.166.43
Connecting to bugs.debian.org|140.211.166.43|:80... connected.
Created socket 3.
Releasing 0x00556550 (new refcount 1).

---request begin---
GET /cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715 HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: bugs.debian.org
Connection: Keep-Alive
---request end---

HTTP request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Date: Fri, 15 Sep 2006 14:54:55 GMT
Content-Type: text/html; charset=utf-8
Server: Apache/1.3.33 (Debian GNU/Linux)
Via: 1.1 proxy (NetCache NetApp/5.6.2R1)
---response end---

HTTP/1.0 200 OK
Date: Fri, 15 Sep 2006 14:54:55 GMT
Content-Type: text/html; charset=utf-8
Server: Apache/1.3.33 (Debian GNU/Linux)
Via: 1.1 proxy (NetCache NetApp/5.6.2R1)
Length: unspecified [text/html]
Saving to: `%2Ftmp%2Fupdate-grub.patch?bug=168715'

[ <=> ] 20,018 32.6K/s in 0.6s

Closed fd 3
16:58:54 (32.6 KB/s) - `%2Ftmp%2Fupdate-grub.patch?bug=168715' saved [20018]

-- Mauro Tortonesi
Re: --html-extension and --convert-links don't work together
Ryan Barrett wrote:
> hi wget developers! nicolas mizel reported a bug with --html-extension and
> --convert-links about a year and a half ago. in a nutshell,
> --html-extension appends .html to non-html filenames, but --convert-links
> doesn't use the .html filenames when it converts links.
>
> http://www.mail-archive.com/wget@sunsite.dk/msg07688.html
>
> he reported it against 1.9.1, but it's still broken in 1.10.2. any chance
> it could be fixed in the next release?

in my opinion, this is a serious bug. we should fix it ASAP.

> i have a lot on my plate right now, but if it'd help, i could probably
> whip up a patch in a few weeks or so...

that would be great. thanks.

-- Mauro Tortonesi
Re: Bug
Reece wrote:
> Found a bug (sort of). When trying to get all the images in the directory
> below:
>
> http://www.netstate.com/states/maps/images/
>
> it gives 403 Forbidden errors for most of the images, even after setting
> the agent string to firefox's and setting -e robots=off. After a packet
> capture, it appears that the site will give the forbidden error if the
> Referer is not exactly correct. However, since wget actually uses the
> domain www.netstate.com:80 instead of without the port, it screws it all
> up. I've been unable to find any way to tell wget not to insert the port
> in the requesting url and referrer url. Here is the full command I was
> using:
>
> wget -r -l 1 -H -U "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -e robots=off -d -nh http://www.netstate.com/states/maps/images/

hi reece, that's an interesting bug. i've just added it to my THINGS TO FIX list.

-- Mauro Tortonesi
Re: wget 1.11 beta 1 released
Noèl Köthe wrote:
> On Tuesday, 22.08.2006 17:00 +0200, Mauro Tortonesi wrote:
> > Hello, i've just released wget 1.11 beta 1:
>
> Thanks. :)
>
> > you're very welcome to try it and report every bug you might encounter.
>
> ...
> /usr/bin/make install DESTDIR=/home/nk/debian/wget/wget-experimental/wget-1.10.2+1.11.beta1/debian/wget
> make[1]: Entering directory `/home/nk/debian/wget/wget-experimental/wget-1.10.2+1.11.beta1'
> cd src && /usr/bin/make CC='gcc' CPPFLAGS='' DEFS='-DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/etc/wgetrc\" -DLOCALEDIR=\"/usr/share/locale\"' CFLAGS='-D_FILE_OFFSET_BITS=64 -g -Wall' LDFLAGS='' LIBS='-ldl -lrt -lssl -lcrypto ' DESTDIR='' prefix='/usr' exec_prefix='/usr' bindir='/usr/bin' infodir='/usr/share/info' mandir='/usr/share/man' manext='1' install.bin
> make[2]: Entering directory `/home/nk/debian/wget/wget-experimental/wget-1.10.2+1.11.beta1/src'
> ../mkinstalldirs /usr/bin
> /usr/bin/install -c wget /usr/bin/wget
> ...
>
> I set DESTDIR in line 1 to install it somewhere, but in line 3 DESTDIR=''.
> The problem should be fixed by this:
>
> --- Makefile.in.orig	2006-08-25 19:53:41.0 +0200
> +++ Makefile.in	2006-08-25 19:53:55.0 +0200
> @@ -77,7 +77,7 @@
>  # flags passed to recursive makes in subdirectories
>  MAKEDEFS = CC='$(CC)' CPPFLAGS='$(CPPFLAGS)' DEFS='$(DEFS)' \
>  CFLAGS='$(CFLAGS)' LDFLAGS='$(LDFLAGS)' LIBS='$(LIBS)' \
> -DESTDIR='$(DESTDIR=)' prefix='$(prefix)' exec_prefix='$(exec_prefix)' \
> +DESTDIR='$(DESTDIR)' prefix='$(prefix)' exec_prefix='$(exec_prefix)' \
>  bindir='$(bindir)' infodir='$(infodir)' mandir='$(mandir)' \
>  manext='$(manext)'

Fixed, thanks.

-- Mauro Tortonesi
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Jochen Roderburg wrote:
> I have now tested the new wget 1.11 beta1 on my Linux system and the above
> issue is solved now. The "Remote file is newer" message now only appears
> when the local file exists, and most of the other logic with time-stamping
> and file-naming works as expected.

excellent.

> I meanwhile found, however, another new problem with time-stamping, which
> mainly occurs in connection with a proxy-cache; I will report that in a
> new thread. Same for a small problem with the SSL configuration.

thank you very much for the useful bug reports you keep sending us ;-)

-- Mauro Tortonesi
Re: Failing assertion in Wget 2187
Stefan Melbinger wrote:
> Hello everyone, I'm having troubles with the newest trunk version of wget
> (revision 2187). Command-line arguments:
>
> wget --recursive --spider --no-parent --no-directories --follow-ftp --retr-symlinks --no-verbose --level='2' --span-hosts --domains='www.example.com,a.example.com,b.example.com' --user-agent='Example' --output-file='example.log' 'www.euroskop.cz'
>
> Results in:
>
> wget: url.c:1934: getchar_from_escaped_string: Assertion `str && *str' failed.
> Aborted
>
> Can somebody reproduce this problem? Am I using illegal combinations of
> arguments? Any ideas? (Worked before the newest patch.)

it's really weird. with this command:

wget -d --verbose --recursive --spider --no-parent --no-directories --follow-ftp --retr-symlinks --level='2' --span-hosts --user-agent='Mozilla/5.001 (windows; U; NT4.0; en-us) Gecko/25250101' --domains='www.example.com,a.example.com,b.example.com' http://www.euroskop.cz/

i get:

---response begin---
HTTP/1.0 200 OK
Date: Mon, 28 Aug 2006 14:35:14 GMT
Content-Type: text/html
Expires: Mon, 28 Aug 2006 14:35:14 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Server: Apache/1.3.26 (Unix) Debian GNU/Linux CSacek/2.1.9 PHP/4.1.2
X-Powered-By: PHP/4.1.2
Pragma: no-cache
Set-Cookie: PHPSESSID=b8af8e220f5f1f7321b86ce0524f88b2; expires=Tue, 29-Aug-06 14:35:14 GMT; path=/
Via: 1.1 proxy (NetCache NetApp/5.6.2R1)
---response end---
200 OK
Stored cookie www.euroskop.cz -1 (ANY) / permanent insecure [expiry 2006-08-29 16:35:14] PHPSESSID b8af8e220f5f1f7321b86ce0524f88b2
Length: unspecified [text/html]
Closed fd 3
200 OK
index.html: No such file or directory
FINISHED --16:37:42--
Downloaded: 0 bytes in 0 files

it seems there is a weird interaction between cookies and the recursive spider algorithm that makes wget bail out. i'll have to investigate this.

> PS: Just FYI, when I compile I get the following warnings:
>
> http.c: In function `http_loop':
> http.c:2425: warning: implicit declaration of function `nonexisting_url'
> main.c: In function `main':
> main.c:1009: warning: implicit declaration of function `print_broken_links'
> recur.c: In function `retrieve_tree':
> recur.c:279: warning: implicit declaration of function `visited_url'

fixed, thanks.

-- Mauro Tortonesi
Re: wget 1.11 beta 1 released
Christopher G. Lewis wrote:
> I've updated the Windows binaries to include Beta 1, and included a binary
> with Beta 1 + today's patches 2186 & 2187 for spider recursive mode.
> Available here: http://www.ChristopherLewis.com/wget

thank you very much, chris. you're doing awesome work.

> And sorry to those who have been having some problems downloading the ZIPs
> from my site. I had some weird IIS gzip compression issues.

we should plan to move the win32 binaries page to wget.sunsite.dk immediately after the 1.11 release. what do you think?

-- Mauro Tortonesi
Re: DNS through proxy with wget
Karr, David wrote:
> Inside our firewall, we can't do simple DNS lookups for hostnames outside
> of our firewall. However, I can write a Java program that uses
> commons-httpclient, specifying the proxy credentials, and my URL
> referencing an external host name will connect to that host perfectly
> fine, obviously resolving the DNS name under the covers. If I then use
> wget to do a similar request, even if I specify the proxy credentials, it
> fails to find the host. If I instead plug in the IP address instead of the
> hostname, it works fine. I noticed that the command-line options for wget
> allow me to specify the proxy user and password, but they don't have a way
> to specify the proxy host and port.

right. you have to specify the hostname/IP address and port of your proxy in your .wgetrc, or by means of the -e option:

wget -e 'http_proxy = http://yourproxy:8080/' --proxy-user=user --proxy-password=password -Y on http://someurl.com

-- Mauro Tortonesi
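[The same proxy settings can also live in the configuration file instead of on the command line. A sketch of the equivalent ~/.wgetrc entries, reusing the placeholder host and credentials from the command above; note that some older wget versions spell the last command proxy_passwd:]

    # ~/.wgetrc
    use_proxy = on
    http_proxy = http://yourproxy:8080/
    proxy_user = user
    proxy_password = password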
Re: Failing assertion in Wget 2187
Stefan Melbinger wrote:
> By the way, as you might have noticed, I wanted to exchange the real
> domain names with example.com, but forgot to exchange the last argument. :)
>
> > --domains='www.example.com,a.example.com,b.example.com'
> > --user-agent='Example' --output-file='example.log' 'www.euroskop.cz'
>
> So, just for the record, the real --domains value was
> 'www.euroskop.cz,www2.euroskop.cz,rozcestnik.euroskop.cz'.

thanks.

> In this case, that doesn't change the output, tho.

right.

-- Mauro Tortonesi
wget 1.11 beta 1 released
hi to everybody, i've just released wget 1.11 beta 1: ftp://alpha.gnu.org/pub/pub/gnu/wget/wget-1.11-beta-1.tar.gz you're very welcome to try it and report every bug you might encounter. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Jochen Roderburg wrote:
> Quoting Jochen Roderburg [EMAIL PROTECTED]:
> > Quoting Hrvoje Niksic [EMAIL PROTECTED]:
> > > Mauro, you will need to look at this one. Part of the problem is that
> > > Wget decides to save to index.html.1 although -c is in use. That is
> > > solved with the patch attached below. But the other part is that
> > > hstat.local_file is a NULL pointer when stat(hstat.local_file, &st) is
> > > used to determine whether the file already exists in the -c case. That
> > > seems to be a result of your changes to the code -- previously,
> > > hstat.local_file would get initialized in http_loop.
> >
> > This looks as if it could also be the cause of the problems which I
> > reported some weeks ago for the timestamping mode
> > (http://www.mail-archive.com/wget@sunsite.dk/msg09083.html).
>
> Hello Mauro, the timestamping issues I reported in the above-mentioned
> message are now also repaired by the patch you mailed here last week. Only
> the small *cosmetic* issue remains that it *always* says "Remote file is
> newer, retrieving." even if there is no local file yet.

hi jochen, i have been working on the problem you reported for the last couple of days. i've just committed a patch that should fix it for good. could you please try the new HTTP code and tell me if it works properly? thank you very much for your help.

-- Mauro Tortonesi
Re: Wget 1.10.2 hangs
Jonathan Abrahams wrote:
> Any idea why this happens?

hi jonathan, unfortunately i don't have a working cygwin environment at this time, so i won't be able to find out by myself what the problem is. maybe you can provide us some debug output by turning on the -d command-line option?

-- Mauro Tortonesi
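[For reference, a sketch of how that debug output can be captured in a file to attach to a bug report; the URL is a placeholder:]

    # -d enables debug output, -o redirects all of wget's logging to a file
    wget -d -o wget-debug.log http://example.com/hanging-url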
Re: create single directory
Kim Grillo wrote:
> Does anyone know if it's possible to create a single directory with the
> URL name instead of a directory tree when using wget? For example, I don't
> want to have to move through each directory to get to the file; I'd like
> the file to be in a folder under a directory named after the URL. I also
> don't want to do a recursive wget.

you'll have to use a shell script to do that. for instance, something like this might work:

#!/bin/sh
for i
do
  # turn the URL into a usable directory name by replacing slashes
  dirname=`echo "$i" | tr / _`
  mkdir "$dirname"
  cd "$dirname"
  wget -nd "$i"
  cd ..
done

-- Mauro Tortonesi
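[Usage, assuming the script above is saved as fetchdir.sh; the name and URL are placeholders:]

    sh fetchdir.sh http://example.com/some/deep/path/file.txt
    # creates ./http:__example.com_some_deep_path_file.txt/ containing file.txt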
Re: Exit code
Gerard Seibert wrote:
> I wrote a script that downloads new 'dat' files for my AV program. I am
> using the '-N' option to only download a newer version of the file. What I
> need is for 'wget' to issue an exit code which would indicate whether a
> newer file was downloaded or not. Presently I have the script comparing
> the time of the existing file and then the time of the file after 'wget'
> has finished running. It would be simpler if 'wget' simply issued an exit
> code. I have tried various methods but have not been successful in
> capturing one if it does actually issue it. Perhaps someone might have
> some further information on this?

hi gerard, unfortunately at the moment wget does not define a specific list of exit values according to program exit states. that's a major problem we'll have to fix in the next 1.12 release.

-- Mauro Tortonesi
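[Until then, the timestamp-comparison workaround gerard describes can be written roughly like this; a sketch where dat-file and the URL are placeholders, and `stat -c %Y` is GNU coreutils syntax:]

    before=$(stat -c %Y dat-file 2>/dev/null || echo 0)
    wget -N http://example.com/dat-file
    after=$(stat -c %Y dat-file 2>/dev/null || echo 0)
    if [ "$after" -gt "$before" ]; then
        echo "a newer dat file was downloaded"
    fi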
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Hrvoje Niksic wrote:
> Noèl Köthe [EMAIL PROTECTED] writes:
> > a wget -c problem report with the 1.11 alpha 1 version
> > (http://bugs.debian.org/378691): I can reproduce the problem. If I have
> > already 1 MB downloaded, wget -c doesn't continue. Instead it starts to
> > download again.
>
> Mauro, you will need to look at this one. Part of the problem is that Wget
> decides to save to index.html.1 although -c is in use. That is solved with
> the patch attached below. But the other part is that hstat.local_file is a
> NULL pointer when stat(hstat.local_file, &st) is used to determine whether
> the file already exists in the -c case. That seems to be a result of your
> changes to the code -- previously, hstat.local_file would get initialized
> in http_loop. The partial patch follows:
>
> Index: src/http.c
> ===================================================================
> --- src/http.c	(revision 2178)
> +++ src/http.c	(working copy)
> @@ -1762,7 +1762,7 @@
>        return RETROK;
>      }
> -  else
> +  else if (!ALLOW_CLOBBER)
>      {
>        char *unique = unique_name (hs->local_file, true);
>        if (unique != hs->local_file)

you're right, of course. the patch included in attachment should fix the problem. since the new HTTP code supports Content-Disposition and delays the decision of the destination filename until it receives the response header, the best solution i could find to make -c work is to send a HEAD request to determine the actual destination filename before resuming the download if -c is given. please, let me know what you think.

-- Mauro Tortonesi

Index: http.c
===================================================================
--- http.c	(revision 2178)
+++ http.c	(working copy)
@@ -1762,7 +1762,7 @@
       return RETROK;
     }
-  else
+  else if (!ALLOW_CLOBBER)
     {
       char *unique = unique_name (hs->local_file, true);
       if (unique != hs->local_file)
@@ -2231,6 +2231,7 @@
 {
   int count;
   bool got_head = false;	/* used for time-stamping */
+  bool got_name = false;
   char *tms;
   const char *tmrate;
   uerr_t err, ret = TRYLIMEXC;
@@ -2264,7 +2265,10 @@
   hstat.referer = referer;

   if (opt.output_document)
+    {
       hstat.local_file = xstrdup (opt.output_document);
+      got_name = true;
+    }

   /* Reset the counter.  */
   count = 0;
@@ -2309,13 +2313,16 @@
       /* Default document type is empty.  However, if spider mode is
          on or time-stamping is employed, HEAD_ONLY commands is
          encoded within *dt.  */
-      if ((opt.spider && !opt.recursive) || (opt.timestamping && !got_head))
+      if ((opt.spider && !opt.recursive)
+          || (opt.timestamping && !got_head)
+          || (opt.always_rest && !got_name))
         *dt |= HEAD_ONLY;
       else
         *dt &= ~HEAD_ONLY;

       /* Decide whether or not to restart.  */
       if (opt.always_rest
+          && got_name
           && stat (hstat.local_file, &st) == 0
           && S_ISREG (st.st_mode))
         /* When -c is used, continue from on-disk size.  (Can't use
@@ -2484,6 +2491,12 @@
           continue;
         }

+      if (opt.always_rest && !got_name)
+        {
+          got_name = true;
+          continue;
+        }
+
       if ((tmr != (time_t) (-1))
           && (!opt.spider || opt.recursive)
           && ((hstat.len == hstat.contlen) ||

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 2178)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2006-08-16  Mauro Tortonesi  [EMAIL PROTECTED]
+
+	* http.c: Fixed bug which broke --continue feature. Now if -c is
+	given, http_loop sends a HEAD request to find out the destination
+	filename before resuming download.
+
 2006-08-08  Hrvoje Niksic  [EMAIL PROTECTED]

 	* utils.c (datetime_str): Avoid code repetition with time_str.
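[A quick way to see the intended effect of the patch above; a sketch with a placeholder URL and filename:]

    # with a partial big.iso already on disk, -c should now trigger a HEAD
    # request first, so the download resumes into big.iso instead of
    # starting over as big.iso.1
    wget -c http://example.com/big.iso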
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Hrvoje Niksic wrote:
> Mauro Tortonesi [EMAIL PROTECTED] writes:
> > you're right, of course. the patch included in attachment should fix the
> > problem. since the new HTTP code supports Content-Disposition and delays
> > the decision of the destination filename until it receives the response
> > header, the best solution i could find to make -c work is to send a HEAD
> > request to determine the actual destination filename before resuming the
> > download if -c is given. please, let me know what you think.
>
> I don't like the additional HEAD request, but I can't think of a better
> solution.

same for me. in order to avoid the overhead of the extra HEAD request, i had considered disabling Content-Disposition and using url_file_name to determine the destination filename in case -c is given. but i really didn't like that solution.

-- Mauro Tortonesi
Re: RES: BUG
Junior + Suporte wrote:
> Dear Mauro, here follows the -S output for my command... this user and
> password is just a test account, no problems with obfuscation...
>
> C:\Documents and Settings\Luiz Carlos\Desktop> wget -S "http://www.tramauniversitario.com.br/tuv2/participe/login.jsp?rd=http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp&[EMAIL PROTECTED]&pass=123qwe&Submit.x=6&Submit.y=1"
>
> --12:06:46-- http://www.tramauniversitario.com.br/tuv2/participe/login.jsp?rd=http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp&username=802400[EMAIL PROTECTED]&pass=123qwe&Submit.x=6&Submit.y=1
>            => `[EMAIL PROTECTED]enquete%2Fcb%2Fsul%2Farte.jsp&[EMAIL PROTECTED]&pass=123qwe&Submit.x=6&Submit.y=1'
> Resolving www.tramauniversitario.com.br... 200.177.252.35, 200.177.252.36
> Connecting to www.tramauniversitario.com.br|200.177.252.35|:80... connected.
> HTTP request sent, awaiting response...
>   HTTP/1.1 302 Moved Temporarily
>   Date: Tue, 11 Jul 2006 15:06:48 GMT
>   Server: Apache/2.0.54 (Unix) mod_jk/1.2.14
>   X-Powered-By: JSP/2.0
>   Set-Cookie: JSESSIONID=F620EF2BED01FE4FD3900E05DB5A2B24; Path=/tuv2
>   Set-Cookie: tu=661541|[EMAIL PROTECTED]; Expires=Fri, 22-Oct-2055 15:06:48 GMT; Path=
>   Location: http://www.tramauniversitario.com.br/servlet/login.jsp?username=802400391%40terra.com.br&pass=123qwe&rd=http%3A%2F%2Fwww.tramauniversitario.com.br%2Ftuv2%2Fenquete%2Fcb%2Fsul%2Farte.jsp
>   Content-Length: 0
>   Keep-Alive: timeout=15, max=100
>   Connection: Keep-Alive
>   Content-Type: text/html;charset=ISO-8859-1
> Error in Set-Cookie, field `Path'
> Syntax error in Set-Cookie: tu=661541|802400391@TERRA.COM.BR; Expires=Fri, 22-Oct-2055 15:06:48 GMT; Path= at position 78.
> Location: http://www.tramauniversitario.com.br/servlet/login.jsp?username=802400391%40terra.com.br&pass=123qwe&rd=http%3A%2F%2Fwww.tramauniversitario.com.br%2Ftuv2%2Fenquete%2Fcb%2Fsul%2Farte.jsp [following]
> --12:06:47-- http://www.tramauniversitario.com.br/servlet/login.jsp?username=802400391%40terra.com.br&pass=123qwe&rd=http%3A%2F%2Fwww.tramauniversitario.com.br%2Ftuv2%2Fenquete%2Fcb%2Fsul%2Farte.jsp
>            => `[EMAIL PROTECTED]@terra.com.br&pass=123qwe&rd=http%3A%2F%2Fwww.tramauniversitario.com.br%2Ftuv2%2Fenquete%2Fcb%2Fsul%2Farte.jsp'
> Reusing existing connection to www.tramauniversitario.com.br:80.
> HTTP request sent, awaiting response...
>   HTTP/1.1 302 Moved Temporarily
>   Date: Tue, 11 Jul 2006 15:06:48 GMT
>   Server: Apache/2.0.54 (Unix) mod_jk/1.2.14
>   X-Powered-By: JSP/2.0
>   Set-Cookie: JSESSIONID=F52D0A41E21B23C4CAE45AD3461A5817; Path=/servlet
>   Location: http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp
>   Content-Length: 0
>   Keep-Alive: timeout=15, max=99
>   Connection: Keep-Alive
>   Content-Type: text/html;charset=ISO-8859-1
> Location: http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp [following]
> --12:06:48-- http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp
>            => `arte.jsp'
> Reusing existing connection to www.tramauniversitario.com.br:80.
> HTTP request sent, awaiting response...
>   HTTP/1.1 200 OK
>   Date: Tue, 11 Jul 2006 15:06:49 GMT
>   Server: Apache/2.0.54 (Unix) mod_jk/1.2.14
>   X-Powered-By: JSP/2.0
>   Set-Cookie: JSESSIONID=1E3B1DF7F0C37BCDA33995A5E39AD0C4; Path=/tuv2
>   Connection: close
>   Content-Type: text/html;charset=ISO-8859-1
> Length: unspecified [text/html]
>
> [ <=> ] 3,416 --.--K/s
>
> 12:06:49 (47.32 MB/s) - `arte.jsp' saved [3416]
>
> Luiz Carlos Zancanella Junior
>
> -----Original Message-----
> From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]]
> Sent: Monday, 10 July 2006 07:04
> To: Tony Lewis
> Cc: 'Junior + Suporte'; [EMAIL PROTECTED]
> Subject: Re: BUG
>
> Tony Lewis wrote:
> > Run the command with -d and post the output here.
>
> in this case, -S can provide more useful information than -d. be careful
> to obfuscate passwords, though!!!
>
> hi junior, unfortunately i can't reproduce the bug. here's what i get:
>
> [EMAIL PROTECTED]:~/tmp/wgettest$ wget 'http://www.tramauniversitario.com.br/tuv2/participe/login.jsp?rd=http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp&[EMAIL PROTECTED]&pass=123qwe&Submit.x=6&Submit.y=1'
> --15:56:34-- http://www.tramauniversitario.com.br/tuv2/participe/login.jsp?rd=http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp&[EMAIL PROTECTED]&pass=123qwe&Submit.x=6&Submit.y=1
>            => `login.jsp?rd=http:%2F%2Fwww.tramauniversitario.com.br%2Ftuv2%2Fenquete%2Fcb%2Fsul%2Farte.jsp&[EMAIL PROTECTED]&pass=123qwe&Submit.x=6&Submit.y=1'
> Resolving www.tramauniversitario.com.br... 200.177.252.35, 200.177.252.36
> Connecting to www.tramauniversitario.com.br|200.177.252.35|:80... connected.
> HTTP request sent, awaiting response... 404 /participe/login.jsp
> 15:56:35 ERROR 404: /participe/login.jsp.
>
> -- Mauro Tortonesi
concurrent use of -O and -N options
as some of you have noticed, i've recently disabled the use of -N in combination with -O in soon-to-be-released wget 1.11. in fact, not only has concurrent use of -O and -N been broken since the dawn of time, but i believe it breaks the principle of least surprise and i don't think it is widely used. let me clarify once again that the semantics of -O are intentionally similar to a unix shell output redirection. they were not meant to specify a custom naming pattern for downloaded resources (future versions of wget will likely have a dedicated command-line option for this). in this context, i believe that allowing -N to be used w/ -O could be very confusing. Louis Gosselin (included in CC) asked me to reconsider my decision, as he believes the concurrent use of -O and -N options is actually very helpful. so, before i cross the point of no return and deprecate concurrent use of -O and -N, i would like to hear your opinions. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
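to make the two semantics concrete (hypothetical urls):
  # -O is a redirection: this truncates `file' and writes everything there
  wget -O file http://example.com/foo
  # roughly the same as:
  wget -O - http://example.com/foo > file
  # -N instead compares timestamps against the url-derived local name ./foo
  wget -N http://example.com/foo
since -O takes away the stable url-derived name that -N compares against, the combination has no obvious meaning.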
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Hrvoje Niksic wrote: Noèl Köthe [EMAIL PROTECTED] writes: a wget -c problem report with the 1.11 alpha 1 version (http://bugs.debian.org/378691): I can reproduce the problem. If I have already 1 MB downloaded wget -c doesn't continue. Instead it starts to download again: Mauro, you will need to look at this one. i surely will. unfortunately, at the moment i am attending the winsys 2006 research conference: http://www.winsys.org i'll take a look at the problem as soon as i get back to italy. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: WGet -O and -N timestamp options don't work together
Louis Gosselin wrote: WGet version 1.8.1 wget -N http://host/file.html WGet only downloads and overwrites the ./file.html if the local file is older than the http copy. As expected. wget -O localfile.html -N http://host/remotefile.html WGet will overwrite ./localfile.html if ./remotefile.html does not exist or is older than http://host/remotefile.html. The expected behavior would be that ./localfile.html would be checked for timestamps instead of ./remotefile.html. This is breaking my scripts because ./remotefile.html does not (and should not) exist, resulting in the file always downloading. The workaround for now is: 1. Move or copy the ./localfile.html to ./tmp/remotefile.html 2. wget without the -O 3. Move or copy the ./tmp/remotefile.html to ./localfile.html hi louis, -O and -N were never meant to work together. in fact, the upcoming 1.11 release of wget will forbid the use of -N if -O is given. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
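louis's three-step workaround above, as a small shell sketch (using his hypothetical file and host names):
  mkdir -p tmp
  cp localfile.html tmp/remotefile.html
  ( cd tmp && wget -N http://host/remotefile.html )
  cp tmp/remotefile.html localfile.html
this keeps a stable local name (remotefile.html) for -N to compare timestamps against, then copies the result back afterwards.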
Re: wget alpha: -r --spider, number of broken links
Stefan Melbinger ha scritto: I don't think that non-existing robots.txt-files should be reported as broken links (as long as they are not referenced by some page). Current output, if spanning over 2 hosts (e.g., -D www.domain1.com,www.domain2.com): - Found 2 broken links. http://www.domain1.com/robots.txt referred by: (null) http://www.domain2.com/robots.txt referred by: (null) - What do you think? hi stefan, of course you're right. but you are also late ;-) in fact, this bug is already fixed in the current version of wget, which you can retrieve from our source code repository: http://www.gnu.org/software/wget/wgetdev.html#development thank you very much for your report anyway. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget alpha: -r --spider downloading all files
Stefan Melbinger ha scritto: Hi, As you might have noticed I was trying to use wget as a tool to check for dead links on big websites. The combination of -r and --spider is working in the new alpha version, however wget is still downloading ALL files (no matter if they are parseable for further links or not), instead of just getting the status response for files other than text/html or application/xhtml+xml. I don't think that this makes very much sense; the files are deleted anyway and downloading a 300MB video is not useful if you just want to check links and see whether the video is there at all. Could somebody suggest a quick hack to disable the downloading of non-parseable documents? I think it must be somewhere in the area of http.c, somewhere around gethttp() or maybe http_loop() - unfortunately, my knowledge of C and my knowledge of this project weren't enough to get any satisfying result. you're absolutely right, stefan. i've just started working on it. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
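as a stopgap, the check stefan describes can already be done by hand for a single url (hypothetical url):
  wget --spider http://example.com/video.mpg
which only asks the server whether the resource exists instead of fetching its body; roughly speaking, the work mentioned above amounts to applying the same kind of status-only check automatically to non-html files during -r --spider runs.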
Re: wget alpha: -r --spider downloading all files
Stefan Melbinger ha scritto: By the way, FTP transfers shouldn't be downloaded as a whole, too, in this mode. well, the semantics of --spider for FTP are still not very clear to me. at the moment, i was considering whether to simply perform FTP listing in case --spider is given, or to disable --spider for FTP URLs. what do you think? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Wget Win32 Visual Studio project
Christopher G. Lewis ha scritto: Hi everyone - I've uploaded a working Visual Studio project file for the current TRUNK in subversion. excellent. thank you very much, chris. I'm pretty sure this is the 1.11 Alpha branch. yes, it is. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Cannot retrieve files from ftp site.
Spiros Melitsopoulos wrote: Hi all, I am a newbie on wget, so there is a silly question about its usage. (If this list is not the place for such questions, please let me know in order to avoid using it for this kind of stuff.) I try to download a directory with its contents from an ftp site via proxy. Although the connection is properly made, what i finally get is the listing of the contents of the directory in an .html page, but none of its actual contents are downloaded. what i used is:
wget -r -np -l 10 -w 10 --follow-ftp
while .wgetrc contains the proxy settings and the following:
reclevel = 15
waitretry = 10
use_proxy = on
dirstruct = on
recursive = on
follow_ftp = on
glob = on
verbose = on
mirror = on
retr_symlinks = on
Can anybody contribute a hint or advice? I will be grateful! hi spiros, recursive FTP through proxy has been broken for ages. fortunately, this bug was fixed in the recently released 1.11-alpha1 version of wget: http://www.mail-archive.com/wget@sunsite.dk/msg09071.html -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Bug in wget 1.10.2 makefile
Daniel Richard G. ha scritto: Hello, The MAKEDEFS value in the top-level Makefile.in also needs to include DESTDIR='$(DESTDIR)'. fixed, thanks. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Documentation (manpage) bug
Linda Walsh ha scritto: FYI: On the manpage, where it talks about no-proxy, the manpage says: --no-proxy Don't use proxies, even if the appropriate *_proxy environment variable is defined. For more information about the use of proxies with Wget, ^ -Q quota Note -- the sentence referring to more information about the use of proxies stops in the middle of saying anything and starts with -Q quota. fixed, thanks. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.11 alpha1 - content disposition filename
Jochen Roderburg ha scritto: Hi, I was happy to see that a long missed feature was now implemented in this alpha, namely the interpretation of the filename in the Content-Disposition header. Just recently I had hacked a little script together to achieve this, when I wanted to download a greater number of files where this was used ;-) I had a few cases, however, which did not come out as expected, but I think the error is this time in the sending web application and not in wget. E.g., a file which was supposed to have the name B&W.txt came with the header: Content-Disposition: attachment; filename=B&amp;W.txt; the error is definitely in the web application. the correct header would be: Content-Disposition: attachment; filename=B&W.txt; All programs I tried (the new wget and several browsers and my own script ;-) seemed to stop parsing at the first semicolon and produced the filename B&amp. Any thoughts?? i think that the filename parsing heuristics currently implemented in wget are fine. you really can't do much better in this case. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Wishlist: support the file:/// protocol
David wrote: In replies to the post requesting support of the “file://” scheme, requests were made for someone to provide a compelling reason to want to do this. Perhaps the following is such a reason. hi david, thank you for your interesting example. support for the “file://” scheme will very likely be introduced in wget 1.12. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: login incorrect
Hrvoje Niksic ha scritto: Gisle Vanem [EMAIL PROTECTED] writes: Kinda misleading that wget prints login incorrect here. Why couldn't it just print the 530 message? You're completely right. It was an ancient design decision made by me when I wasn't thinking enough (or was thinking the wrong thing). hrvoje, are you suggesting to extend ftp_login in order to return both an error code and an error message? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Using --spider to check for dead links?
Stefan Melbinger ha scritto: Hello, I need to check whole websites for dead links, with output easy to parse for lists of dead links, statistics, etc... Does anybody have experience with that problem or has maybe used the --spider mode for this before (as suggested by some pages)? If this should work, all HTML pages would have to be parsed completely, while pictures and other files should only be HEAD-checked for existence (in order to save bandwidth)... Using --spider and --spider -r was not the right way to do this, I fear. Any help is appreciated, thanks in advance! hi stefan, historically, wget never really supported recursive --spider mode. fortunately, this has been fixed in 1.11-alpha-1: http://www.mail-archive.com/wget@sunsite.dk/msg09071.html so, it will be included in the upcoming 1.11 release. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Suggestion
Kumar Varanasi ha scritto: Hello there, I am using WGET in my system to download http files. I see that there is no option to download the file faster with multiple connections to the server. Are you planning on a multi-threaded version of WGET to make downloads much faster? no, there is no plan to implement parallel download at the moment. however, please notice that it is highly unlikely that opening more than one connection with the same server will speed up the download process. parallel download makes sense only when more than one server is involved. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
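for the multi-server case, plain wget can already be parallelized from the shell (hypothetical mirror urls):
  wget http://mirror1.example.com/file-a.iso &
  wget http://mirror2.example.com/file-b.iso &
  wait
each download talks to a different server, which is the only case where parallelism is likely to pay off, as noted above.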
Re: mirror mode does not handle ../ properly
Stefan Powell wrote: In mirror mode (-m) accessing a page with relative links to parent directories ( http://example.com/somepath/../somefile.html ) the two dots are URL encoded. The correct behavior is specified in section 5.2.4 of RFC3986. (http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#relative-dot-segments) hi stefan, which version of wget are you using? 1.11-alpha1 should have fixed that problem. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
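a concrete example of the required behavior: per section 5.2.4 (remove_dot_segments) of RFC 3986, a reference like
  http://example.com/somepath/../somefile.html
must be resolved to
  http://example.com/somefile.html
before the request is sent, rather than having the two dots percent-encoded.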
Re: Wget
John McGill ha scritto: Is there a way of telling wget to download the image and increment the file number wget does that automatically. suppose that you're using: wget http://yoyodine.com/somepath/somefilename.txt if another file named somefilename.txt is present in the current directory, the new file is named somefilename.txt.1. if you call wget again, the next file will be named somefilename.txt.2, and so on. or add the date/time stamp? you have to use the -N option for this. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
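for example, running the same command three times (the url is the hypothetical one above):
  wget http://yoyodine.com/somepath/somefilename.txt   # saved as somefilename.txt
  wget http://yoyodine.com/somepath/somefilename.txt   # saved as somefilename.txt.1
  wget http://yoyodine.com/somepath/somefilename.txt   # saved as somefilename.txt.2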
Re: Wget
Post, Mark K ha scritto: You would want to use the -O option, and write a script to create a unique file name to be passed to wget. yes, something like this:
UNIQUE_FILENAME=`mktemp`
wget http://someserver.com/somepath/somefile.txt -O "$UNIQUE_FILENAME"
would probably work. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: BUG
Tony Lewis ha scritto: Run the command with -d and post the output here. in this case, -S can provide more useful information than -d. be careful to obfuscate passwords, though!!! -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget build/debugging!! -debugger tool
bruce ha scritto: when you guys are building/testing wget, are you ever using any kind of IDE? no, i only use vim: http://www.vim.org and while i can get it to build using Eclipse on my linux box, i can't seem to figure out what i need to do within the settings to actually be able to step into various functions once i'm in the main() function. and if you can't step into/through functions.. debugging gets to be a pain!!! not really. you can use gdb from the command line: http://www.gnu.org/software/gdb/ or a GUI front-end to GDB like insight: http://sources.redhat.com/insight/ or ddd: http://www.gnu.org/software/ddd/ but for network programming i have always found brian kernighan's approach (a well-placed printf is the best debugger) to be invaluable. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Windows compler need (was RE: wget 1.11 alpha 1 released)
Christopher G. Lewis ha scritto: OK, the Win32 compile is working, I've got both the SVN Trunk and the 1.11 alpha branch from ftp://alpha.gnu.org/pub/pub/gnu/wget/wget-1.11-alpha-1.tar.gz . We'll obviously work through the warnings that are coming up, and re-address the CL parameters to fit with the VS 2005 C Compiler. excellent. I think we should make this the default supported compiler for the 1.11 release if we can confirm that we compile with VC++ Express (which is free from MS). i agree. MSVC 14 (AKA Visual Studio 2005's C Compiler) should be the default supported compiler for future Wget releases. fortunately, i have been able to set up a build environment w/ MSVC 14 on my laptop, so from now on i'll be able to help w/ Win32-related problems. We should also double check on OpenSSL, since MS now includes MASM as a free download for VC++ Express users. I'm going to have to spin up a VM to test the VC++ Express compile - I should be able to do this sometime this weekend. does VC++ Express also provide nmake? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget - tracking urls/web crawling
Tony Lewis wrote: Bruce wrote: any idea as to who's working on this feature? Mauro Tortonesi sent out a request for comments to the mailing list on March 29. I don't know whether he has started working on the feature or not. yes. i haven't started coding it yet, though. i am still working on the last fixes for recursive spider mode. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: License of wget.texi: suggest removal of invariant sections
Noèl Köthe wrote: Am Montag, den 12.06.2006, 15:17 -0700 schrieb Don Armstrong: Hello Hrvoje and Mauro, I understand and agree with the reasoning behind removing the GPL as the invariant section; but why also remove the GFDL as an invariant section? That's just because having the GFDL as an invariant section is a null op; the GFDL itself already requires that it be included, and no one can change it anyway (save from going to a later version if your copyright statement allows it.) If it were to stay as an invariant section, I don't think it would cause a problem for Debian, but I really don't see any reason from your perspective to do so. I suggested removing them both because I figured if you were to modify it at all, you may as well just modify it once. Thanks for working with us on this issue! I checked 1.11 alpha1 and svn trunk but both are still there. Do you already have decided to remove GFDL from this section, too? yes. i've just talked with hrvoje about it, and we reached consensus on changing both the GPL and the GFDL sections from invariant to normal. Just for your info and not to hurry you: Debian will start freezing in Jul and because wget is an important part I want to have it resolved before.:) i'll do my best to release wget 1.11 ASAP. IIRC, fedora should freeze in July as well. it would be nice to have wget 1.11 included in both the new versions of debian and fedora. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget issues on Solaris
Kommineni, Devendra wrote: If invoked using the DNS alias for the test cluster, it fails after several retries. (we have two servers in the cluster) that's weird. it seems that the TCP connection is correctly established but is dropped after the HTTP request is sent. maybe there's something wrong at the HTTP level. perhaps you could turn on the -S option to examine HTTP requests and responses? The problem does not occur on the linux nodes ( with wget 1.10.1). what do you mean? are you saying that the problem arises only with a specific version of wget (possibly prior to 1.10.1) or with a specific non-linux platform? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
wget 1.11 alpha 1 released
hi to everybody, i've just released wget 1.11 alpha 1: ftp://alpha.gnu.org/pub/pub/gnu/wget/wget-1.11-alpha-1.tar.gz you're very welcome to try it and report every bug you might encounter. with this release, the development cycle for 1.11 officially enters the feature freeze state. wget 1.11 final will be released when all the following tasks are completed:
1) win32 fixes (setlocale, fork)
2) last fixes to -r and --spider
3) update documentation
4) return error/warning if multiple HTTP headers w/ same name are given
5) return error/warning if conflicting options are given
6) fix Saving to: output in case -O is given
unfortunately, this means that all the planned major changes (gnunet support, advanced URL filtering w/ regex, etc...) will have to wait until 1.12. however, i think that the many important features and bugfixes recently committed into the trunk more than justify the new, upcoming 1.11 release. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.11 alpha 1 released
Steven M. Schweda wrote: From: Mauro Tortonesi ftp://alpha.gnu.org/pub/pub/gnu/wget/wget-1.11-alpha-1.tar.gz I assume that it would be pointless to look for the VMS changes here, but feel free to amaze me. i promise we'll seriously talk about merging your VMS changes into wget at the beginning of the 1.12 development cycle. you'll be very welcome to convince me about the soundness of your code and the need to merge VMS support into wget via your favorite IM tool: http://www.tortonesi.com/contactme.shtml however, for the moment i have to focus on the 1.11 release. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Wget and proxy (again)
Leonardo wrote: Hi all, I have a PC behind a proxy and I'm not able to wget files, nor ping (although correct addresses are found: PING www.l.google.com (66.249.85.99)...), but I can surf if I set the correct proxy settings. I've read the F. wget manual and searched the net, but maybe it's just I don't understand. I do:
wget ftp://ftp.gentoo.mesh-solutions.com/gentoo/snapshots/portage-20060525.tar.bz2.md5sum
and have the variables:
http_proxy=http://www-proxy.physi.uni-heidelberg.de:3128
ftp_proxy=ftp://www-proxy.physi.uni-heidelberg.de:3128
What I get is:
ftp://ftp.gentoo.mesh-solutions.com/gentoo/snapshots/portage-20060525.tar.bz2.md5sum => `portage-20060525.tar.bz2.md5sum'
Resolving www-proxy.physi.uni-heidelberg.de... 129.206.32.243
Connecting to www-proxy.physi.uni-heidelberg.de|129.206.32.243|:3128... connected.
Logging in as anonymous ... Error in server response, closing control connection. Retrying.
and so on and on. I also tried passive_ftp = on and/or use_proxy = on in /etc/wgetrc without result. What am I doing wrong? hi leonardo, could you please tell us which version of wget you are using and post the output of wget with the -S and -d options turned on? it's impossible to find out what the problem is from the limited output you provided. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Out of Memory Error
[EMAIL PROTECTED] wrote: I ran wget (1.9.1) on Debian GNU/Linux to find out how many links my site had, and after Queue count 66246, maxcount 66247 links, the wget process ran out of memory. Is there a way to set the persistent state to disk instead of memory so that all the system memory and cache is not slowly consumed until the process halts? My site may have 1 M to 2 M links. hi oscar, exactly how much memory does wget take? could you please try if the most recent version of wget (1.10.2) gives you the same problem? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: WGET -O Help
Steven M. Schweda wrote: But the real question is: If a Web page has links to other files, how is Wget supposed to package all that stuff into _one_ file (which _is_ what -O will do), and still make any sense out of it? even more, how is Wget supposed to properly postprocess the saved data, which can as well be a combination of HTML pages and binary files? from my perspective the main problem with -O is that wget users seem not to understand its semantics. -O behaves as an stdio redirection (or a pipeline concatenation in case of wget -O - | someothercommand) in shell, and presents some non-negligible limitations (e.g. in postprocessing of the saved data). -O was never meant to provide a rename saved files after download and postprocessing semantics. perhaps we should make this clear in the manpage and provide an additional option which just renames saved files after download and postprocessing according to a given pattern. IIRC, hrvoje was just suggesting to do this some time ago. what do you guys think? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
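for instance (hypothetical urls):
  wget -O all.html http://example.com/a.html http://example.com/b.html
truncates all.html once and writes both documents into it, one after the other, just like a shell redirection would; this is also why the postprocessing of the saved data mentioned above cannot sensibly operate on the result.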
Re: Exclude directorie
Antoine Bonnefoy wrote: Hi, I found a bug with the -X option in recursive mode. When I use a wildcard in the exclude string, it only works for one path level. For example, for this directory structure:
server:
  level1/
    Data/
    level2/
      Data/
wget -X */Data -r http://server/level1/ correctly excludes the directory level1/Data, but does not exclude the Data directory under level2. The bug comes from the fnmatch function (with FNM_PATHNAME set, the * wildcard does not match /, so a pattern like */Data can only match a single path level). I fixed it locally in the utils.c file by deactivating the FNM_PATHNAME flag in the proclist() function. Is this the right behavior? I hope this helps. Excuse my English. hi antoine, could you please tell us which version of wget you are using? after the release of 1.10.2 i have merged a patch that fixed a few bugs in -X support, so you might want to try the current version of wget available from our subversion repository: http://www.gnu.org/software/wget/wgetdev.html#development -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Redirect makes wget fetch another domain
Equipe web wrote: I've come across this annoying bug : Even though wget is told not to span other hosts, it does when redirected !!! This bug has been waiting for a fix for quite a long time : http://www.mail-archive.com/wget@sunsite.dk/msg01675.html I don't know how to make things change as I'm not a programmer myself... hi luc, thank you very much for your bug report. which version of wget are you using? i have recently merged a couple of patches that fixed a few bugs, so you might want to try the current version of wget available from our subversion repository: http://www.gnu.org/software/wget/wgetdev.html#development -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not supported]
Noèl Köthe wrote: Hello, a forwarded report from http://bugs.debian.org/366434 could this behaviour be added to the doc/manpage? i wonder if it makes sense to add generic support for multiple headers in wget, for instance by extending the --header option like this:
wget --header="Pragma: xxx" --header="dontoverride,Pragma: xxx2" someurl
as an alternative, we could choose to support multiple headers only for a few header types, like Pragma. however, i don't really like this second choice, as it would require hardcoding the above-mentioned header names in the wget sources, which IMVHO is a *VERY* bad practice. what do you think? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: -O switch always overwrites output file
Toni Casueps wrote: I use Wget 1.10 for Linux. If I use -O and there was already a file in the current directory with the same name it overwrites it, even if I use -nc. Is this a bug or intentional? IMVHO, this is a bug. if hrvoje does not provide a rationale for this behavior, i will fix it before the release of wget 1.11 (which should be pretty soon). -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wrong exit code
Lars Wilke wrote: Hi, first this is not a real bug and is more like a wishlist item. So the problem: When invoking wget to retrieve a file via ftp all is fine if the file exists and wget is able to retrieve it. The return code from wget is 0. If the file is not found on the server the return code is 1. Good. I expected that wget would behave the same when using file globbing. If the file can be found via a pattern and can be downloaded wget returns with 0. But if the file cannot be found after successfully retrieving a directory listing wget returns with 0, too! IMHO here wget should exit with the same error code (1) as above. I searched the docs to see if this behaviour is mentioned somewhere but have not found it. Therefore i am sending this email. Sorry if i missed this detail mentioned somewhere. hi lars, unfortunately one of wget's weak points is its lack of consistency for the returned error codes. after the release of wget 1.11, i am planning some major architectural change for wget. that will be the best time to redesign the code which handles returned error values. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
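a minimal illustration of the inconsistency lars describes (hypothetical server and file names):
  wget ftp://ftp.example.com/exact-name.txt ; echo $?     # prints 1 if the file is missing
  wget 'ftp://ftp.example.com/no-such-*.txt' ; echo $?    # prints 0 even if nothing matches the glob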
Re: Missing K/s rate on download of 13MB file
J. Grant wrote: Hi, On 14/05/06 21:26, Hrvoje Niksic wrote: J. Grant [EMAIL PROTECTED] writes: Could an extra value be added which lists the average rate? average rate: xx.xx K/s ? Unfortunately it would have problems fitting on the line. Perhaps the progress bar would be reduced? i don't think that would be a good idea. or the default changed to be the average rate? i don't think that would be a good idea either. but... or if neither of those are suitable, could a conf file setting be added so we can switch between average rate, and current rate? ...this is an interesting proposal. however, my todo list is already *HUGE* and grows larger every day. so i really doubt i will have time to implement this feature (at least for the next months). you're very welcome to proceed w/ the development of configurable average calculation code and send me a patch, though. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: recursive download
Ajar Taknev wrote: Hi, I am trying to recursively download from an ftp site without success. I am behind a squid proxy and I have set up the .wgetrc correctly. When I do wget -r ftp://ftp.somesite.com/dir it fetches ftp.somesite.com/dir/index.html and exits. It doesn't do a recursive download. When I do the same thing from a machine which is not behind a proxy the same command does a recursive download. proxy is squid-2.5.STABLE9 and wget version is 1.10.2. Any ideas what the problem could be? recursive FTP retrieval through HTTP proxies has been broken for a long time. i received a patch some time ago that should fix the problem, but i haven't been able to test it yet. however, this is one of the pending bugs that will be fixed before the upcoming 1.11 release. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: links to follow
Andrea Rimicci wrote: Hi all, I'd like to retrieve a web document where some links are coded in javascript calls, so I'd like to instruct wget, when something like JSfunc('my/link/to/follow/') is matched, to recognize 'my/link/to/follow/' as a link to follow. Is there any way to accomplish this? Maybe using regexps to set up which patterns trigger a link would be great. TIA, Andrea P.S. dunno if this was already discussed, I've not found any previous post with 'follow' in the subject. hi andrea, wget does not support parsing of javascript code at the moment, nor regexps on downloaded file content. however, we are planning to add support for regexps in wget 1.12, and possibly for external url parsers. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: fixed recursive ftp download over proxy and 1.10.3
[EMAIL PROTECTED] wrote: Hi, I have been bothered by the ftp-over-http-proxy bug for quite a while: 1.5 years. I was very happy to learn that someone had developed a patch. Happier to read that you would merge it shortly. Do you know when you will be able to publish this 1.10.3 release? 1.10.3 will never be released. the next version of wget will be 1.11, and i hope i will be able to release it by the end of june. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not suppor ted]
Herold Heiko wrote: From: Mauro Tortonesi [mailto:[EMAIL PROTECTED] i wonder if it makes sense to add generic support for multiple headers in wget, for instance by extending the --header option like this: wget --header=Pragma: xxx --header=dontoverride,Pragma: xxx2 someurl That could be a problem if you need to send a really weird custom header named dontoverride,Pragma. Probability is near nil but with the whole big bad internet waiting maybe separating switches (--header and --header-add) would be better. you're right. in fact, i like hrvoje's --append-header proposal better. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Wget mirror restrict asp variables
MarK wrote: Hi, how can I mirror a set of pages of a site, restricting one or more variables defined in the URL? For example: http://www.thesite.com/page.aspx?f=123 I want to mirror the site starting from this page and all the linked pages only if they have f=123 I tried wget -m -k -E -A*f=123 http://www.thesite.com/page.aspx?f=123 but this only downloads that page. Removing the -A option, wget downloads the whole site. unfortunately, at the moment wget does not allow you to restrict the set of downloaded files according to a specific query value. this feature is planned for a future release, though. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget www.openbc.com post-data/cookie problem
Erich Steinboeck wrote: Mauro Tortonesi wrote: this might be a problem with your server. could you please provide us with the output of wget with the -S option turned on? [...]
---response begin---
HTTP/1.1 200 OK
Date: Tue, 02 May 2006 15:01:45 GMT
Server: Apache
Expires: Now
Pragma: no-cache
Cache-control: private
Connection: close
Content-Type: text/html; charset=UTF-8
hi erich, as you can see the problem is with the web server, which does not return a cookie (by means of the Set-Cookie header) to wget. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Feature Request: Metalink support (mirror list file)
anthony l. bryan wrote: Hi, I realize this may be out of the scope of wget, so I hope I don't offend anyone. that's a very interesting proposal, actually. but is metalink a widely used format to describe resource availability from multiple URLs? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Rtsp/mms support
Ryan Golhar wrote: I've seen several messages about this, but haven't determined if this will be implemented or not. Will wget support rtsp (and/or mms)? If not, I'd be interested in implementing it. i am very interested in adding both rtsp and mms support to wget. however, since this might require significant changes and i am planning a major overhaul of wget's architecture, for the moment i think i will stick to my bugfixing and redesign tasks and leave rtsp/mms support for later. however, you're very welcome to take care of implementing rtsp/mms support for wget. in case you take on this task, please let me know so that we can coordinate and decide together the best design for this improvement. BTW, today i have taken a very brief look at the source code of mplayer. in the libmpdemux directory there is some code that we could borrow to speed up rtsp/mms development. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget www.openbc.com post-data/cookie problem
Erich Steinboeck wrote: Being new to wget (I'm using GNU Wget 1.10.2 for Windows) I'm trying to log into www.openbc.com. It works perfectly with a browser, but I can't get it to work with wget. ... Can anyone help? What am I doing wrong here? Thanks!! this might be a problem with your server. could you please provide us with the output of wget with the -S option turned on? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget: no support for CSS-Background-images
Michael Probst wrote: Hi Folks, thank you for all the work you have been spending on wget! I found a little thing, though: My version (GNU Wget 1.9+cvs-dev) will not support css background-images. Take a look at the example I sent along. If you put it into an httpdocs directory of a web server and try to get it via wget -r http://localhost/_temp_css_wget_problem/ there will be no background image at the end. hi michael, it's a known problem. wget does not parse css stylesheets or javascript code for urls. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: problems after upgrading to fedora core 5
cliff wrote: Good news for wget. Building from the source worked. So for some reason, either my system is screwed or the binary with FC5 was misbuilt. Seems hard to believe the latter, but this box was a pretty bare, standard FC3 and was just a straight, easy upgrade to FC5. that's very weird. i've just taken a look at fedora's wget-1.10.2-3.2.1 RPM binary (i suppose that's the version you are using, could you please check that with the rpm -q wget command?) and it does not include any patch that could modify wget's default behaviour. In either case, do you know of any settings that would affect this so I might better know what to say/search on the fedora lists? not really. the behaviour you described is very awkward. the only thing i can think of is a misconfigured /etc/nsswitch.conf, with a hosts: dns6 line. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget stopped working
Jana Mccoy wrote: wget stopped working after I downloaded the bc functions. hi jana, what exactly are the bc functions you're talking about? are they related to wget in any way? What is an ERROR -1: Malformed status line? it means wget failed to parse the HTTP response returned by the web server. Here's what I'm entering and the reply: $ wget http://www.yahoo.com --21:51:21-- http://www.yahoo.com/ = `index.html' Resolving www.yahoo.com... 68.142.226.42, 68.142.226.48, 68.142.226.33, ... Connecting to www.yahoo.com|68.142.226.42|:80... connected. HTTP request sent, awaiting response... -1 21:51:21 ERROR -1: Malformed status line. could you please tell us which version of wget you are using and send us the result of wget -v -d http://www.yahoo.com? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: error in the french help translation of wget help
nicolas figaro wrote: Hi, there is a mistake in the french translation of wget --help (on linux redhat). in english : wget --help | grep spider --spider don't download anything was translated in french this way : wget --help | grep spider --spider ne pas télécharger n'importe quoi. an english translation could be : don't download anything weird a correct translation could have been : ne rien télécharger ne télécharger aucun fichier but with the recent french law, this message makes wget a very interesting and smart tool. hi nicolas, as wget's development webpage states: http://www.gnu.org/software/wget/wgetdev.html the coordination of translation efforts for GNU tools is done by the Translation Project: http://www.iro.umontreal.ca/translation/ you should contact them (which i included in CC) to report errors in current french translation of wget. thank you very much for your help. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: regex support RFC
Hrvoje Niksic wrote: Tony Lewis [EMAIL PROTECTED] writes: I don't think ,r complicates the command that much. Internally, the only additional work for supporting both globs and regular expressions is a function that converts a glob into a regexp when ,r is not requested. That's a straightforward transformation. ,r makes it harder to input regexps, which are the whole point of introducing --filter. Besides, having two different syntaxes for the same switch, and for no good reason, is not really acceptable, even if the implementation is straightforward. i agree 100%. and don't forget that globs are already supported by current filtering options. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: regex support RFC
Curtis Hatter wrote: On Friday 31 March 2006 06:52, Mauro Tortonesi: while i like the idea of supporting modifiers like quick (short circuit) and maybe i (case insensitive comparison), i think that (?i:) and (?-i:) constructs would be overkill and rather hard to implement. I figured that the (?i:) and (?-i:) constructs would be provided by the regular expression engine and that the --filter switch would simply be able to use any construct provided by that engine. i know, that would be really nice. If, as you said, this would be hard to implement or require extra effort by you that is above and beyond that required for the more standard constructs then I would say that they shouldn't be implemented; at least at first. i agree. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: can't recurse if no index.html
Dan Jacobson wrote: I notice with server created directory listings, one can't recurse. $ lynx -dump http://localhost/~jidanni/test|head Index of /~jidanni/test Icon [1]Name [2]Last modified [3]Size [4]Description ___ [DIR] [5]Parent Directory - [TXT] [8]cd.html 23-Feb-2006 20:55 931 $ wget --spider -S -r http://localhost/~jidanni/test/ localhost/~jidanni/test/index.html: No such file or directory hi dan, unfortunately, --spider is broken when used with -r. i am working right now in order to fix this bug. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: problem with downloading when HREF has ../
Vladimir Volovich wrote: MT == Mauro Tortonesi writes: I addressed this bug in wget few months ago. See the fix here: http://www.mail-archive.com/wget@sunsite.dk/msg08516.html MT hi frank, MT i am going to test and apply your patch later this week, as well MT as many other pending patches. unfortunately i am still working MT on my ph.d. thesis at the moment, so i don't have much time to MT work on wget. however, since i believe my thesis should be ready MT tomorrow or wednesday at most, i am planning to spend the rest of MT the week to catch up with wget. are there any news on the wget update? hrvoje fixed this problem more than one month ago. from the ChangeLog: 2006-02-27 Hrvoje Niksic [EMAIL PROTECTED] * url.c (path_simplify): Don't preserve .. at beginning of path. Suggested by Frank McCown. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
--spider and -r
dan jacobson recently reported a bug with --spider and -r: http://www.mail-archive.com/wget@sunsite.dk/msg08797.html hrvoje confirms this bug has been in wget for a long time, mainly because the semantics of --spider and -r were never properly defined. from my point of view, it makes sense that when a user specifies both --spider and -r, wget:
1) downloads resources according to -r, printing an error (with the non-existing url's referer) in case of non-existing URLs
2) parses downloaded resources for urls
3) deletes downloaded resources
do you think this is the correct semantics for --spider and -r? am i missing something here? (notice that there are significant similarities with the behaviour of -r and --delete-after, with the only exception of printing errors in case of non-existing URLs.) -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
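in shell terms, the proposal would make (hypothetical url):
  wget -r --spider http://example.com/
behave much like the existing:
  wget -r --delete-after http://example.com/
with the addition of an error report, including the referer, for every url that turns out not to exist.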
Re: problem with downloading when HREF has ../
Hrvoje Niksic wrote: Vladimir Volovich [EMAIL PROTECTED] writes: MT == Mauro Tortonesi writes: are there any news on the wget update? MT hrvoje fixed this problem more than one month ago. from the MT ChangeLog: i don't see the official source at ftp.gnu.org/gnu/wget/ that's what i'm asking about. The fix will appear in the next release, 1.11. Mauro's paragraph you quoted (beginning with i am going to test and apply your patch later this week) referred to applying the patch to the version control repository, not to the timeframe of releasing 1.11. It is my understanding that 1.11 will be released within the next couple of months; Mauro might give a more precise date. wget 1.11 will definitely be released in the next couple of months, but i can't be more precise at the moment. at the beginning, i was thinking about adding regex and gnunet support, and fixing gnutls support, in that release. now i am reconsidering whether to delay these new features until 1.12 and focus on fixing the incredible number of recently reported bugs instead. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: regex support RFC
Scott Scriven wrote:

* Mauro Tortonesi [EMAIL PROTECTED] wrote:

wget -r --filter=-domain:www-*.yoyodyne.com

This appears to match www.yoyodyne.com, www--.yoyodyne.com, www---.yoyodyne.com, and so on, if interpreted as a regex.

not really. it would not match www.yoyodyne.com.

It would most likely also match www---zyoyodyneXcom.

yes.

Perhaps you want glob patterns instead? I know I wouldn't mind having glob patterns in addition to regexes... glob is much easier when you're not doing complex matches.

no. i was talking about regexps. they are more expressive and powerful than simple globs; i don't see the point in supporting both.

If I had to choose just one, though, I'd prefer to use PCRE, Perl-Compatible Regular Expressions. They offer a richer, more concise syntax than traditional regexes, such as \d instead of [[:digit:]] or [0-9].

i agree, but adding a dependency on PCRE to wget is asking for infinite maintenance nightmares. and i don't know if we can simply bundle code from PCRE in wget, as it has a BSD license.

--filter=[+|-][file|path|domain]:REGEXP

is it consistent? is it flawed? is there a more convenient one?

It seems like a good idea, but it wouldn't actually provide the regex-filtering features I'm hoping for unless there were a "raw" type in addition to file, domain, etc. I'll give details below. Basically, I need to match based on things like the inline CSS data, the visible link text, etc.

do you mean you would like to have a regex class working on the content of downloaded files as well?

Below is the original message I sent to the wget list a few months ago, about this same topic:

=

I'd find it useful to guide wget by using regular expressions to control which links get followed. For example, to avoid following links based on embedded css styles or link text. I've needed this several times, but the most recent was when I wanted to avoid following any "add to cart" or "buy" links on a site which uses GET parameters instead of directories to select content. Given a link like this...

<a href="http://www.foo.com/forums/gallery2.php?g2_controller=cart.AddToCart&amp;g2_itemId=11436&amp;g2_return=http%3A%2F%2Fwww.foo.com%2Fforums%2Fgallery2.php%3Fg2_view%3Dcore.ShowItem%26g2_itemId%3D11436%26g2_page%3D4%26g2_GALLERYSID%3D1d78fb5be7613cc31d33f7dfe7fbac7b&amp;g2_GALLERYSID=1d78fb5be7613cc31d33f7dfe7fbac7b&amp;g2_returnName=album" class="gbAdminLink gbAdminLink gbLink-cart_AddToCart">add to cart</a>

... a useful parameter could be --ignore-regex='AddToCart|add to cart' so the class or link text (really, anything inside the tag) could be used to decide whether the link should be followed. Or... if there's already a way to do this, let me know. I didn't see anything in the docs, but I may have missed something. :)

=

I think what I want could be implemented via the --filter option, with a few small modifications to what was proposed. I'm not sure exactly what syntax to use, but it should be able to specify whether to include/exclude the link, which PCRE flags to use, how much of the raw HTML tag to use as input, and what pattern to use for matching. Here's an idea:

--filter=[allow][flags,][scope][:]pattern

Example: '--filter=-i,raw:add ?to ?cart' (the quotes are there only to make the shell treat it as one parameter)

The details are:

"allow" is + for include or - for exclude. It defaults to + if omitted.

"flags," is a set of letters to control regex options, followed by a comma (to separate it from scope). For example, "i" specifies a case-insensitive search. These would be the same flags that Perl appends to the end of search patterns. So, instead of /foo/i, it would be --filter=+i,:foo

"scope" controls how much of the <a> or similar tag gets used as input to the regex. Values include:

  raw: use the entire tag and all contents (default), e.g. <a href="/path/to/foo.ext">bar</a>
  domain: use only the domain name, e.g. www.example.com
  file: use only the file name, e.g. foo.ext
  path: use the directory, but not the file name, e.g. /path/to
  others... can be added as desired

":" is required if allow or flags or scope is given.

So, for example, to exclude the "add to cart" links in my previous post, this could be used:

--filter=-raw:'AddToCart|add to cart'
or
--filter=-raw:AddToCart\|add\ to\ cart
or
--filter=-:'AddToCart|add to cart'
or
--filter=-i,raw:'add ?to ?cart'

Alternately, the --filter option could be split into two options: one for including content, and one for excluding. This would be more consistent with wget's existing parameters, and would slightly simplify the syntax.

I hope I haven't been too full of hot air. This is a feature I've wanted in wget for a long time, and I'm a bit excited that it might happen soon. :)

i don't like your "raw" proposal as it is HTML-specific. i would like instead to develop a mechanism which could work for all supported protocols.
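To make the matching step concrete: if wget used the portable POSIX regexp API rather than PCRE, evaluating a "raw"-scope rule could look roughly like the sketch below. The function name and the mapping of the "i" flag onto REG_ICASE are invented for the example; this is not proposed wget code.

#include <regex.h>
#include <stdio.h>

/* Evaluate one hypothetical --filter rule of "raw" scope against the
   full text of a tag.  Returns nonzero if the pattern matches. */
static int
filter_matches (const char *pattern, const char *raw_tag, int icase)
{
  regex_t re;
  int cflags = REG_EXTENDED | REG_NOSUB | (icase ? REG_ICASE : 0);

  if (regcomp (&re, pattern, cflags) != 0)
    return 0;                   /* treat an invalid pattern as no match */

  int match = (regexec (&re, raw_tag, 0, NULL, 0) == 0);
  regfree (&re);
  return match;
}

int
main (void)
{
  const char *tag =
    "<a href=\"gallery2.php?g2_controller=cart.AddToCart\" "
    "class=\"gbLink-cart_AddToCart\">add to cart</a>";

  /* --filter=-i,raw:'add ?to ?cart' would exclude this link: with
     REG_ICASE the pattern matches both the visible "add to cart"
     text and the "AddToCart" fragments in the attributes.  Prints 1. */
  printf ("%d\n", filter_matches ("add ?to ?cart", tag, 1));
  return 0;
}

A PCRE-based version would differ only in the compile and exec calls; the rule syntax itself is engine-neutral.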
Re: regex support RFC
Hrvoje Niksic wrote:

Mauro Tortonesi [EMAIL PROTECTED] writes:

Scott Scriven wrote:

* Mauro Tortonesi [EMAIL PROTECTED] wrote:

wget -r --filter=-domain:www-*.yoyodyne.com

This appears to match www.yoyodyne.com, www--.yoyodyne.com, www---.yoyodyne.com, and so on, if interpreted as a regex.

not really. it would not match www.yoyodyne.com.

Why not?

i may be wrong, but if "-" is not a special character, the previous expression should match only domains starting with www- and ending in [randomchar]yoyodyne[randomchar]com.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi                          http://www.tortonesi.com
University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it
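A quick check with the POSIX regexp functions supports Scott's reading: in an extended regexp, "-*" means "zero or more hyphens", so the www- prefix is not required at all, and the unescaped dots match any character. For example:

#include <regex.h>
#include <stdio.h>

int
main (void)
{
  regex_t re;

  if (regcomp (&re, "www-*.yoyodyne.com", REG_EXTENDED | REG_NOSUB) != 0)
    return 1;

  /* both lines print 1: the pattern matches with zero hyphens, and
     the unescaped dots happily match 'z' and 'X' */
  printf ("%d\n", regexec (&re, "www.yoyodyne.com", 0, NULL, 0) == 0);
  printf ("%d\n", regexec (&re, "www---zyoyodyneXcom", 0, NULL, 0) == 0);

  regfree (&re);
  return 0;
}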
Re: regex support RFC
Oliver Schulze L. wrote:

Hrvoje Niksic wrote:

The regexp APIs found on today's Unix systems might be usable, but unfortunately those are not available on Windows.

My personal idea on this is to enable regex on Unix and disable it on Windows. We all use Unix/Linux, and regex is really useful. I think not having regex on Windows will not do any more harm than it is doing now (not having it at all).

for consistency and to avoid maintenance problems, i would like wget to have the same behavior on windows and unix. please notice that if we implemented regex support only on unix, windows binaries of wget built with cygwin would have regex support but native binaries wouldn't. that would be very confusing for windows users, IMHO.

I hope wget can get connection cache,

this is planned for wget 1.12 (which might become 2.0). i already have some code implementing the connection cache data structure.

URL regex

this is planned for wget 1.11. i've already started working on it.

and advanced mirror functions (sync 2 folders) in the near future.

this is very interesting.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi                          http://www.tortonesi.com
University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it
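On the connection cache mentioned above: the data structure involved is essentially a table of sockets still open to a given host and port, so a recursive download can reuse them instead of paying a fresh TCP (and SSL) handshake per request. A hypothetical sketch of one plausible shape for it; this is not Mauro's actual code:

#include <string.h>

/* One cached, still-open connection.  Purely illustrative. */
struct conn_cache_entry
{
  char host[256];   /* peer host name */
  int port;         /* peer port */
  int fd;           /* open socket descriptor, or -1 if the slot is free */
  int ssl;          /* nonzero if the connection is TLS-wrapped */
};

#define CONN_CACHE_SIZE 16
static struct conn_cache_entry conn_cache[CONN_CACHE_SIZE];

/* Return a reusable socket for host:port, or -1 on a cache miss. */
static int
conn_cache_lookup (const char *host, int port)
{
  for (int i = 0; i < CONN_CACHE_SIZE; i++)
    if (conn_cache[i].fd >= 0
        && conn_cache[i].port == port
        && strcmp (conn_cache[i].host, host) == 0)
      return conn_cache[i].fd;
  return -1;
}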