Re: Proposal for despamming the list
On 14 Apr 2002, Karsten Thygesen wrote:

> Anyway - spamassassin is now in place - let's give it a chance before
> doing anything radical - and I can assure you that ezmlm is far more
> mature and stable than Mailman - we (sunsite) have been running both
> systems for years, and there is no doubt about which one we recommend!

As was very quickly proven, that just isn't enough. Either that, or you
need to add much stricter rules.

I found it very ironic that the first mail after your previous post
here was a... spam!

-- 
Daniel Stenberg - http://daniel.haxx.se - +46-705-44 31 77
  ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Re: Proposal for despamming the list
>>>>> "Daniel" == Daniel Stenberg <[EMAIL PROTECTED]> writes:

> On 14 Apr 2002, Karsten Thygesen wrote:
>> Anyway - spamassassin is now in place - let's give it a chance before
>> doing anything radical - and I can assure you that ezmlm is far more
>> mature and stable than Mailman - we (sunsite) have been running both
>> systems for years, and there is no doubt about which one we recommend!
>
> As was very quickly proven, that just isn't enough. Either that, or
> you need to add much stricter rules.
>
> I found it very ironic that the first mail after your previous post
> here was a... spam!

Yes - I have hardened the rules since. But please bear in mind that
there is no 100% effective spam tool that does not require human
interaction. Over the next weeks, I am sure you will find that the
level of spam is reduced to only a few percent.

Karsten
wget -r not following links
I have a site which has relative links like this:

  <a href="jump?dest=bar&lang=foo">link</a>

I have been trying different switches to make wget -r follow those
links, but have been unsuccessful. Is this possible with the current
version of wget?

-- 
Mika Tuupola                      http://www.appelsiini.net/~tuupola/
Re: Goodbye and good riddance
On 12/04/2002 19:21:41, in the thread started by James C. McMaster
(Jim)'s "My patience has reached an end", someone wrote:

> Perhaps, now that you have (for the first time) indicated you will do
> something to fix the problem, the possible light at the end of the
> tunnel will convince others to stay.

The light at the end of the tunnel is just the explosion around the
Pu239 :-)

-- 
Csaba Ráduly, Software Engineer, Sophos Anti-Virus
email: [EMAIL PROTECTED]    http://www.sophos.com
US Support: +1 888 SOPHOS 9    UK Support: +44 1235 559933
Re: HTTP 1.1
On 12/04/2002 21:37:31 hniksic wrote:

> Tony Lewis <[EMAIL PROTECTED]> writes:
>
>> Hrvoje Niksic wrote:
>>
>>>> Is there any way to make Wget use HTTP/1.1?
>>>
>>> Unfortunately, no.
>>
>> In looking at the debug output, it appears to me that wget is really
>> sending HTTP/1.1 headers, but claiming that they are HTTP/1.0
>> headers. For example, the Host header was not defined in RFC 1945,
>> but wget is sending it.
>
> Yes. That is by design -- HTTP was meant to be extended in that way.
> Wget is also requesting and accepting `Keep-Alive', using `Range',
> and so on.
>
> Csaba Raduly's patch would break Wget because it doesn't support the
> chunked transfer-encoding. Also, its understanding of persistent
> connections might not be compliant with HTTP/1.1.

IT WAS A JOKE! Serves me right. I need to put bigger smilies :-(

-- 
Csaba Ráduly, Software Engineer, Sophos Anti-Virus
email: [EMAIL PROTECTED]    http://www.sophos.com
US Support: +1 888 SOPHOS 9    UK Support: +44 1235 559933
Re: wget -r not following links
Mika Tuupola <[EMAIL PROTECTED]> writes:

> I have a site which has relative links like this:
>
>   <a href="jump?dest=bar&lang=foo">link</a>
>
> I have been trying different switches to make wget -r follow those
> links, but have been unsuccessful. Is this possible with the current
> version of wget?

Can you give more details? The actual URL and a debug log would help;
failing that, more information would be nice.

To answer your question: yes, it is possible, and it should work.
Re: Change in behaviour between 1.7 and 1.8.1
Philipp Thomas <[EMAIL PROTECTED]> writes:

> When you issue
>
>   wget --recursive --level=1 --reject=.html www.suse.de
>
> wget 1.7 really omits downloading all the .html files except
> index.html (which is needed for --recursive), but wget 1.8.1 also
> downloads all .html files that are referenced from index.html and
> deletes them immediately. It is clear that the .html files are needed
> to find the next level of files when downloading recursively, but
> they should be omitted when the recursion depth is limited and the
> limit has been reached.

Yes. Please let me know if this patch fixes things for you:

2002-04-15  Hrvoje Niksic  <[EMAIL PROTECTED]>

	* recur.c (download_child_p): Don't ignore rejection of HTML
	documents that are themselves leaves of recursion.

Index: src/recur.c
===================================================================
RCS file: /pack/anoncvs/wget/src/recur.c,v
retrieving revision 1.44
diff -u -r1.44 recur.c
--- src/recur.c	2002/04/12 18:53:38	1.44
+++ src/recur.c	2002/04/15 18:09:47
@@ -511,23 +511,13 @@
   /* 6. */
   {
     /* Check for acceptance/rejection rules.  We ignore these rules
-       for HTML documents because they might lead to other files which
-       need to be downloaded.  Of course, we don't know which
-       documents are HTML before downloading them, so we guess.
-
-       A file is subject to acceptance/rejection rules if:
-
-       * u->file is not "" (i.e. it is not a directory)
-       and either:
-         + there is no file suffix,
-         + or there is a suffix, but is not "html" or "htm" or similar,
-         + both:
-           - recursion is not infinite,
-           - and we are at its very end. */
-
+       for directories (no file name to match) and for HTML documents,
+       which might lead to other files that do need to be downloaded.
+       That is, unless we've exhausted the recursion depth anyway.  */
     if (u->file[0] != '\0'
-	&& (!has_html_suffix_p (url)
-	    || (opt.reclevel != INFINITE_RECURSION && depth >= opt.reclevel)))
+	&& !(has_html_suffix_p (u->file)
+	     && depth < opt.reclevel - 1
+	     && depth != INFINITE_RECURSION))
       {
	if (!acceptable (u->file))
	  {
Re: dynamic IPs
You're probably right; there should be an option to disable DNS caching. As a stop-gap measure, you can simply stop `lookup_host' from caching the information it retrieves, by commenting the call to `cache_host_lookup' at the end of `lookup_host'.
Re: typo in `man wget`
[EMAIL PROTECTED] writes:

> Unfinished sentence...
>
>   Another way to specify username and password is in the URL itself.
>   For more information about security issues with Wget,

If only that were a typo. It's a bug in the ugly script that converts
the Texinfo manual to POD. :-( *sigh*
Re: dynamic IPs
Hrvoje Niksic wrote:

> You're probably right; there should be an option to disable DNS
> caching. As a stop-gap measure, you can simply stop `lookup_host'
> from caching the information it retrieves, by commenting the call to
> `cache_host_lookup' at the end of `lookup_host'.

Hi,

I think disabling DNS caching across the board is not such a good
idea. The right way would be, first, an optional switch to set a
maximum TTL for the DNS entries, and second, to rewrite the cache list
so that it also stores the TTL and the lookup time, so entries can be
checked for validity. As it stands, I think we do not respect the TTL
at all.

Cu
Thomas Lußnig
Re: wget bug (overflow)
I'm afraid that downloading files larger than 2G is not supported by Wget at the moment.
timestamping
This isn't a bug report, but the offer of a new feature.

The timestamping feature doesn't quite work for us, as we don't keep
just the latest view of a website, and we don't want to copy all those
files around for each update. So I implemented a
--changed-since=mmdd[hhmm] flag to only get files that have changed
since then, according to the header. It seems to work okay, although
your extra check for file-size equality in the timestamping feature
makes me wonder if the date isn't always a good measure.

One oddity is that if you point wget at a file that's older than the
date at the top level, it won't be fetched and there won't be any URLs
to recurse on. (We're pointing it at a URL that changes daily.)

I tested it under Solaris 7, but there is a dependency on time() and
gmtime() that I haven't conditionalized for autoconf, as I am not
familiar with that tool.

I would like this feature to get carried along with the rest of the
codebase; would you like it?

-dca
Re: WGET malformed status line
Löfstrand Thomas <[EMAIL PROTECTED]> writes:

> I have used wget with the -d option to see what is going on, and it
> seems like the proxy server returns the following response:
> X-PLEASE_WAIT. After reading the source code in http.c, it seems like
> wget expects the answer from the proxy to be HTTP/ and a version
> number. Is there any easy way to bypass this response part?

Maybe. But what should the response be, then? This sounds like either
a gross breach of HTTP or a completely different problem. (We had a
report of a proxy server returning FTP status.)
Re: small bug in wget manpage: --progress
Noel Koethe <[EMAIL PROTECTED]> writes:

> the wget 1.8.1 manpage tells me:
>
>   --progress=type
>       Select the type of the progress indicator you wish to use.
>       Legal indicators are ``dot'' and ``bar''.
>
>       The ``dot'' indicator is used by default. It traces the
>       retrieval by printing dots on the screen, each dot representing
>       a fixed amount of downloaded data.
>
> But it looks like the default is bar.

Yes. Thanks for the report; I'm about to apply this fix.

2002-04-15  Hrvoje Niksic  <[EMAIL PROTECTED]>

	* wget.texi (Download Options): Fix the documentation of
	`--progress'.

Index: doc/wget.texi
===================================================================
RCS file: /pack/anoncvs/wget/doc/wget.texi,v
retrieving revision 1.64
diff -u -r1.64 wget.texi
--- doc/wget.texi	2002/04/13 22:44:16	1.64
+++ doc/wget.texi	2002/04/15 20:52:28
@@ -625,10 +625,15 @@
 Select the type of the progress indicator you wish to use.  Legal
 indicators are ``dot'' and ``bar''.
 
-The ``dot'' indicator is used by default.  It traces the retrieval by
-printing dots on the screen, each dot representing a fixed amount of
-downloaded data.
+The ``bar'' indicator is used by default.  It draws an ASCII progress
+bar graphics (a.k.a ``thermometer'' display) indicating the status of
+retrieval.  If the output is not a TTY, the ``dot'' bar will be used by
+default.
 
+Use @samp{--progress=dot} to switch to the ``dot'' display.  It traces
+the retrieval by printing dots on the screen, each dot representing a
+fixed amount of downloaded data.
+
 When using the dotted retrieval, you may also set the @dfn{style} by
 specifying the type as @samp{dot:@var{style}}.  Different styles assign
 different meaning to one dot.  With the @code{default} style each dot
@@ -639,11 +644,11 @@
 files---each dot represents 64K retrieved, there are eight dots in a
 cluster, and 48 dots on each line (so each line contains 3M).
 
-Specifying @samp{--progress=bar} will draw a nice ASCII progress bar
-graphics (a.k.a ``thermometer'' display) to indicate retrieval.  If the
-output is not a TTY, this option will be ignored, and Wget will revert
-to the dot indicator.  If you want to force the bar indicator, use
-@samp{--progress=bar:force}.
+Note that you can set the default style using the @code{progress}
+command in @file{.wgetrc}.  That setting may be overridden from the
+command line.  The exception is that, when the output is not a TTY, the
+``dot'' progress will be favored over ``bar''.  To force the bar output,
+use @samp{--progress=bar:force}.
 
 @item -N
 @itemx --timestamping
Re: selective proxy usage
Velimir Kalik <[EMAIL PROTECTED]> writes:

> Is it possible to specify for wget not to use a proxy for some IPs or
> domains? E.g. not to use a proxy for www.nba.com, but use it for
> everything else. Thanks, and please cc replies to my email address
> too!

Yes, that should work with the `no_proxy' environment variable. For
instance:

    $ no_proxy=nba.com wget ...
Re: Anyone maintaining RedHat 6.x RPM's for wget?
"Jeroen W. Pluimers (mailings)" <[EMAIL PROTECTED]> writes:

> I wonder if anyone is maintaining RedHat 6.x RPM's for wget.

I have no idea. But Wget is fairly easy to build from source, so I
never really bothered to find out.

> I could not find a 1.8.1 RPM on the net using Google nor using
> rpmfind, and it seems the version RedHat ships is really really old.
> Any pointers to a download place are welcome.

If you have a C compiler, building Wget should be as simple as running
`configure' and `make install'.
Re: --html-extension and content type query
Picot Chappell <[EMAIL PROTECTED]> writes:

> Why doesn't wget assume that files which don't declare a content type
> are text/html files?

Good question. I don't know; perhaps such brokenness never occurred to
me. And I don't remember anyone reporting it until now.

> I'm looking into patching http.c, so that if type isn't defined it
> gets set to text/html. Has this been done for 1.8.1 already? If so,
> can someone pass that patch along to me? Also, if I do this, will it
> cause horrible wget hiccups?

I don't think it will make a difference, except improve user
experience in the case that you describe. Correctly written pages will
not be affected adversely, and that's what truly matters.

Here is a patch that should implement what you need. Please let me
know if it works for you.

2002-04-16  Hrvoje Niksic  <[EMAIL PROTECTED]>

	* http.c (gethttp): If Content-Type is not given, assume
	text/html.

Index: src/http.c
===================================================================
RCS file: /pack/anoncvs/wget/src/http.c,v
retrieving revision 1.90
diff -u -r1.90 http.c
--- src/http.c	2002/04/14 05:19:27	1.90
+++ src/http.c	2002/04/16 00:14:57
@@ -1308,10 +1308,12 @@
 	}
     }
 
-  if (type && !strncasecmp (type, TEXTHTML_S, strlen (TEXTHTML_S)))
+  /* If content-type is not given, assume text/html.  This is because
+     of the multitude of broken CGI's that forget to generate the
+     content-type.  */
+  if (!type || 0 == strncasecmp (type, TEXTHTML_S, strlen (TEXTHTML_S)))
     *dt |= TEXTHTML;
   else
-    /* We don't assume text/html by default.  */
     *dt &= ~TEXTHTML;
 
   if (opt.html_extension && (*dt & TEXTHTML))
Re: FTP options
[EMAIL PROTECTED] writes:

> Good evening,
>
> I'm trying to do an FTP retrieval with Wget 1.7. My problem is that
> my PC is behind a proxy, and there is no way to make an FTP
> connection unless you first FTP to the proxy itself; the proxy then
> opens an FTP session to the final machine you want to reach. When you
> connect to the proxy, you give as the FTP user name:
>
>   anonymous@machine_where_you_want_ftp
>
> Is there any way to do this type of FTP with Wget?

I've just recently added such functionality to the CVS version of
Wget. (You have to download and compile it yourself, though; see
http://wget.sunsite.dk/ for instructions on how to do that.)

The way it works -- in the CVS version -- is as simple as setting
ftp_proxy to an FTP URL representing your proxy, and Wget does the
rest.
Re: problems with msgfmt making .gmo [v1.8.1]
Thanks for the report. The thing I don't quite understand is, how come you are the only one to experience this? My `msgfmt --version' says 0.10.40, so I'm not sure what your 1.3 refers to. Maybe you should upgrade gettext?