Re: Proposal for despamming the list

2002-04-15 Thread Daniel Stenberg

On 14 Apr 2002, Karsten Thygesen wrote:

> Anyway - spamassassin is now in place - let's give it a chance before
> doing radical movements - and I can assure you that ezmlm is far more
> mature and stable than Mailman - we (sunsite) have been running both
> systems for years, and there is no doubt about which one we recommend!

As was very quickly proven, that just isn't enough; either that, or much
stricter rules need to be added.

I found it very ironic that the first mail after your previous post here
was... spam!

-- 
  Daniel Stenberg - http://daniel.haxx.se - +46-705-44 31 77
   ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol




Re: Proposal for despamming the list

2002-04-15 Thread Karsten Thygesen

>>>>> Daniel == Daniel Stenberg <[EMAIL PROTECTED]> writes:

Daniel> On 14 Apr 2002, Karsten Thygesen wrote:
>> Anyway - spamassassin is now in place - let's give it a chance
>> before doing radical movements - and I can assure you that ezmlm is
>> far more mature and stable than Mailman - we (sunsite) have been
>> running both systems for years, and there is no doubt about which
>> one we recommend!

Daniel> As was very quickly proven, that just isn't enough; either
Daniel> that, or much stricter rules need to be added.

Daniel> I found it very ironic that the first mail after your
Daniel> previous post here was... spam!

Yes - I have hardened the rules afterwards. But please bear in mind that
no spam tool is 100% effective without human interaction. Over the next
few weeks, I'm sure you will find that the level of spam drops to only a
few percent.

Karsten



wget -r not following links

2002-04-15 Thread Mika Tuupola


I have a site which has relative links like this:

<a href="jump?dest=bar&lang=foo">link</a>

I have been trying different switches to make wget -r follow
those links but have been unsuccessful. Is this possible with
the current version of wget?

-- 
Mika Tuupola  http://www.appelsiini.net/~tuupola/




Re: Goodbye and good riddance

2002-04-15 Thread csaba . raduly


On 12/04/2002 19:21:41 James C. McMaster (Jim) wrote:

> My patience has reached an end.  Perhaps, now that you have (for the
> first time) indicated you will do something to fix the problem, the
> possible light at the end of the tunnel will convince others to stay.

The light at the end of the tunnel is just the explosion around the Pu239 :-)

--
Csaba Ráduly, Software Engineer   Sophos Anti-Virus
email: [EMAIL PROTECTED]          http://www.sophos.com
US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933




Re: HTTP 1.1

2002-04-15 Thread csaba . raduly


On 12/04/2002 21:37:31 hniksic wrote:

Tony Lewis <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic wrote:
>
>> Is there any way to make Wget use HTTP/1.1 ?
>
> Unfortunately, no.
>
> In looking at the debug output, it appears to me that wget is really
> sending HTTP/1.1 headers, but claiming that they are HTTP/1.0
> headers. For example, the Host header was not defined in RFC 1945,
> but wget is sending it.

Yes.  That is by design -- HTTP was meant to be extended in that way.
Wget is also requesting and accepting `Keep-Alive', using `Range', and
so on.

Csaba Raduly's patch would break Wget because it doesn't support the
chunked transfer-encoding.  Also, its understanding of persistent
connections might not be compliant with HTTP/1.1.

IT WAS A JOKE!
Serves me right. I need to put bigger smilies :-(
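
(For the curious, the extension mechanism hniksic describes above looks
like this on the wire: an HTTP/1.0 request line followed by headers from
later specifications. The host, path, and exact header set below are
placeholders; what Wget actually sends depends on version and options.)

    GET /index.html HTTP/1.0
    Host: www.example.com
    Range: bytes=1024-
    Connection: Keep-Alive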


--
Csaba Ráduly, Software Engineer   Sophos Anti-Virus
email: [EMAIL PROTECTED]          http://www.sophos.com
US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933




Re: wget -r not following links

2002-04-15 Thread Hrvoje Niksic

Mika Tuupola <[EMAIL PROTECTED]> writes:

>    I have a site which has relative links like this:
>
>    <a href="jump?dest=bar&lang=foo">link</a>
>
>    I have been trying different switches to make wget -r follow
>    those links but have been unsuccessful. Is this possible with
>    the current version of wget?

Can you give more details?  The actual URL and a debug log would help;
failing that, more information would be nice.

To answer your question, yes, it is possible, and it should work.
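
For instance, a debug log of the failing recursion can be captured with
something like this (the URL is a placeholder; -d enables debug output
and -o writes it to a file):

    $ wget -d -r -o wget.log 'http://example.com/jump?dest=bar&lang=foo'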



Re: Change in behaviour between 1.7 and 1.8.1

2002-04-15 Thread Hrvoje Niksic

Philipp Thomas <[EMAIL PROTECTED]> writes:

> When you issue
>
>   wget --recursive --level=1 --reject=.html www.suse.de
>
> wget 1.7 really omits downloading all the .html files except
> index.html (which is needed for --recursive), but wget 1.8.1 also
> downloads all .html files that are referenced from index.html and
> deletes them immediately.
>
> It is clear that the .html files are needed to find the next level
> of files when downloading recursively, but they should be omitted
> when the recursion depth is limited and the limit has been reached.

Yes.  Please let me know if this patch fixes things for you:

2002-04-15  Hrvoje Niksic  [EMAIL PROTECTED]

* recur.c (download_child_p): Don't ignore rejection of HTML
documents that are themselves leaves of recursion.

Index: src/recur.c
===================================================================
RCS file: /pack/anoncvs/wget/src/recur.c,v
retrieving revision 1.44
diff -u -r1.44 recur.c
--- src/recur.c	2002/04/12 18:53:38	1.44
+++ src/recur.c	2002/04/15 18:09:47
@@ -511,23 +511,13 @@
   /* 6. */
   {
     /* Check for acceptance/rejection rules.  We ignore these rules
-       for HTML documents because they might lead to other files which
-       need to be downloaded.  Of course, we don't know which
-       documents are HTML before downloading them, so we guess.
-
-       A file is subject to acceptance/rejection rules if:
-
-       * u->file is not "" (i.e. it is not a directory)
-       and either:
-         + there is no file suffix,
-         + or there is a suffix, but is not html or htm or similar,
-         + both:
-           - recursion is not infinite,
-           - and we are at its very end. */
-
+       for directories (no file name to match) and for HTML documents,
+       which might lead to other files that do need to be downloaded.
+       That is, unless we've exhausted the recursion depth anyway.  */
     if (u->file[0] != '\0'
-	&& (!has_html_suffix_p (url)
-	    || (opt.reclevel != INFINITE_RECURSION && depth >= opt.reclevel)))
+	&& !(has_html_suffix_p (u->file)
+	     && depth < opt.reclevel - 1
+	     && depth != INFINITE_RECURSION))
       {
 	if (!acceptable (u->file))
 	  {



Re: dynamic IPs

2002-04-15 Thread Hrvoje Niksic

You're probably right; there should be an option to disable DNS
caching.  As a stop-gap measure, you can simply stop `lookup_host'
from caching the information it retrieves, by commenting the call to
`cache_host_lookup' at the end of `lookup_host'.
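
In diff form, the stop-gap would look roughly like this (a hypothetical
sketch: `host' and `al' stand in for whatever the real arguments of the
call are named):

    --- src/host.c
    +++ src/host.c
    @@ ... @@ lookup_host
    -  cache_host_lookup (host, al);
    +  /* Stop-gap: skip caching, so hosts with dynamic IPs are
    +     re-resolved on every connection.
    +  cache_host_lookup (host, al);
    +  */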



Re: typo in `man wget`

2002-04-15 Thread Hrvoje Niksic

[EMAIL PROTECTED] writes:

> Unfinished sentence...
>
>     Another way to specify username and password is in the
>     URL itself.  For more information about security
>     issues with Wget,

If only that were a typo.  It's a bug in the ugly script that converts
the Texinfo manual to POD.  :-(

*sigh*



Re: dynamic IPs

2002-04-15 Thread Thomas Lussnig

Hrvoje Niksic wrote:

> You're probably right; there should be an option to disable DNS
> caching.  As a stop-gap measure, you can simply stop `lookup_host'
> from caching the information it retrieves, by commenting the call to
> `cache_host_lookup' at the end of `lookup_host'.

Hi,
I think the idea of disabling DNS caching entirely is not such a good
one. The right approach would be, first, a switch to optionally set a
maximum TTL for the DNS entries, and second, a rewrite of the cache list
so that it also stores the TTL and lookup time and can check whether
entries are still valid. As it stands, I don't think we respect the TTL
at all.
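
A minimal sketch of the cache entry such a rewrite would store
(illustrative only; none of these names come from Wget's actual source):

    #include <time.h>

    struct address_list;             /* opaque; lives in Wget's host.c */

    struct dns_cache_entry {
      char *host;                    /* hostname key */
      struct address_list *addrs;    /* resolved addresses */
      time_t resolved_at;            /* when the lookup was performed */
      long max_ttl;                  /* max age in seconds, from DNS or a switch */
    };

    /* An entry may be reused only while it is younger than its TTL. */
    static int
    dns_cache_entry_valid (const struct dns_cache_entry *e)
    {
      return difftime (time (NULL), e->resolved_at) <= e->max_ttl;
    }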

Cu Thomas Lußnig





Re: wget bug (overflow)

2002-04-15 Thread Hrvoje Niksic

I'm afraid that downloading files larger than 2G is not supported by
Wget at the moment.
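
For what it's worth, the 2 GB ceiling is characteristic of keeping file
sizes in a 32-bit signed `long' (the email doesn't say so explicitly, so
treat this as an assumption):

    #include <limits.h>
    #include <stdio.h>

    int
    main (void)
    {
      /* On a 32-bit platform LONG_MAX is 2147483647, i.e. just under
         2 GB; any byte count at or beyond 2^31 cannot be stored.  */
      printf ("largest representable size: %ld bytes\n", (long) LONG_MAX);
      return 0;
    }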



timestamping

2002-04-15 Thread David C. Anderson

This isn't a bug, but the offer of a new feature.  The timestamping
feature doesn't quite work for us, as we don't keep just the latest
view of a website and we don't want to copy all those files around for
each update.

So I implemented a --changed-since=mmdd[hhmm] flag to fetch only the
files that have changed since that time, according to the header.  It
seems to work okay, although your extra check for file-size equality in
the timestamping feature makes me wonder whether the date alone is
always a good measure.

One oddity is that if the file you point wget at is itself older than
the given date, it won't be retrieved and there will be no URLs to
recurse on.  (We're pointing it at a URL that changes daily.)

I tested it under Solaris 7, but there is a dependency on time() and
gmtime() that I haven't conditionalized for autoconf, as I am not
familiar with that tool.
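
For reference, here is a sketch of the date parsing such a flag needs;
this is a hypothetical reimplementation, not the submitted patch, and it
shows where the time()/gmtime() dependency comes from:

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    /* Parse "mmdd" or "mmddhhmm" into a time_t, borrowing the current
       year from the clock.  Returns (time_t)-1 on malformed input.
       Note that mktime() interprets the fields in local time, so this
       is only an approximation of a UTC parse.  */
    static time_t
    parse_changed_since (const char *arg)
    {
      int mon, day, hour = 0, min = 0;
      size_t len = strlen (arg);
      time_t now = time (NULL);
      struct tm tm = *gmtime (&now);   /* current year, UTC */

      if (len != 4 && len != 8)
        return (time_t) -1;
      if (sscanf (arg, "%2d%2d%2d%2d", &mon, &day, &hour, &min) < 2)
        return (time_t) -1;
      tm.tm_mon  = mon - 1;
      tm.tm_mday = day;
      tm.tm_hour = hour;
      tm.tm_min  = min;
      tm.tm_sec  = 0;
      return mktime (&tm);
    }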

I would like this feature to get carried along with the rest of the
codebase; would you like it?

-dca




Re: WGET malformed status line

2002-04-15 Thread Hrvoje Niksic

Löfstrand Thomas <[EMAIL PROTECTED]> writes:

> I have used wget with the -d option to see what is going on, and it
> seems like the proxy server returns the following response:
> X-PLEASE_WAIT.
>
> After reading the source code in http.c, it seems like wget expects
> the answer from the proxy to be HTTP/ and a version number.
>
> Is there any easy way to bypass this response part?

Maybe.  But what should the response be, then?  This sounds like
either a gross breach of HTTP or a completely different problem.  (We
had a report of a proxy server returning FTP status.)
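
For context, the check that this proxy trips over is essentially the
following (a simplified sketch, not the actual http.c code):

    #include <string.h>

    /* An HTTP response must open with a status line such as
       "HTTP/1.0 200 OK".  A reply that begins "X-PLEASE_WAIT" fails
       this test, hence Wget's "malformed status line" error.  */
    static int
    looks_like_http_status_line (const char *line)
    {
      return strncmp (line, "HTTP/", 5) == 0;
    }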



Re: small bug in wget manpage: --progress

2002-04-15 Thread Hrvoje Niksic

Noel Koethe <[EMAIL PROTECTED]> writes:

> the wget 1.8.1 manpage tells me:
>
>     --progress=type
>         Select the type of the progress indicator you wish to
>         use.  Legal indicators are ``dot'' and ``bar''.
>
>         The ``dot'' indicator is used by default.  It traces
>         the retrieval by printing dots on the screen, each dot
>         representing a fixed amount of downloaded data.
>
> But it looks like the default is bar.

Yes.  Thanks for the report; I'm about to apply this fix.


2002-04-15  Hrvoje Niksic  [EMAIL PROTECTED]

* wget.texi (Download Options): Fix the documentation of
`--progress'.

Index: doc/wget.texi
===================================================================
RCS file: /pack/anoncvs/wget/doc/wget.texi,v
retrieving revision 1.64
diff -u -r1.64 wget.texi
--- doc/wget.texi   2002/04/13 22:44:16 1.64
+++ doc/wget.texi   2002/04/15 20:52:28
@@ -625,10 +625,15 @@
 Select the type of the progress indicator you wish to use.  Legal
 indicators are ``dot'' and ``bar''.
 
-The ``dot'' indicator is used by default.  It traces the retrieval by
-printing dots on the screen, each dot representing a fixed amount of
-downloaded data.
+The ``bar'' indicator is used by default.  It draws an ASCII progress
+bar graphics (a.k.a ``thermometer'' display) indicating the status of
+retrieval.  If the output is not a TTY, the ``dot'' bar will be used by
+default.
 
+Use @samp{--progress=dot} to switch to the ``dot'' display.  It traces
+the retrieval by printing dots on the screen, each dot representing a
+fixed amount of downloaded data.
+
 When using the dotted retrieval, you may also set the @dfn{style} by
 specifying the type as @samp{dot:@var{style}}.  Different styles assign
 different meaning to one dot.  With the @code{default} style each dot
@@ -639,11 +644,11 @@
 files---each dot represents 64K retrieved, there are eight dots in a
 cluster, and 48 dots on each line (so each line contains 3M).
 
-Specifying @samp{--progress=bar} will draw a nice ASCII progress bar
-graphics (a.k.a ``thermometer'' display) to indicate retrieval.  If the
-output is not a TTY, this option will be ignored, and Wget will revert
-to the dot indicator.  If you want to force the bar indicator, use
-@samp{--progress=bar:force}.
+Note that you can set the default style using the @code{progress}
+command in @file{.wgetrc}.  That setting may be overridden from the
+command line.  The exception is that, when the output is not a TTY, the
+``dot'' progress will be favored over ``bar''.  To force the bar output,
+use @samp{--progress=bar:force}.
 
 @item -N
 @itemx --timestamping



Re: selective proxy usage

2002-04-15 Thread Hrvoje Niksic

Velimir Kalik <[EMAIL PROTECTED]> writes:

> Is it possible to specify for wget not to use a proxy for some IPs or
> domains? E.g. not to use a proxy for www.nba.com, but use it for
> everything else.
>
> Thanks and please cc replies to my email address too!

Yes, that should work with the `no_proxy' environment variable.  For
instance:

$ no_proxy=nba.com wget ...



Re: Anyone maintaining RedHat 6.x RPM's for wget?

2002-04-15 Thread Hrvoje Niksic

Jeroen W. Pluimers (mailings) <[EMAIL PROTECTED]> writes:

> I wonder if anyone is maintaining RedHat 6.x RPM's for wget.

I have no idea.  But, Wget is fairly easy to build from source, so I
never really bothered to find out.

> I could not find a 1.8.1 RPM on the net using google nor using
> rpmfind, and it seems the version RedHat ships is really, really old.
>
> Any pointers to a download place are welcome.

If you have a C compiler, building Wget should be as simple as running
`configure' and `make install'.
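
For example, building from the 1.8.1 source tarball looks like this (the
tarball name is illustrative):

    $ tar xzf wget-1.8.1.tar.gz
    $ cd wget-1.8.1
    $ ./configure
    $ make
    $ make install    (run this last step as root)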



Re: --html-extension and content type query

2002-04-15 Thread Hrvoje Niksic

Picot Chappell <[EMAIL PROTECTED]> writes:

> Why doesn't wget assume that files which don't declare a content
> type are text/html files?

Good question.  I don't know, perhaps such brokenness never occurred
to me.  And I don't remember anyone reporting it until now.

> I'm looking into patching http.c, so that if the type isn't defined
> it gets set to text/html.  Has this been done for 1.8.1 already?  If
> so, can someone pass that patch along to me?
>
> Also, if I do this, will it cause horrible wget hiccups?

I don't think it will make a difference, except improve user
experience in the case that you describe.  Correctly written pages
will not be affected adversely, and that's what truly matters.

Here is a patch that should implement what you need.  Please let me
know if it works for you.

2002-04-16  Hrvoje Niksic  [EMAIL PROTECTED]

* http.c (gethttp): If Content-Type is not given, assume
text/html.

Index: src/http.c
===================================================================
RCS file: /pack/anoncvs/wget/src/http.c,v
retrieving revision 1.90
diff -u -r1.90 http.c
--- src/http.c	2002/04/14 05:19:27	1.90
+++ src/http.c	2002/04/16 00:14:57
@@ -1308,10 +1308,12 @@
 	}
     }
 
-  if (type && !strncasecmp (type, TEXTHTML_S, strlen (TEXTHTML_S)))
+  /* If content-type is not given, assume text/html.  This is because
+     of the multitude of broken CGI's that forget to generate the
+     content-type.  */
+  if (!type || 0 == strncasecmp (type, TEXTHTML_S, strlen (TEXTHTML_S)))
     *dt |= TEXTHTML;
   else
-    /* We don't assume text/html by default.  */
     *dt &= ~TEXTHTML;
 
   if (opt.html_extension && (*dt & TEXTHTML))



Re: FTP options

2002-04-15 Thread Hrvoje Niksic

[EMAIL PROTECTED] writes:

> Good evening. I'm trying to do an FTP transfer with Wget 1.7. My
> problem is that my PC is behind a proxy, and there is no way to make
> an FTP connection unless you first FTP to that proxy, which then
> opens an FTP session to the final machine you want to connect to.
> When connecting to the proxy, you give as the FTP user name:
> anonymous@machine_where_you_want_ftp. Is there any way to do this
> type of FTP with Wget?

I've just recently added such functionality to the CVS version of
Wget.  (You have to download and compile it yourself, though; see
http://wget.sunsite.dk/ for instructions on how to do that.)

The way it works -- in the CVS version -- is as simple as setting
ftp_proxy to a FTP URL representing your proxy, and Wget does the
rest.
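
Concretely, something along these lines should work with the CVS build
(the proxy host is a placeholder, and the exact URL form the code
accepts may differ):

    $ ftp_proxy=ftp://proxy.example.com/ wget ftp://ftp.gnu.org/pub/gnu/wget/wget-1.8.1.tar.gz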



Re: problems with msgfmt making .gmo [v1.8.1]

2002-04-15 Thread Hrvoje Niksic

Thanks for the report.  The thing I don't quite understand is how come
you are the only one to experience this.  My `msgfmt --version' says
0.10.40, so I'm not sure what your 1.3 refers to.

Maybe you should upgrade gettext?