Re: Noise ratio getting a bit high?

2002-01-30 Thread Thomas Reinke
FWIW, I think Andre Majorel's solution (adding an X-Non-Subscriber header) would be elegant. It can be automated, requires no moderator at all, allows non-subscribers to post, and allows those with a lower threshold for junk to choose to ditch some of the posts to the list based on the

Re: SSL site mirroring

2002-01-13 Thread Thomas Reinke
Hrvoje Niksic wrote: Thomas Reinke [EMAIL PROTECTED] writes: Ok, either I've completely misread wget, or it has a problem mirroring SSL sites. It appears that it is deciding that the https:// scheme is something that is not to be followed. That's a bug. Your patch is close

Re: Using -pk, getting wrong behavior for frameset pages...Suggestions?

2002-01-11 Thread Thomas Reinke
Do you think this might be an issue with framesets and SSL sites? Or an issue with framesets and CGI source files? This is not a problem with frames - it IS a problem with SSL. wget, while it appears to have SSL support, didn't quite get it right. The internal schemes being used don't treat

SSL sites fail to be crawled

2001-12-29 Thread Thomas Reinke
It seems that SSL sites aren't crawled properly, because wget decides that the scheme is not to be followed. The offending code appears to be limited to three lines in recur.c (version 1.8.1): Line 440: change to if (u->scheme != SCHEME_HTTP && u->scheme != SCHEME_HTTPS Line 449:
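
The scheme test described in this post can be sketched in isolation. This is only an illustration of the idea (follow a link when its scheme is either HTTP or HTTPS), not the actual recur.c code: the enum, the struct, and the helper name scheme_is_followable are hypothetical stand-ins, and only SCHEME_HTTP and SCHEME_HTTPS are names taken from the post.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for wget's internal types. Only SCHEME_HTTP and
   SCHEME_HTTPS appear in the post; everything else is assumed. */
enum url_scheme { SCHEME_HTTP, SCHEME_HTTPS, SCHEME_FTP, SCHEME_INVALID };

struct url {
    enum url_scheme scheme;
};

/* Return true when a link should be followed during recursion.
   A check that accepted only SCHEME_HTTP would skip https:// links. */
static bool scheme_is_followable(const struct url *u)
{
    return u->scheme == SCHEME_HTTP || u->scheme == SCHEME_HTTPS;
}
```

A caller would skip a link when scheme_is_followable() returns false, which is the same condition as the negated test quoted for line 440.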

SSL site mirroring

2001-12-29 Thread Thomas Reinke
Thomas Reinke [EMAIL PROTECTED] * recur.c: fixed scheme handling for https to allow proper following of links

Patch for wget hanging on connect() call

2001-12-19 Thread Thomas Reinke
We've noted in a few cases that wget can hang on connect() due to a lack of any form of timeout management. We've made a change to the routine connect_to_one in connect.c that will implement a timeout mechanism on connect without the use of signals or alarms. I've attached the modified version

Re: Patch for wget hanging on connect() call

2001-12-19 Thread Thomas Reinke
Secondly, why not downgrade to blocking connects if you couldn't figure out how to do non-blocking ones? I suppose that's a possibility. Or we could just use FIONBIO which works on modern systems, and turn off connect timeouts for others. So far I've been consistently reaching the