So I'm using htdig to index a medium-sized site. When I indexed the site from the web server machine itself, I noticed that it was going slowly: more slowly than it had during testing, when I was indexing from a separate machine far across the Internet. This seemed odd, so I investigated with strace. It looks to me like every 5-10 page accesses, there's a select() on an fd that's never ready for reading, followed by a timeout and then, soon after, a SIGPIPE.

This is htdig 3.2.0b5 from Debian unstable (3.2.0b5-2), recompiled with the --with-ssl option (the site is SSL-enabled) and running on Debian stable (3.0/woody). Here's the strace output from a typical session. Pretty much every time I strace it, the output looks almost exactly like this, since strace always seems to attach during the fateful select() that times out:
    select(7, [6], NULL, NULL, {21, 610000}) = 1 (in [6], left {13, 130000})
    time(NULL) = 1080321571
    write(1, ".-", 2) = 2
    write(1, "*", 1) = 1
    <snip more of the same, "*" writes>
    write(1, "*", 1) = 1
    write(1, "-", 1) = 1
    time(NULL) = 1080321571
    write(1, " size = 11482\n", 14) = 14
    brk(0x83f4000) = 0x83f4000
    brk(0x8405000) = 0x8405000
    brk(0x8416000) = 0x8416000
    brk(0x83f3000) = 0x83f3000
    time(NULL) = 1080321571
    write(1, "182:127:1:https://udn.epicgames."..., 78) = 78
    time(NULL) = 1080321571
    write(6, "\27\3\1\0\360\257\363\310[8\373\236\16Mrc\207\332\220X"..., 245) = 245
    select(7, [6], NULL, NULL, {30, 0}) = 1 (in [6], left {30, 0})
    read(6, "", 5) = 0
    write(6, "\25\3\1\0\30\f\365I;0\235\247\332h$\352]\324\306\21\261"..., 29) = -1 EPIPE (Broken pipe)
    --- SIGPIPE (Broken pipe) ---

... and then strace terminates because of the SIGPIPE (though htdig, of course, keeps going).

Stracing the htdig session running remotely has gone on for quite some time (30 minutes or so so far) without any SIGPIPEs or apparent chicanery. That session also stalls in a select() call for long periods (15-20 seconds) from time to time, but it doesn't get a SIGPIPE and it appears to recover. I blame that on real network latency, though it still sounds kind of extreme. Here's an example:

    read(6, "\27\3\1\25\210", 5) = 5
    read(6, "\331\276\0015\345\231\34U\370>Y\26\3\200Q\370\352B\\k\\"..., 5512) = 5512
    select(7, [6], NULL, NULL, {30, 0}) = 1 (in [6], left {14, 400000})
    time(NULL) = 1080322513
    write(1, ".-", 2) = 2
    write(1, "*", 1) = 1
    write(1, "*", 1) = 1

Does anyone have a guess as to what's going on, or how I can make my local spidering go faster?
_______________________________________________
ht://Dig Developer mailing list: [EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev