I'm using htdig to index a medium-sized site. When I tried indexing
the site from the webserver machine itself, I noticed that it was
going slowly -- more slowly than it had during testing, when I was
indexing from a separate machine far across the internet. This seemed
odd, so I investigated with strace. It looks to me like every 5-10
page accesses, there's a select() on an fd that's never ready for
reading, followed by a timeout and then, soon after, a SIGPIPE. This is
htdig 3.2.0b5 from Debian unstable (3.2.0b5-2), recompiled with the
--with-ssl option and running on Debian stable (3.0/woody), indexing
an SSL-enabled site.
Here's the strace output from a typical session. Pretty much every
run looks almost exactly like this, since strace always ends up
attaching during the fateful select() that times out:
select(7, [6], NULL, NULL, {21, 610000}) = 1 (in [6], left {13, 130000})
time(NULL) = 1080321571
write(1, ".-", 2) = 2
write(1, "*", 1) = 1
<snip more of the same, "*" writes>
write(1, "*", 1) = 1
write(1, "-", 1) = 1
time(NULL) = 1080321571
write(1, " size = 11482\n", 14) = 14
brk(0x83f4000) = 0x83f4000
brk(0x8405000) = 0x8405000
brk(0x8416000) = 0x8416000
brk(0x83f3000) = 0x83f3000
time(NULL) = 1080321571
write(1, "182:127:1:https://udn.epicgames."..., 78) = 78
time(NULL) = 1080321571
write(6, "\27\3\1\0\360\257\363\310[8\373\236\16Mrc\207\332\220X"..., 245) = 245
select(7, [6], NULL, NULL, {30, 0}) = 1 (in [6], left {30, 0})
read(6, "", 5) = 0
write(6, "\25\3\1\0\30\f\365I;0\235\247\332h$\352]\324\306\21\261"..., 29) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---
... and then strace terminates because of the SIGPIPE (though htdig of
course keeps going). By contrast, an strace of the remotely running
htdig session has run for quite some time (about 30 minutes so far)
without any SIGPIPEs or apparent chicanery. That session also stalls
in a select() call for long periods (15-20 seconds) from time to time,
but it doesn't get SIGPIPE and it appears to recover. I blame that on
real network latency, though it still sounds kind of extreme. An
example follows:
read(6, "\27\3\1\25\210", 5) = 5
read(6, "\331\276\0015\345\231\34U\370>Y\26\3\200Q\370\352B\\k\\"..., 5512) = 5512
select(7, [6], NULL, NULL, {30, 0}) = 1 (in [6], left {14, 400000})
time(NULL) = 1080322513
write(1, ".-", 2) = 2
write(1, "*", 1) = 1
write(1, "*", 1) = 1
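For anyone reading the traces: the first five bytes of each buffer on fd 6 look like TLS record headers (content type, version, payload length). Here is a minimal sketch in Python that decodes them. The function name parse_record_header is just made up for illustration and has nothing to do with htdig's own code; the byte strings are copied from the strace output above, which prints them as octal escapes.

```python
import struct

# Record content types as assigned by the TLS specification.
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}

def parse_record_header(header: bytes):
    """Decode a 5-byte TLS record header into (type name, version, length)."""
    content_type, major, minor, length = struct.unpack("!BBBH", header[:5])
    return CONTENT_TYPES.get(content_type, "unknown"), (major, minor), length

# Leading bytes of the three buffers seen on fd 6 in the traces.
print(parse_record_header(b"\27\3\1\0\360"))    # the 245-byte write
print(parse_record_header(b"\25\3\1\0\30"))     # the 29-byte write that got EPIPE
print(parse_record_header(b"\27\3\1\25\210"))   # the 5-byte read in the remote trace
```

The lengths line up with the syscall sizes (240 + 5 = 245, 24 + 5 = 29, and 5512 matches the read that follows), so the write that hits EPIPE appears to be an alert record rather than a page request. Version (3, 1) is TLS 1.0.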
Does anyone have a guess as to what's going on, or how I can make my
local spidering go faster?
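In case it helps narrow things down, the tail of the first trace (a read() returning 0, then a write() failing with EPIPE) is what you get when the other end has already closed the connection. Below is a rough sketch that reproduces just that syscall pattern. It uses a local socketpair as a stand-in for the SSL connection, which is purely my assumption for illustration; it says nothing about why the server would be closing the connection in the first place.

```python
import errno
import socket

def write_after_peer_close():
    """Return (data read at EOF, errno from the write that follows)."""
    ours, theirs = socket.socketpair()
    theirs.close()              # the "server" side hangs up first
    data = ours.recv(5)         # EOF: returns b"", like read(6, "", 5) = 0
    try:
        ours.send(b"goodbye")   # writing to the closed peer
        err = None
    except OSError as exc:
        err = exc.errno         # Python ignores SIGPIPE, so we see EPIPE here
    finally:
        ours.close()
    return data, err

data, err = write_after_peer_close()
print(repr(data), errno.errorcode.get(err))
```

A C program with the default SIGPIPE disposition would be killed by the signal at the same point unless it handles or ignores it, which htdig evidently does since it keeps going.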
_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev