So I'm using htdig to index a medium-sized site, and when I attempted
to index the site from the webserver machine itself, I noticed that it
was going slowly -- more slowly than it had during testing, when I was
indexing from a separate machine far across the internet.  This seemed
odd, so I investigated with strace, and it looks to me like every 5-10
page accesses, there's a select() on an fd that's never ready for
reading, followed by a timeout and then soon a SIGPIPE.  This is with
htdig 3.2.0b5 from Debian unstable (package 3.2.0b5-2), recompiled with
the --with-ssl option and running on Debian stable (3.0/woody) against
an SSL-enabled site.
  Here's the strace output from a typical session; pretty much every
strace run looks almost exactly like this (since strace always
attaches during the fateful select() that times out):

select(7, [6], NULL, NULL, {21, 610000}) = 1 (in [6], left {13, 130000})
time(NULL)                              = 1080321571
write(1, ".-", 2)                       = 2
write(1, "*", 1)                        = 1
  <snip more of the same, "*" writes>
write(1, "*", 1)                        = 1
write(1, "-", 1)                        = 1
time(NULL)                              = 1080321571
write(1, " size = 11482\n", 14)         = 14
brk(0x83f4000)                          = 0x83f4000
brk(0x8405000)                          = 0x8405000
brk(0x8416000)                          = 0x8416000
brk(0x83f3000)                          = 0x83f3000
time(NULL)                              = 1080321571
write(1, "182:127:1:https://udn.epicgames.";..., 78) = 78
time(NULL)                              = 1080321571
write(6, "\27\3\1\0\360\257\363\310[8\373\236\16Mrc\207\332\220X"..., 245) = 245
select(7, [6], NULL, NULL, {30, 0})     = 1 (in [6], left {30, 0})
read(6, "", 5)                          = 0
write(6, "\25\3\1\0\30\f\365I;0\235\247\332h$\352]\324\306\21\261"..., 29) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---

  ... and then strace terminates because of the SIGPIPE (though htdig of
course keeps going).  By contrast, an strace of the remotely running
htdig session has run for quite some time (about 30 minutes so far)
without any SIGPIPEs or apparent chicanery.  That session also stalls in
a select() call for long periods (15-20 seconds) from time to time, but
it doesn't get SIGPIPE and it appears to recover; I blame that on real
network latency, though it still sounds kind of extreme.  Here's an
example:

read(6, "\27\3\1\25\210", 5)            = 5
read(6, "\331\276\0015\345\231\34U\370>Y\26\3\200Q\370\352B\\k\\"..., 5512) = 5512
select(7, [6], NULL, NULL, {30, 0})     = 1 (in [6], left {14, 400000})
time(NULL)                              = 1080322513
write(1, ".-", 2)                       = 2
write(1, "*", 1)                        = 1
write(1, "*", 1)                        = 1

  Does anyone have a guess as to what's going on, or how I can make my
local spidering go faster?
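
  For anyone who wants to reproduce the syscall pattern at the end of the
first trace outside htdig: select() reporting the fd readable, read()
returning 0, and the next write() failing with EPIPE is what a socket
looks like once the other end has closed it.  Here's a minimal Python
sketch of just that pattern, assuming a local socketpair() as a stand-in
for fd 6 and the web server.  probe_closed_peer is a made-up name, and
this says nothing about why the server would close the connection or
about what htdig does internally.

```python
import errno
import select
import socket


def probe_closed_peer():
    """Return (readable, data, send_errno) for a socket whose peer has closed.

    Hypothetical helper for illustration only; not part of htdig.
    """
    ours, peer = socket.socketpair()  # stand-in for fd 6 and the server side
    peer.close()                      # the "server" hangs up
    try:
        # select() reports the fd as readable even though no data is waiting:
        # end-of-file counts as a readable event.
        ready, _, _ = select.select([ours], [], [], 1.0)
        # A zero-length read is the EOF indication, like read(6, "", 5) = 0.
        data = ours.recv(5) if ready else None
        try:
            ours.send(b"x")
            send_errno = None
        except OSError as exc:
            # Python ignores SIGPIPE by default, so the failed write shows up
            # as an exception carrying the errno instead of killing the process.
            send_errno = exc.errno
        return bool(ready), data, send_errno
    finally:
        ours.close()


if __name__ == "__main__":
    readable, data, err = probe_closed_peer()
    print(readable, data, errno.errorcode.get(err))
```

  If that is what's happening here, the interesting question becomes why
the local server drops the connection every few pages while the remote
session doesn't see it.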



_______________________________________________
ht://Dig Developer mailing list:
https://lists.sourceforge.net/lists/listinfo/htdig-dev