Hi,
I'm a newcomer to this list (a 6-month user of ht://dig), and before
saying anything else I would like to say hi to everyone. Now to the good
stuff =)
I've encountered a problem with the fetching part. I have about 1800 sites
in my "start_url" to fetch, with a "max_hop_count" of 1, and it seems to
go beyond those 1800.
HTTP statistics
===============
Persistent connections : Yes
HEAD call before GET : No
Connections opened : 14973
Connections closed : 14973
Changes of server : 6030
HTTP Requests : 35357
HTTP KBytes requested : 209216
HTTP Average request time : 0.647679 secs
HTTP Average speed : 9.13605 KBytes/secs
As you can see, the value of "Changes of server" is higher than 1800. I can
also see in the log that it goes beyond the domain (see below for an
example): the domain is www.singapore-inc.com, and you can see that a
"mailto:" link and "www.sedb.com.sg" are pushed in. The problem doesn't
happen when I fetch them alone. Any suggestions or hints are welcome.
Thanks in advance,
Dann
<example>
pick: www.singapore-inc.com, # servers = 1851
401:503:0:http://www.singapore-inc.com/edb.html:
title: Singapore Economic Development Board
pushing mailto:[EMAIL PROTECTED]
+
pushing http://www.sedb.com.sg/how/index2.html
New server: www.sedb.com.sg, 80
- Persistent connections: enabled
- HEAD before GET: disabled
- Timeout: 15
- Connection space: 10
- Max Documents: -1
- TCP retries: 1
- TCP wait time: 5
Trying to retrieve robots.txt file
+
pushing http://www.sedb.com.sg/what/
+
pushing http://www.sedb.com.sg/What/index2.html
+***
pushing http://www.singapore-inc.com/ncb.html
+
pushing http://www.singapore-inc.com/nstb.html
+
...
</example>
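In case it helps anyone reproduce this, here is a rough sketch of how the
log could be scanned for pushes that leave the picked server. It is not
part of ht://dig; the function name off_domain_pushes is made up for
illustration, and it assumes the verbose log looks exactly like the excerpt
above ("pick: <host>, ..." followed by "pushing <url>" lines).

```python
from urllib.parse import urlparse


def off_domain_pushes(log_lines):
    """Return (picked_host, pushed_url) pairs where the pushed URL
    does not belong to the server named in the last "pick:" line.

    Assumption: the log format matches the excerpt in this message.
    """
    current = None
    found = []
    for raw in log_lines:
        line = raw.strip()
        if line.startswith("pick:"):
            # e.g. "pick: www.singapore-inc.com, # servers = 1851"
            current = line[len("pick:"):].split(",", 1)[0].strip().lower()
        elif line.startswith("pushing "):
            url = line[len("pushing "):].strip()
            # hostname is None for schemes like mailto:, so those
            # are reported as off-domain as well
            host = urlparse(url).hostname
            if current is not None and host != current:
                found.append((current, url))
    return found
```

Calling off_domain_pushes(open(path)) on a saved log (path being wherever
the htdig output was redirected) should list every pushed URL whose host
differs from the server being picked at that point.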
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.