Hi,

I'm a newcomer to this list (I've been using ht://Dig for 6 months), and
before anything else I would like to say hi to everyone. Now to the good
stuff =)

I've encountered a problem with the fetching part. I have about 1800
sites in my "start_url" to fetch with a "max_hop_count" of 1, and the dig
seems to go beyond those 1800 sites.

HTTP statistics
===============
 Persistent connections    : Yes
 HEAD call before GET      : No
 Connections opened        : 14973
 Connections closed        : 14973
 Changes of server         : 6030
 HTTP Requests             : 35357
 HTTP KBytes requested     : 209216
 HTTP Average request time : 0.647679 secs
 HTTP Average speed        : 9.13605 KBytes/secs

As you can see, the value of "Changes of server" is higher than 1800. I
can also see in the log that the dig goes beyond the domain (see below
for an example): the domain is www.singapore-inc.com, yet a "mailto:"
link and "www.sedb.com.sg" are pushed in. The problem doesn't happen
when I fetch the sites individually. Any suggestions or hints are
welcome.

Thanks in advance,

Dann

<example>
pick: www.singapore-inc.com, # servers = 1851
401:503:0:http://www.singapore-inc.com/edb.html:
title: Singapore Economic Development Board

   pushing mailto:[EMAIL PROTECTED]
+
   pushing http://www.sedb.com.sg/how/index2.html

New server: www.sedb.com.sg, 80
 - Persistent connections: enabled
 - HEAD before GET: disabled
 - Timeout: 15
 - Connection space: 10
 - Max Documents: -1
 - TCP retries: 1
 - TCP wait time: 5
Trying to retrieve robots.txt file
+
   pushing http://www.sedb.com.sg/what/
+
   pushing http://www.sedb.com.sg/What/index2.html
+***
   pushing http://www.singapore-inc.com/ncb.html
+
   pushing http://www.singapore-inc.com/nstb.html
+
...
</example>
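
P.S. To check a whole log for this, something like the Python sketch
below might help. It is only a rough sketch, not part of ht://Dig: it
assumes the verbose log uses the "New server: host, port" and
"pushing URL" line formats seen in the excerpt above, and the function
name hosts_outside_start_urls is made up for illustration. It lists the
servers and pushed URLs whose host is not one of the start_url hosts.

```python
import re
from urllib.parse import urlparse

# Line formats assumed from the log excerpt above; adjust if yours differ.
NEW_SERVER = re.compile(r"^New server:\s*([^,\s]+),\s*(\d+)")
PUSHING = re.compile(r"^\s*pushing\s+(\S+)")


def hosts_outside_start_urls(log_lines, start_urls):
    """Return (servers, urls) seen in the log that are not start_url hosts."""
    allowed = {urlparse(u).hostname for u in start_urls}
    new_servers = set()
    pushed_outside = []
    for line in log_lines:
        m = NEW_SERVER.match(line)
        if m:
            host = m.group(1).lower()
            if host not in allowed:
                new_servers.add(host)
            continue
        m = PUSHING.match(line)
        if m:
            url = m.group(1)
            # Non-HTTP links such as mailto: have no hostname at all.
            host = urlparse(url).hostname
            if host is None or host not in allowed:
                pushed_outside.append(url)
    return new_servers, pushed_outside
```

Reading the log file into a list of lines and passing it in together
with the start_url list should be enough to see how many unexpected
servers show up and which pages push them.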

