Hi, hope I am doing this right; this is the first time I am posting to a mailing list.
I am presently using htdig 3.1.5, but have tweaked it somewhat to try something out. As this is an older version, this feature may already be in the latest release, but I still hope to figure out what I am doing wrong.

What I am trying to do is skip URLs that have already been crawled, based on the size of the document. I did this by adding a HEAD request on a separate connection before the actual GET of the document (in the RetrieveDocument function). Based on the Content-Length that is returned (or not), it will either continue with the GET or skip it and add the follow links that are stored in the database.

Now, I have access to a couple of proxies. For one proxy (Squid 2.4 STABLE4), using the above code I occasionally get an empty response to the HEAD request (i.e. "Header line:" is empty, not even the status message). This is not such a big issue, as I default to GET in that case and all is fine, but I am still curious as to how this could happen.

For the other proxy (Squid 2.3 STABLE4), for most pages the HEAD request is fine and returns the status and everything. But when it starts on the GET request, the response headers and some of the initial part of the page are lost. This screws up the GET, as it does not receive the status line and headers and instead starts off from halfway through the page.

My question is: is this due to my code, or is it something to do with the proxy?

Sorry if my message is somewhat garbled, but any help is appreciated :)

kw
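P.S. To make the logic described above concrete, here is a rough sketch of the HEAD-then-GET decision. It is written in Python for brevity and is not the actual C++ change inside htdig; the function names and the `stored_size` argument are made up for illustration. It assumes the size recorded from the previous crawl is available, and it treats an empty or incomplete HEAD response as "fall back to GET".

```python
from typing import Optional, Tuple


def parse_head_response(raw: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (status_code, content_length) from a raw response header block.

    Either value is None when it is missing or unparsable, which also
    covers the case of a completely empty response.
    """
    lines = raw.replace("\r\n", "\n").split("\n")
    status = None
    length = None

    # Status line, e.g. "HTTP/1.0 200 OK"
    if lines and lines[0].startswith("HTTP/"):
        parts = lines[0].split()
        if len(parts) >= 2 and parts[1].isdigit():
            status = int(parts[1])

    # Header lines up to the first blank line
    for line in lines[1:]:
        if not line:
            break
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and name.strip().lower() == "content-length" and value.isdigit():
            length = int(value)

    return status, length


def should_skip_get(raw_head: str, stored_size: Optional[int]) -> bool:
    """True only if the HEAD response is usable and the size is unchanged."""
    status, length = parse_head_response(raw_head)
    if status != 200 or length is None or stored_size is None:
        return False  # anything doubtful: do the normal GET
    return length == stored_size
```

With this, an empty HEAD response such as the one I sometimes see from the first proxy simply yields `False`, so the document is fetched with GET as usual.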
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html

