According to Katherine Porter:
> Geoff - actually, I did turn off persistent_connections and ignore as well
> as set the "head_before_get" setting, and only the head_before_get: true
> will result in a good crawl.
> 
> > > We have an old web server here that's running "CERN/TSX-32 WWW server
> > > version 3.0".  The web server only supports HTTP/1.0.  Unless I turn
> > 
> > That's OK. Plenty of servers still only support HTTP/1.0.
> > 
> > > on "head_before_get" in my configuration file, it won't even attempt
> > > to pull down a single URL from the server.
> > 
> > Try turning off persistent_connections and ignore the head_before_get
> > setting:
> > <http://www.htdig.org/dev/htdig-3.2/attrs.html#persistent_connections>
> > 
> > My guess is that the HTTP/1.1 persistent connection code isn't properly
> > downgrading the connection for the HTTP/1.0 server.

Hmm.  I'd be interested in knowing if 3.1.5, or the latest 3.1.6 snapshot,
has any problems with this same web server.  If so, I'd suspect a problem
with the server itself.  If not, then it's likely a bug in the 3.2
HTTP code.  3.1.x doesn't support HTTP/1.1 or persistent connections,
but it doesn't do HEAD requests before GET requests either.  If your
server needs HEAD requests, I'd think that's a bug in the server.  On the
other hand, what's more likely happening is that a side effect of the
head_before_get implementation is sidestepping a bug in the 3.2 HTTP code.
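
One way to check the server independently of ht://Dig is to send it a
plain HTTP/1.0 GET by hand, with no HEAD first and no persistent
connection, and see whether it answers.  A minimal Python sketch of that
test follows.  The function names (build_request, status_line, probe)
and the host name in the usage note are placeholders of mine, not
anything from ht://Dig, and this only approximates what the 3.1.x
retriever sends.

```python
import socket


def build_request(method, path, host):
    """Build a minimal HTTP/1.0 request (no Connection: keep-alive)."""
    return f"{method} {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def status_line(raw):
    """Return the first line of a raw response, or '' if nothing came back."""
    lines = raw.splitlines()  # tolerates \r\n as well as bare \n
    return lines[0].decode("latin-1") if lines else ""


def probe(host, path="/", port=80, method="GET", timeout=10):
    """Send one request, read until the server closes, return the status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(method, path, host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0 servers close the connection when done
                break
            chunks.append(data)
    return status_line(b"".join(chunks))
```

Calling probe("your.server.example") and then
probe("your.server.example", method="HEAD") shows whether the server
replies to a bare GET at all.  If it does, that points at the 3.2 HTTP
code rather than the server.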

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-dev
