According to Katherine Porter:
> Geoff - actually, I did turn off persistent_connections, and I tried both
> leaving the "head_before_get" setting unset and setting it; only
> head_before_get: true results in a good crawl.
>
> > > We have an old web server here that's running "CERN/TSX-32 WWW server
> > > version 3.0". The web server only supports HTTP/1.0. Unless I turn
> >
> > That's OK. Plenty of servers still only support HTTP/1.0.
> >
> > > on "head_before_get" in my configuration file, it won't even attempt
> > > to pull down a single URL from the server.
> >
> > Try turning off persistent_connections and ignoring the head_before_get
> > setting:
> > <http://www.htdig.org/dev/htdig-3.2/attrs.html#persistent_connections>
> >
> > My guess is that the HTTP/1.1 persistent connection code isn't properly
> > downgrading the connection for the HTTP/1.0 server.
Hmm. I'd be interested in knowing if 3.1.5, or the latest 3.1.6 snapshot,
has any problems with this same web server. If so, I'd suspect a problem
with the server itself. If not, then it's likely a bug in the 3.2
HTTP code. 3.1.x doesn't support HTTP/1.1 or persistent connections,
but it doesn't do HEAD requests before GET requests either. If your
server really needs a HEAD request before each GET, I'd consider that a
server bug. More likely, though, a side-effect of the head_before_get
implementation is sidestepping a bug in the 3.2 HTTP code.
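For reference, Geoff's guess concerns how a client should "downgrade" for an
HTTP/1.0 server. The persistence rule a client is expected to apply is spelled
out in RFC 7230, section 6.3, and can be sketched in Python as below. This is a
hypothetical illustration of that rule only, not ht://Dig's actual 3.2 code;
the function names are made up for the example.

```python
"""Hypothetical sketch of the RFC 7230 section 6.3 persistence rule.
Not ht://Dig code; function names are illustrative only."""


def parse_status_line(line: str) -> tuple[int, int, int]:
    """Split e.g. 'HTTP/1.0 200 OK' into (major, minor, status_code)."""
    version, code, *_ = line.split(" ", 2)
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {line!r}")
    major, minor = version[len("HTTP/"):].split(".")
    return int(major), int(minor), int(code)


def connection_tokens(headers: dict[str, str]) -> set[str]:
    """Return the lower-cased options listed in the Connection header."""
    for name, value in headers.items():
        if name.lower() == "connection":
            return {t.strip().lower() for t in value.split(",") if t.strip()}
    return set()


def may_reuse_connection(status_line: str, headers: dict[str, str]) -> bool:
    """True if the client may send another request on the same connection."""
    major, minor, _ = parse_status_line(status_line)
    tokens = connection_tokens(headers)
    if "close" in tokens:
        # An explicit "Connection: close" always ends the connection.
        return False
    if (major, minor) >= (1, 1):
        # HTTP/1.1 and later: connections persist by default.
        return True
    # HTTP/1.0 (like the CERN/TSX-32 server): persistent only if the
    # server explicitly opted in with "Connection: keep-alive".
    return "keep-alive" in tokens
```

Under this rule, a client talking to an HTTP/1.0 server that sends no
keep-alive option must open a fresh connection for every HEAD or GET; a client
that keeps reusing the socket would see the kind of failure described above.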
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-dev