Looks like you're missing the "Host" header on your request. Some servers 
care. Many domain names can be assigned to the same IP address (virtual 
hosting), so the server needs the Host header to know which site you want it 
to pretend to be today.  :)

I made a mod to my copy of libwww years ago to address this. Sorry I don't 
have a proper context diff, but this should be enough to go on. Apply this 
mod to www.pl (insert the lines marked with "+"). Hope this helps.


    if (!$scheme)
    {
        return &wwwerror'onrequest($wwwerror'RC_bad_request_client, $method,
                         $scheme, $host, $port, $object, *headers, *content,
                         "URL requested does not have an access scheme");
    }

+ # some servers require HOST header cause they don't know who they are!
+ # need to grab it here before proxy logic substitutes the proxy as the host
+     $headers{'Host'} = $host;
+     if ($port) {
+       $headers{'Host'} .= ":$port";
+     }

      if ($proxy = &lookup_proxy($scheme, $host, $port))
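If you'd rather see the idea outside of www.pl, here is a minimal standalone sketch in Perl 5. The helper names host_header_value and build_get_request are made up for illustration and are not part of libwww-perl. Unlike the mod above, which appends any non-empty port, this version leaves the port off when it is empty or equal to the default (assumed here to be 80).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of libwww-perl): build the Host header value.
# The port is appended only when it is given and differs from the default,
# which is assumed to be 80 unless the caller says otherwise.
sub host_header_value {
    my ($host, $port, $default_port) = @_;
    $default_port = 80 unless defined $default_port;
    return $host if !defined $port || $port eq '' || $port == $default_port;
    return "$host:$port";
}

# Hypothetical helper: assemble a minimal GET request as raw text.
# $target is whatever goes on the request line (a path or a full URL).
sub build_get_request {
    my ($host, $port, $target) = @_;
    return join("\r\n",
        "GET $target HTTP/1.0",
        "Host: " . host_header_value($host, $port),
        "", "");    # the two empty strings produce the blank line ending the headers
}

print build_get_request("www.ctheory.com", "", "/r45.html");
```

The point is the same as in the mod: take the Host value from the URL's own host and port before anything (such as proxy handling) replaces them.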






From:   jianqun%andrew.cmu.edu@Internet on 2000-04-05 10:54 AM
To:     Marvin Simkin@AMEX
cc:     libwww%perl.org@Internet, jianqun%andrew.cmu.edu@Internet 
Subject:        Re: Questions about Momspider


Hi, Marvin:

Thank you so much for your reply. Here are all the headers from my failed
GET request:

GET http://www.ctheory.com/r45.html HTTP/1.0
User-Agent: GET/0.5 libwww-perl/0.40
From: [EMAIL PROTECTED]

HTTP/1.1 404 Not Found
Date: Tue, 04 Apr 2000 17:52:39 GMT
Server: Apache/1.3.9 (Unix) ApacheJServ/1.1b3
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
The requested URL /r45.html was not found on this server.<P>
</BODY></HTML>

Hope you can identify what is wrong on my side.

Have a nice day,

Jianqun (Jane) Wang

On 5 Apr 2000, Marvin Simkin wrote:

> I had no problem with it. Could you send a copy of all the headers from your 
> failed GET request?
> 
> $ get http://www.ctheory.com/r45.html
> GET http://www.ctheory.com/r45.html HTTP/1.0
> Host: www.ctheory.com
> Pragma: no-cache
> User-Agent: Mozilla/4.03 [en] (MOMspider)
> 
> HTTP/1.1 200 OK
> Date: Tue, 04 Apr 2000 17:30:31 GMT
> Server: Apache/1.3.9 (Unix) ApacheJServ/1.1b3
> Connection: close
> Content-type: text/html
> 
> <html>
> <head>
> <title>CTHEORY: The Book is Dead, Long Live the Book</title>
> 




