On Tue, Nov 10, 2009 at 12:03 PM, Steven S. Critchfield <[email protected]> wrote:
> or wget, or any of a number of quick one-offs.
> Do remember that the machine you connect to may have multiple sites running
> from one http server process, so a proper HTTP 1.1 request should be
> observed.
>
> So use wget.
>
> ----- "Jack" <[email protected]> wrote:
>> What is the 'right way' to snag a copy of the robots.txt file from a
>> web site?
>> I know search engines do it all the time before they search a site,
>> so could I get it (if it is there) by
>>
>> telnet <sitename> 80
>> get /robots.txt
>>
>> or what?
>>
>> ><> ... Jack
>>
>> Community Music Festival - Nov 14
>> For info see
>> http://docs.google.com/fileview?id=0B2zjKrMU1HCRMTkwOWFlYmYtOTNhMi00NzdlLTk4Y2UtOWNiMWQ3N2UyYmNl&hl=en
>>
>> Stephen Leacock <http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html>
>> - "I detest life-insurance agents: they always argue that I shall
>> some day die, which is not so."
>
> --
> Steven Critchfield [email protected]
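Steven's point about virtual hosting can be sketched as a one-liner; "example.com" below is a placeholder hostname, not anything from the thread. printf builds the same request a browser or wget would send, including the Host: header that tells a server running several sites which one you mean:

```shell
# Build a minimal HTTP/1.1 request by hand ("example.com" is a
# placeholder). The Host: header is what lets one http server
# process distinguish the multiple sites it serves; to actually
# send it, pipe the output into something like `nc example.com 80`.
printf 'GET /robots.txt HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n'
```

The trailing blank line (the final \r\n\r\n) is what marks the end of the headers, which is why the telnet approach only works once you press Enter twice.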
Telnet is fun and useful, even if it's not the most efficient way to do things. I think the correct way to do it is to specify your request method, path, and HTTP version, then on the next line specify the host. Follow that with any other headers, then a blank line:

$ telnet nesman.afraid.org 80
Trying 72.51.205.154...
Connected to nesman.afraid.org.
Escape character is '^]'.
GET /index.html HTTP/1.1
Host: nesman.afraid.org

HTTP/1.1 200 OK

Personally, I prefer curl.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "NLUG" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/nlug-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
