On Tue, Nov 10, 2009 at 12:03 PM, Steven S. Critchfield <[email protected]> wrote:
> or wget, or any of a number of quick one-offs.
> Do remember that the machine you connect to may have multiple sites running
> from one http server process, so a proper HTTP 1.1 request should be
> observed.
>
> So use wget.
>
> ----- "Jack" <[email protected]> wrote:
>> What is the 'right way' to snag a copy of the robots.txt file from a
>> web site?
>> I know search engines do it all the time before they search a site,
>> so could I get it (if it is there) by
>>
>> telnet <sitename> 80
>> get /robots.txt
>>
>> or what?
>>
>> ><> ... Jack
>>
>> Community Music Festival - Nov 14
>> For info see
>> http://docs.google.com/fileview?id=0B2zjKrMU1HCRMTkwOWFlYmYtOTNhMi00NzdlLTk4Y2UtOWNiMWQ3N2UyYmNl&hl=en
>>
>> Stephen Leacock <http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html>
>> - "I detest life-insurance agents: they always argue that I shall
>> some day die, which is not so."
>
> --
> Steven Critchfield [email protected]
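Steven's point about virtual hosting can be sketched as a one-liner; "example.com" below is a placeholder hostname, not anything from the thread. printf builds the same request a browser or wget would send, including the Host: header that tells a server running several sites which one you mean:

```shell
# Build a minimal HTTP/1.1 request by hand ("example.com" is a
# placeholder). The Host: header is what lets one http server
# process distinguish the multiple sites it serves; to actually
# send it, pipe the output into something like `nc example.com 80`.
printf 'GET /robots.txt HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n'
```

The trailing blank line (the final \r\n\r\n) is what marks the end of the headers, which is why the telnet approach only works once you press Enter twice.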
Telnet is fun and useful, even if it's not the most efficient way to do things. I think the correct way to do it is to specify your request method, path, and HTTP version, then on the next line specify the host. Follow that with any other headers, then a blank line:

$ telnet nesman.afraid.org 80
Trying 72.51.205.154...
Connected to nesman.afraid.org.
Escape character is '^]'.
GET /index.html HTTP/1.1
Host: nesman.afraid.org

HTTP/1.1 200 OK

Personally, I prefer curl.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "NLUG" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/nlug-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
