Re: GoogleBot Robot.txt

Jim McAtee Tue, 26 Nov 2002 20:34:21 -0800

What good would it do you if the indexing robot sees exactly the same page
content on every page it retrieves?


If you just want to keep googlebot from spidering the entire site, including the
root level, use the following in your robots.txt:

User-agent: googlebot
Disallow: /
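
If you want to sanity-check those two lines before deploying them, here is a
minimal sketch using Python's standard urllib.robotparser. It only shows how
that one parser reads the rules; a real crawler may match user agents
differently. The example.com URL, the "ExampleCrawler" agent string, and the
is_allowed helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set from above, parsed from a string instead of fetched.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    googlebot = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
    other_bot = "ExampleCrawler/1.0"  # hypothetical agent, not named in robots.txt
    page = "http://www.example.com/index.cfm"  # placeholder URL

    print("googlebot allowed:", is_allowed(googlebot, page))
    print("other bot allowed:", is_allowed(other_bot, page))
```

With this rule set the parser should refuse every path for googlebot and
leave agents that are not named in the file unrestricted.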

But I'd find out _why_ the robot is requesting files without passing parameters.
If some of those files aren't complete web pages (files meant to be cfincluded,
or CF custom tags), I'd really be worried.  Normally, the robot only follows
links that exist somewhere on your site, or possibly on another site, so this
suggests you've got some bad links somewhere.

Jim


----- Original Message -----
From: "cfhelp" <[EMAIL PROTECTED]>
To: "CF-Talk" <[EMAIL PROTECTED]>
Sent: Tuesday, November 26, 2002 6:14 PM
Subject: GoogleBot Robot.txt


> I have put a robots.txt file in the root of my site to stop it from
> accessing subdirectories. But GoogleBot is still accessing all the files in
> my root, but some of the files do not run without passing a variable to them.
> So my Application Logfile is full of errors.
>
> So I want to know if it is a good idea to put this in my Application.cfm
> file?
>
> <cfif HTTP_USER_AGENT IS 'Googlebot/2.1 (+http://www.googlebot.com/bot.html)'>
>
> <html>
>  <head>
> <title>Site Title</title>
>
> <meta tags>
>
>   </head>
> <body>
>
> Text from the About page
>
> </body>
> </html>
> <cfabort>
> </cfif>
>
>
> This way Google gets to spider all pages and I don't get any errors.
>
> Rick
> 