Re: [Flightgear-devel] http://wiki.flightgear.org/robots.txt

2010-02-18 Thread Rob / EViLSLuT
Hi Guys,

Is the syntax of the robots.txt file correct? It could be wrong.

To my knowledge, this is what Google likes:

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

Happy flying!
Rob

On 02/18/2010 02:03 AM, John Denker wrote:
 On 02/17/2010 04:54 PM, Jon Stockill wrote:

   
 Presumably because there are some truly awful bots out there, and Google,
 at least, is known to be well behaved.
 
 But the truly awful bots don't look at robots.txt.

 In fact one of the easiest ways to catch rogue bots
 is to disallow a small part of the site and then
 blacklist anybody who goes there.
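The trap described above can be sketched as a scan of the web server's access log. This is a minimal Python sketch, not anything the wiki is known to run: it assumes an Apache-style access log and a hypothetical trap prefix /bot-trap/ that robots.txt lists under Disallow: and that no human-facing page links to. Any client that requests it has ignored robots.txt and becomes a blacklist candidate.

```python
import re

# Hypothetical trap prefix: assumed to be listed as "Disallow: /bot-trap/" in
# robots.txt, so any client requesting it has ignored robots.txt.
TRAP_PREFIX = "/bot-trap/"

# Matches the start of an Apache common/combined log line (an assumption):
#   client-ip ident user [timestamp] "METHOD /path HTTP/x.y" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)')


def find_trap_visitors(log_lines, trap_prefix=TRAP_PREFIX):
    """Return the set of client IPs whose requests hit the trap prefix."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed or non-request lines
        ip, path = match.group(1), match.group(2)
        if path.startswith(trap_prefix):
            offenders.add(ip)
    return offenders


if __name__ == "__main__":
    # Made-up sample lines using documentation-only IP ranges.
    sample = [
        '192.0.2.10 - - [17/Feb/2010:16:54:00 +0000] "GET /Main_Page HTTP/1.1" 200 512',
        '198.51.100.7 - - [17/Feb/2010:16:55:00 +0000] "GET /bot-trap/page HTTP/1.1" 200 64',
        '203.0.113.5 - - [17/Feb/2010:16:56:00 +0000] "GET /robots.txt HTTP/1.1" 200 90',
    ]
    print(sorted(find_trap_visitors(sample)))  # ['198.51.100.7']
```

How the resulting addresses would then be blocked (firewall, web server deny list) is not specified in the thread and is left out here.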

 --
 Download Intelreg; Parallel Studio Eval
 Try the new software tools for yourself. Speed compiling, find bugs 
 proactively, and fine-tune applications for parallel performance. 
 See why Intel Parallel Studio got high marks during beta.
 http://p.sf.net/sfu/intel-sw-dev
 ___
 Flightgear-devel mailing list
 Flightgear-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/flightgear-devel


Re: [Flightgear-devel] http://wiki.flightgear.org/robots.txt

2010-02-18 Thread John Denker
On 02/18/2010 04:07 AM, Rob / EViLSLuT wrote:

 Is the syntax of the robots.txt correct? Could be wrong.

Well, technically, the current file should say Googlebot instead of just
Google.  But this is such a common mistake that Googlebot
answers to the name Google, and no harm is done.
 
 To my knowledge this is what google likes,
 
 User-agent: *
 Disallow: /
 
 User-agent: Googlebot
 Allow: /

That's not the recommended form.  According to
  http://www.robotstxt.org/robotstxt.html

there is no Allow: directive.  Certainly there is no advantage 
to saying Allow: / ... and no disadvantage to using the canonical
form Disallow: with an empty value, which disallows nothing.
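The claim that Allow: / buys nothing here can be checked with Python's standard urllib.robotparser. This is one parser, not a statement about every crawler, and it does accept Allow: as an extension even though robotstxt.org does not list it. The sketch feeds it both forms; Slurp stands in for any crawler other than Google, and the page URL is only illustrative.

```python
from urllib.robotparser import RobotFileParser

# Rob's proposal: catch-all group first, then an explicit "Allow: /".
ROB_FORM = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

# The canonical form: an empty "Disallow:" disallows nothing,
# and the catch-all group goes last.
CANONICAL_FORM = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def check(robots_txt, agent, url="http://wiki.flightgear.org/Main_Page"):
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "Slurp"):
    print(agent, check(ROB_FORM, agent), check(CANONICAL_FORM, agent))
# Googlebot True True
# Slurp False False
```

For this parser the two files are interchangeable; the canonical form simply avoids relying on a directive that not every crawler is guaranteed to understand.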

There are situations where an Allow: directive would be helpful,
but this is not one of them.

Also, due to differences in opinion as to the interpretation of 
the robots.txt non-standard, it is a bit unpredictable whether 
bots will respond to the first match or best match ... so 
it is good practice to put more-specific directives ahead of 
less-specific ones.  In particular, the * wildcard should be
last, as it is currently on the site.
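The ordering advice can be made concrete with a deliberately naive reader that obeys the first group whose User-agent matches, '*' included. That reader is hypothetical, written only for illustration, and handles only Disallow: prefixes. It is contrasted with Python's urllib.robotparser, which happens to always consult the '*' group last and so gives the same answer either way.

```python
from urllib.robotparser import RobotFileParser

# The same two groups, in both orders.
STAR_FIRST = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""

STAR_LAST = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def parse_groups(robots_txt):
    """Split robots.txt into (agent_names, disallow_prefixes) groups, in file order."""
    groups, agents, rules = [], [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field == "disallow":
            rules.append(value)
    if agents:
        groups.append((agents, rules))
    return groups


def naive_first_match(robots_txt, agent, path):
    """Hypothetical crawler: obey the FIRST matching group, '*' included.

    Names match as case-insensitive substrings (an assumption of this sketch).
    """
    agent = agent.lower()
    for agents, rules in parse_groups(robots_txt):
        if any(name == "*" or name in agent for name in agents):
            # An empty Disallow: value ("") disallows nothing.
            return not any(p and path.startswith(p) for p in rules)
    return True  # no group applies, so access is allowed


def stdlib_can_fetch(robots_txt, agent, path):
    """Python's own parser, which always consults the '*' group last."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


for label, text in (("star first", STAR_FIRST), ("star last", STAR_LAST)):
    print(label, naive_first_match(text, "Googlebot", "/Main_Page"),
          stdlib_can_fetch(text, "Googlebot", "/Main_Page"))
# star first False True
# star last True True
```

With the * group last, both readings agree, which is why that placement is the safe choice when crawler behaviour is unknown.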

In any case, the larger point remains:  There are plenty of
perfectly reasonable, desirable bots that are being excluded by 
the current file.  Conversely there are plenty of truly horrible 
bots that will never be excluded by any robots.txt file.
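The larger point can be checked against the file as currently published (the copy quoted in Victhor's message further down this thread). This sketch uses Python's urllib.robotparser, which happens to match a User-agent value as a case-insensitive substring of the crawler's name, so it also shows why Googlebot still answers to Google there; other crawlers may match names differently. Slurp and msnbot are used only as example crawler names.

```python
from urllib.robotparser import RobotFileParser

# The wiki's robots.txt as quoted in this thread (commented-out lines included).
CURRENT_FILE = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /

#User-agent: Slurp
#Crawl-delay: 5
#Disallow:
"""

parser = RobotFileParser()
parser.parse(CURRENT_FILE.splitlines())

# "Googlebot" matches the "Google" group because this parser treats the
# User-agent value as a case-insensitive substring of the crawler name.
for crawler in ("Googlebot", "Slurp", "msnbot"):
    allowed = parser.can_fetch(crawler, "http://wiki.flightgear.org/Main_Page")
    print(f"{crawler}: {'allowed' if allowed else 'excluded'}")
# Googlebot: allowed
# Slurp: excluded
# msnbot: excluded
```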


Re: [Flightgear-devel] http://wiki.flightgear.org/robots.txt

2010-02-17 Thread Victhor
Excessive traffic? The wiki has been returning 503 errors all the time lately.
 http://wiki.flightgear.org/robots.txt
 
 User-agent: Google
 Disallow:
 
 User-agent: *
 Disallow: /
 
 #User-agent: Slurp
 #Crawl-delay: 5
 #Disallow:
 
 =
 
 Really?  A collective, open-source project that doesn't
 allow anybody other than Google to index the documentation?
 
 Is there a reason for this?
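If excessive traffic is the worry, the commented-out Slurp block in the current file points at a middle ground: admit a crawler but ask it to wait between requests. Below is a sketch of that hypothetical variant, checked with Python's urllib.robotparser, which reports the delay but does not enforce it. Crawl-delay is itself a non-standard extension that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical variant: the commented-out Slurp block switched on, so that
# crawler is admitted but asked to pause 5 seconds between requests.
WITH_CRAWL_DELAY = """\
User-agent: Google
Disallow:

User-agent: Slurp
Crawl-delay: 5
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(WITH_CRAWL_DELAY.splitlines())

url = "http://wiki.flightgear.org/Main_Page"
for crawler in ("Googlebot", "Slurp", "msnbot"):
    # crawl_delay() returns None when the matching group sets no delay.
    print(crawler, parser.can_fetch(crawler, url), parser.crawl_delay(crawler))
# Googlebot True None
# Slurp True 5
# msnbot False None
```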


Re: [Flightgear-devel] http://wiki.flightgear.org/robots.txt

2010-02-17 Thread Jon Stockill
John Denker wrote:

 Really?  A collective, open-source project that doesn't
 allow anybody other than Google to index the documentation?
 
 Is there a reason for this?

Presumably because there are some truly awful bots out there, and Google,
at least, is known to be well behaved.

Jon