I seem to recall that use of ViewCVS's "*checkout*"-style URLs is fairly expensive on the server side. viewcvs.py uses bincvs.rcslib -- which in turn uses the rcsfile binary -- to parse the ,v files in the CVS repository itself. This operation scales linearly with the size of the ,v file (a function of the number of changes and the size of each delta). That cost is likely part of the reason we block robots from browsing it with a robots.txt of:

User-agent: *
Disallow: /

Reconstituting the trunk trades a little disk space for better performance and scalability. Reconstituting other branches might be useful as well, but would consume considerably more disk space.
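For what it's worth, here is a minimal sketch of what such a nightly reconstitution job might look like -- the repository path, docroot, and module list below are hypothetical, and it assumes plain "cvs export" is available on the box:

#!/usr/bin/env python
# Sketch only (not ASF's actual setup): export the trunk of selected
# modules into a static docroot, so browsers and crawlers read plain
# files instead of making viewcvs.py re-parse ,v files on every hit.
# CVSROOT, WEB_ROOT, and MODULES are hypothetical values.

import os
import shutil
import subprocess

CVSROOT = "/home/cvs"                 # assumed repository location
WEB_ROOT = "/var/www/cvs-snapshot"    # assumed static, crawlable docroot
MODULES = ["jakarta-commons"]         # assumed modules worth mirroring

if not os.path.isdir(WEB_ROOT):
    os.makedirs(WEB_ROOT)

for module in MODULES:
    target = os.path.join(WEB_ROOT, module)
    if os.path.isdir(target):
        shutil.rmtree(target)         # replace yesterday's snapshot
    # "cvs export" writes plain source files with no CVS/ admin
    # directories; -r HEAD takes the tip of the trunk, which is exactly
    # what a *checkout* URL re-derives from the ,v file on each request.
    subprocess.check_call(
        ["cvs", "-d", CVSROOT, "export", "-r", "HEAD", "-d", target, module]
    )

With a snapshot like that in place, robots.txt could then be loosened to let crawlers at the static copy while still keeping them out of viewcvs itself, along the lines of:

User-agent: *
Disallow: /viewcvs/

(assuming the snapshot is served from some path other than /viewcvs/).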


Martin van den Bemt wrote:
Hmm, I think I get too much mail to remember everything :) Sorry :)
But maybe the cvsweb URL is a nice idea anyway :)


Kind regards,
Martin

On Wed, 2003-12-03 at 15:13, Davanum Srinivas wrote:

Martin,

Here's my original email.....

"I was looking for a replacement for JDK1.4 String.split that would work in 1.3 environment and
found that turbine had one (http://www.google.com/search?q=stringutils++split+site%3Aapache.org)
and then i was trying to find where in our cvs if the latest code and took a while to finally
found it in Jakarta commons' lang project.


To cut a long story short: should we make finding existing code easier by allowing Google's
crawler to crawl http://cvs.apache.org/viewcvs/? (Currently there is a
http://cvs.apache.org/robots.txt that prevents this from happening)."

