Hi Rob,

I had a question somewhat related to robots.txt and the way DSpace
instances are being indexed by Google.

As part of the Google Analytics - DSpace comparison that I've been
running, I would like to analyse which repositories are indexed best
by Google, and how that affects their number of visits.

As a first, very rough estimate, I searched for:

"site:<<repository url>>" to get an indication of how many useful pages were
indexed. Interestingly, these numbers did not really correlate with visits
to the repository.
I assumed that for many repositories, various browse pages were being
indexed, and that these pages were not very useful for generating visits
or exposing the content.

In a second step, I searched for "site:<<repository url>> -browse".
In some cases the returned numbers were less than half of the original count.
But I realise this search is too restrictive: because many pages
include the word "browse" in their navigation bar, I'm probably excluding
useful item pages etc. from the results.

So my question is the following:
which search query could I use in Google to get the number of useful
indexed pages (item pages, bitstreams, collection & community
pages)?
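One possible refinement, offered as an untested sketch: Google's `inurl:` operator matches terms in the URL rather than in the page text, so `-inurl:browse` should drop browse listings without also dropping item pages whose navigation bar happens to mention "browse". The path fragments `handle` and `bitstream` are assumptions about a default DSpace 1.5 URL layout, `build_queries` and `search_url` are hypothetical helper names, and the hit counts Google reports for any such query are only approximate.

```python
from typing import Dict
from urllib.parse import quote_plus

# Assumed URL path fragments for a default DSpace 1.5 install (items,
# collections and communities under /handle/, files under /bitstream/).
# Adjust these to the paths your repository actually uses.
USEFUL_PATHS = ("handle", "bitstream")


def build_queries(host: str) -> Dict[str, str]:
    """Build Google query strings that filter on the URL, not the page text."""
    queries = {"all": f"site:{host}"}
    for path in USEFUL_PATHS:
        # inurl:<path> keeps pages whose URL contains <path>;
        # -inurl:browse drops browse listings only, unlike a bare -browse,
        # which also drops any page whose text contains the word "browse".
        queries[path] = f"site:{host} inurl:{path} -inurl:browse"
    return queries


def search_url(query: str) -> str:
    """Turn a query string into a google.com search URL for a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    for name, q in build_queries("dspace.example.edu").items():
        print(name, q, search_url(q))
```

Comparing the `handle` and `bitstream` counts with the plain `site:` count would give a rough share of indexed pages that are content rather than listings.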

An interesting finding from my research so far:
the 15 repositories included in the study get 60% of their
visits through search engines (average calculated on visits in December
2008). All the more reason to optimize exposure through search engines
as much as possible.

Best regards,


@mire NV
Romeinse Straat 18
3001 Heverlee
+32 2 888 29 56

http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get t...@ther

On Thu, Feb 5, 2009 at 10:21 PM, Robert Tansley <roberttans...@google.com> wrote:

> To all users of DSpace 1.5 and DSpace 1.5.1:
> These versions of DSpace ship with a bad robots.txt file that prevents
> search engines such as Google Scholar or Yahoo from indexing any content on
> a DSpace site. To check if this applies to you:
> - Visit your site's robots.txt --
> http://your_dspace_hostname.edu/robots.txt
> - If you see the following line you have a bad robots.txt:
> Disallow: /browse
> It is important that you REMOVE this line from your robots.txt to ensure
> that your DSpace instance is correctly indexed by search engines. More info
> on ensuring your DSpace site is correctly indexed here:
> http://wiki.dspace.org/index.php?title=Ensuring_your_instance_is_indexed
> Robert Tansley / Google
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
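A minimal sketch of the check described in the quoted announcement, using Python's standard-library `urllib.robotparser`. It assumes DSpace is served from the host root (so the site's robots.txt governs `/browse` directly) and uses `Googlebot` as the crawler name; `browse_is_blocked` and `check_site` are hypothetical helper names, not part of DSpace. The standard-library parser uses simple prefix matching, so its verdict may not match exactly how every search engine interprets a given file.

```python
from urllib.robotparser import RobotFileParser


def browse_is_blocked(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt stops user_agent from crawling /browse."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and does not touch the network.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/browse")


def check_site(base_url: str, user_agent: str = "Googlebot") -> bool:
    """Fetch <base_url>/robots.txt and apply the same check (needs network).

    Assumes base_url is the host root, e.g. "http://your_dspace_hostname.edu".
    """
    root = base_url.rstrip("/")
    parser = RobotFileParser(root + "/robots.txt")
    parser.read()
    return not parser.can_fetch(user_agent, root + "/browse")


if __name__ == "__main__":
    bad = "User-agent: *\nDisallow: /browse\n"
    good = "User-agent: *\nDisallow: /admin\n"
    print(browse_is_blocked(bad))   # True: the line the announcement warns about
    print(browse_is_blocked(good))  # False: /browse stays crawlable
```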
