Hi Rob,

I have a question somewhat related to robots.txt and the way DSpace instances are indexed by Google.
As part of the Google Analytics/DSpace comparison I've been running, I would like to analyse which repositories are indexed best by Google, and how that affects their number of visits. As a first, very rough estimate, I searched for "site:<<repository url>>" to get an indication of how many useful pages were indexed. Interestingly, these numbers did not really correlate with the number of visits to each repository. My assumption is that for many repositories, various browse pages are being indexed, and that these pages are not very useful for generating visits or exposing the content.

In a second step, I searched for "site:<<repository url>> -browse". In some cases the number of results was less than half of the original number. But I realise this query is too restrictive: because many pages include the word "browse" in their navigation bar, I'm probably also excluding useful item pages etc.

So my question is: which Google query could I use to get the number of useful indexed pages (item pages, bitstreams, collection and community pages)?

One interesting finding from my research so far: the 15 repositories included in the study get 60% of their visits through search engines (averaged over the visits in December 2008). That is all the more reason to optimize exposure through search engines as much as possible.

Best regards,
Bram

@mire NV
Romeinse Straat 18
3001 Heverlee
Belgium
+32 2 888 29 56
http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get t...@ther

On Thu, Feb 5, 2009 at 10:21 PM, Robert Tansley <[email protected]> wrote:

> To all users of DSpace 1.5 and DSpace 1.5.1:
> These versions of DSpace ship with a bad robots.txt file that prevents
> search engines such as Google Scholar or Yahoo from indexing any content on
> a DSpace site.
> To check if this applies to you:
> - Visit your site's robots.txt --
>   http://your_dspace_hostname.edu/robots.txt
> - If you see the following line you have a bad robots.txt:
>
>   Disallow: /browse
>
> It is important that you REMOVE this line from your robots.txt to ensure
> that your DSpace instance is correctly indexed by search engines. More info
> on ensuring your DSpace site is correctly indexed here:
>
> http://wiki.dspace.org/index.php?title=Ensuring_your_instance_is_indexed
>
> Robert Tansley / Google
>
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
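The robots.txt check Robert describes can also be scripted. Below is a minimal sketch that assumes Python 3's standard `urllib.robotparser` module. The function names `browse_is_blocked` and `check_site`, and the default user agent "Googlebot", are illustrative choices and not part of DSpace. Instead of searching for the literal `Disallow: /browse` line, the sketch asks the parser whether a crawler may fetch `/browse`. This approach also handles comments and extra whitespace in the file.

```python
from urllib.robotparser import RobotFileParser

# Path blocked by the faulty robots.txt shipped with DSpace 1.5/1.5.1.
BROWSE_PATH = "/browse"


def browse_is_blocked(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if this robots.txt text forbids user_agent from crawling /browse."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch only compares the URL path against the parsed rules,
    # so passing the bare path is enough here.
    return not parser.can_fetch(user_agent, BROWSE_PATH)


def check_site(robots_url: str, user_agent: str = "Googlebot") -> bool:
    """Fetch a live robots.txt (needs network access) and apply the same check."""
    parser = RobotFileParser(robots_url)
    parser.read()
    return not parser.can_fetch(user_agent, BROWSE_PATH)


if __name__ == "__main__":
    bad = "User-agent: *\nDisallow: /browse\n"
    fixed = "User-agent: *\nDisallow:\n"
    print(browse_is_blocked(bad))    # True: /browse is blocked
    print(browse_is_blocked(fixed))  # False: an empty Disallow allows everything
```

To check a live site, call `check_site("http://your_dspace_hostname.edu/robots.txt")` with your own hostname. Note that `RobotFileParser.read()` treats a 401/403 response as "disallow everything", so a True result can also mean that the file itself could not be read.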
_______________________________________________
Dspace-general mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/dspace-general
