You won't get entirely accurate numbers, but you can get ballpark figures with e.g.

site:dspace.mit.edu inurl:handle inurl:show=full

Basically this narrows things down to the "full item record" pages. It looks like there may be duplicates in there -- you could try some additional conditions.

For the number of bitstreams:

site:dspace.mit.edu inurl:bitstream

Hope this helps,

Rob

On Thu, Feb 19, 2009 at 05:47, Bram Luyten <[email protected]> wrote:
> Hi Rob,
>
> I had a question somewhat related to robots.txt and the way DSpace
> instances are being indexed by Google.
>
> As part of the Google Analytics - DSpace comparison that I've been
> running, I would like to analyse which repositories are being indexed best
> by Google, and how that impacts their number of visits.
>
> As a first, very rough estimate, I searched for:
>
> "site:<<repository url>>" to get an indication of how many useful pages
> were indexed. It was interesting to see that these numbers did not really
> correlate with visits to the repository.
> I assumed that for many repositories, different browse pages were being
> indexed, and that these indexed pages were not very useful for generating
> visits or exposing the content.
>
> In a second step, I tried searching for "site:<<repository url>> -browse".
> The returned numbers were in some cases less than half of the original
> number.
> But I realise this search is too restrictive: because many pages
> include the word "browse" in their navigation bar, I'm probably excluding
> useful item pages etc. from the search.
>
> So my question is the following:
> which search query could I use in Google to get the number of useful
> indexed pages (item pages, bitstreams, collection & community
> pages)?
>
> Already an interesting finding from my research:
> the 15 repositories already included in the research get 60% of their
> visits through search engines (average calculated on the visits in December
> 2008). So even more reason to get exposure through search engines as
> optimized as possible.
>
> best regards,
>
> Bram
>
> @mire NV
> Romeinse Straat 18
> 3001 Heverlee
> Belgium
> +32 2 888 29 56
>
> http://www.atmire.com - Institutional Repository Solutions
> http://www.togather.eu - Before getting together, get t...@ther
>
>
> On Thu, Feb 5, 2009 at 10:21 PM, Robert Tansley <[email protected]> wrote:
>
>> To all users of DSpace 1.5 and DSpace 1.5.1:
>> These versions of DSpace ship with a bad robots.txt file that prevents
>> search engines such as Google Scholar or Yahoo from indexing any content on
>> a DSpace site. To check if this applies to you:
>> - Visit your site's robots.txt --
>>   http://your_dspace_hostname.edu/robots.txt
>> - If you see the following line you have a bad robots.txt:
>>
>> Disallow: /browse
>>
>> It is important that you REMOVE this line from your robots.txt to ensure
>> that your DSpace instance is correctly indexed by search engines. More info
>> on ensuring your DSpace site is correctly indexed here:
>>
>> http://wiki.dspace.org/index.php?title=Ensuring_your_instance_is_indexed
>>
>> Robert Tansley / Google
>>
>> _______________________________________________
>> DSpace-tech mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
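
For anyone who would rather script the robots.txt check from the quoted announcement than eyeball it, here is a minimal Python sketch. It only looks for a Disallow rule whose path is exactly /browse, which is the line the announcement says to remove. The function names (fetch_robots, disallowed_paths, has_bad_browse_rule) are made up for this example, and the plain http:// URL is an assumption based on the hostname pattern in the announcement.

```python
from urllib.request import urlopen


def fetch_robots(host: str) -> str:
    """Download robots.txt from a host (assumes plain http, as in the announcement)."""
    with urlopen(f"http://{host}/robots.txt") as response:
        return response.read().decode("utf-8", errors="replace")


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the path of every Disallow rule found in robots.txt content."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            paths.append(value.strip())
    return paths


def has_bad_browse_rule(robots_txt: str) -> bool:
    """True if the file contains the 'Disallow: /browse' rule."""
    return "/browse" in disallowed_paths(robots_txt)


# Offline example using a hand-written sample file.
sample = "User-agent: *\nDisallow: /browse\n"
print(has_bad_browse_rule(sample))
```

To check a live site, pass the result of fetch_robots("your_dspace_hostname.edu") to has_bad_browse_rule. Note that this ignores User-agent grouping, so it reports the rule regardless of which crawler it applies to.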
_______________________________________________
Dspace-general mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/dspace-general
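
If you want to run the two queries from the top of this message against a list of repositories, a small Python sketch like the following can generate the query strings. The helper names are hypothetical, and the https://www.google.com/search?q= URL form is an assumption about how you would open the query in a browser. The result counts still have to be read off the results page by hand, and they remain ballpark figures.

```python
from urllib.parse import quote_plus


def item_record_query(host: str) -> str:
    """Query narrowing results to the 'full item record' pages of a DSpace site."""
    return f"site:{host} inurl:handle inurl:show=full"


def bitstream_query(host: str) -> str:
    """Query narrowing results to bitstream URLs of a DSpace site."""
    return f"site:{host} inurl:bitstream"


def search_url(query: str) -> str:
    """Build a browser URL for a query (URL form is an assumption)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Example host taken from the message; add further repository hosts as needed.
for host in ["dspace.mit.edu"]:
    for query in (item_record_query(host), bitstream_query(host)):
        print(query)
        print(search_url(query))
```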
