You won't get entirely accurate numbers but you can get ballpark figures
with e.g.

site:dspace.mit.edu inurl:handle inurl:show=full

This narrows the results down to the "full item record" pages. There appear
to be some duplicates in there, so you could try adding further conditions.

For the number of bitstreams:

site:dspace.mit.edu inurl:bitstream
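
If you want to run the same two queries against a list of repositories, here is a minimal Python sketch that builds them for any host. The names index_queries and search_url are made up for illustration, and the google.com/search?q= URL form is only an assumption about how you would open the query in a browser. Nothing here fetches or parses result counts.

```python
from urllib.parse import quote_plus


def index_queries(host):
    """Return the two Google queries from this message for a DSpace host."""
    return {
        # Full item record pages (handle URLs with show=full).
        "full_item_records": f"site:{host} inurl:handle inurl:show=full",
        # Bitstream (file download) URLs.
        "bitstreams": f"site:{host} inurl:bitstream",
    }


def search_url(query):
    """Percent-encode a query into a Google search URL (assumed URL form)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    for label, query in index_queries("dspace.mit.edu").items():
        print(label, search_url(query))
```

The counts Google reports for these queries are still only ballpark figures, as noted above.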

Hope this helps

Rob

On Thu, Feb 19, 2009 at 05:47, Bram Luyten <bluy...@gmail.com> wrote:

> Hi Rob,
>
> I had a question somewhat related to robots.txt and the way DSpace
> instances are being indexed by Google.
>
> As part of the Google Analytics - DSpace comparison that I've been
> running, I would like to analyse which repositories are being indexed best
> by Google, and how that impacts their number of visits.
>
> As a first, very rough estimate, I searched for:
>
> "site:<<repository url>>" to get an indication of how many useful pages
> were indexed. It was interesting to see that these numbers did not really
> correlate with visits to this repository.
> I assumed that for many repositories, different browse pages were being
> indexed, and that these indexed pages were not very useful for generating
> visits or exposing the content.
>
> In a second step, I tried searching for "site:<<repository url>> -browse".
> The returned numbers were in some cases even less than half of the original
> number.
> But I realise this search is too restrictive: because many pages
> include the word "browse" in their navigation bar, I'm probably excluding
> useful item pages etc. from the search.
>
> So my question is the following:
> which search query could I use in Google, to get the number of useful
> indexed pages in Google (item pages, bitstreams, collection & community
> pages) ?
>
> One interesting finding from my research already:
> the 15 repositories included in the research so far get 60% of their
> visits through search engines (average calculated on the visits in December
> 2008). So there is even more reason to make exposure through search engines
> as good as possible.
>
> best regards,
>
> Bram
>
> @mire NV
> Romeinse Straat 18
> 3001 Heverlee
> Belgium
> +32 2 888 29 56
>
> http://www.atmire.com - Institutional Repository Solutions
> http://www.togather.eu - Before getting together, get t...@ther
>
>
> On Thu, Feb 5, 2009 at 10:21 PM, Robert Tansley
> <roberttans...@google.com> wrote:
>
>> To all users of DSpace 1.5 and DSpace 1.5.1:
>> These versions of DSpace ship with a bad robots.txt file that prevents
>> search engines such as Google Scholar or Yahoo from indexing any content on
>> a DSpace site. To check if this applies to you:
>> - Visit your site's robots.txt --
>> http://your_dspace_hostname.edu/robots.txt
>> - If you see the following line you have a bad robots.txt:
>>
>> Disallow: /browse
>>
>> It is important that you REMOVE this line from your robots.txt to ensure
>> that your DSpace instance is correctly indexed by search engines. More info
>> on ensuring your DSpace site is correctly indexed here:
>>
>> http://wiki.dspace.org/index.php?title=Ensuring_your_instance_is_indexed
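
The check described in the quoted announcement can be sketched in Python as below. This is only an illustration under stated assumptions: has_bad_browse_rule is a hypothetical helper name, it looks only for the exact "Disallow: /browse" rule named in the announcement, and it works on robots.txt text you have already downloaded rather than fetching it.

```python
def has_bad_browse_rule(robots_text):
    """Return True if robots.txt text contains the rule 'Disallow: /browse'."""
    for raw_line in robots_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are compared case-insensitively, paths exactly.
        if field.strip().lower() == "disallow" and value.strip() == "/browse":
            return True
    return False


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /browse\n"
    print(has_bad_browse_rule(sample))
```

Other rules in the file are not evaluated, so a False result only means this one line is absent.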
>>
>> Robert Tansley / Google
>>
>>
>> _______________________________________________
>> DSpace-tech mailing list
>> DSpace-tech@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>>
>>
>
