Isidro Aguillo from the Cybermetrics Lab was kind enough to reply to the
normalization question.

*The basic principle is that all measurements for a certain metric get
normalized against the maximum value, for any repository, of that metric.*

In the simple example of the size metric, normalization would happen as
follows:

According to the July ranking, CERN's repository (http://cdsweb.cern.ch) is
the largest in size, with 2,590,000 pages indexed.
With 253,000 indexed pages for K.U. Leuven's Lirias, the normalized figure
(253k / 2590k) is about 0.0977, or 9.77%.
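
For anyone who wants to check the arithmetic, here is a minimal Python sketch
of that size normalization. The function name `normalize` is my own choice
for illustration and not part of the published methodology; the page counts
are the ones quoted above.

```python
def normalize(value, maximum):
    """Scale a raw count against the largest count seen for the same metric."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return value / maximum


# Indexed page counts from the July ranking example.
cern_pages = 2_590_000    # largest repository for the size metric
lirias_pages = 253_000

size_score = normalize(lirias_pages, cern_pages)
print(f"Lirias size score: {size_score:.4f} ({size_score:.2%})")
```

The repository holding the maximum always scores exactly 1.0, and every other
repository lands between 0 and 1.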

For the Google Scholar metric, it's a little more complicated because
the average of two normalized totals is taken. To elaborate on the example:

site:lirias.kuleuven.be query in Google Scholar (all results): 21,400
site:lirias.kuleuven.be query in Google Scholar (only results from
2001-2008): 658

Imagine digital.csic.es has the maximum among all the world's repositories
with 42,800 (all results); Lirias is then at 0.5, or 50%.
The maximum for recent results (2001-2008) is repository.usp.br with 6,580.
Lirias is then at 0.1, or 10%.

The final Scholar value for Lirias would then be (50 + 10) / 2 = 30%, or 0.3
(rank 145th, for example).
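
The Scholar calculation can be sketched the same way. This is only my reading
of the explanation above, not code from the Cybermetrics Lab; the function and
parameter names are hypothetical, and the maxima are the illustrative figures
from the example.

```python
def scholar_score(total, recent, max_total, max_recent):
    """Mean of the two normalized Google Scholar result counts."""
    normalized_total = total / max_total       # all results
    normalized_recent = recent / max_recent    # results from 2001-2008 only
    return (normalized_total + normalized_recent) / 2


score = scholar_score(
    total=21_400,       # Lirias, all results
    recent=658,         # Lirias, 2001-2008
    max_total=42_800,   # illustrative maximum (digital.csic.es)
    max_recent=6_580,   # illustrative maximum (repository.usp.br)
)
print(f"Lirias Scholar score: {score:.2f}")
```

Because each count is normalized before averaging, the two components carry
equal weight even though the raw totals differ by an order of magnitude.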

with kindest regards,

Bram Luyten

@mire - http://www.atmire.com

On Mon, Dec 13, 2010 at 11:12 AM, Bram Luyten <[email protected]> wrote:

> Hi David,
>
> JIRA does not allow anonymous interaction, so I'm afraid you'll have to
> take a minute to register an account. After you're logged in, it's really
> easy: a "Comment" button appears on the top left:
>
> Small demo:
> http://screencast.com/t/vygNWXdT
>
> About the methodology & the indicated points:
>
> *Different results based on the search engine localization*
>
> I didn't realize this, but even for something like the Size index, it's
> true that different localized pages of Google give different results.
> site:hub.hku.hk on Google.com -> 726,000
> site:hub.hku.hk on Google.es -> 729,000
> site:hub.hku.hk on Google.hk -> 725,000
>
> So this must indicate that for each of the localized Google pages,
> different indexes are being used. As Baidu is the largest search engine in
> Asia, the fact that Baidu coverage is not included might disadvantage Asian
> institutions in the ranking.
>
> *Normalization*
>
> I only know about normalization in the case of the Scholar metric, as
> described on the methodology page:
>
> *Scholar (Sc)*. Using Google Scholar database we calculate the mean of the
> normalised total number of papers and those (recent papers) published
> between 2001 and 2008.
>
> I'm unsure as well what "normalised" means in this context. Would be great
> if anyone could enlighten us.
>
>
> best regards,
>
> Bram
>
> @mire - http://www.atmire.com
>
> Technologielaan 9 - 3001 Heverlee - Belgium
> 533 2nd Street - Encinitas, CA 92024 - USA
>
> http://www.togather.eu - Before getting together, get t...@ther
>
>
> On Mon, Dec 13, 2010 at 8:13 AM, David Palmer <[email protected]> wrote:
>
>> Thanks Bram,
>>
>>
>>
>> Yes, I would support harvestable usage stats.  I did not see how to add my
>> support on the page you gave.
>>
>>
>>
>> Webometrics.  I see I must be more specific.  I have followed the papers
>> written in the Webometrics project for both universities and repositories.
>> I tried to reproduce the results on a few sites.  I could not.  The
>> methodology is not specific enough in some cases.  In others, I wonder if
>> the search engines have different results in Spain as opposed to Hong Kong.
>> In some cases, I know this is true.  Also, I remember that part of the
>> methodology was that certain results in certain cases were “normalized.”
>> But nothing written to explain which specific results were normalized.
>>
>>
>>
>> Well, you might just conclude, like others have done, that I am dumb.
>> Hmnn, that is a possibility.  Better vitamins?  On the other hand, The
>> Journal of Irreproducible Results, comes to mind;
>>
>>         http://www.jir.com/
>>
>>
>>
>> Serious types could stop reading here, but apropos of nothing, my
>> favourite irreproducible result is “the buttered cat paradox”, which goes
>> like this: buttered toast will always fall face down on the ground.  Cats
>> will always land on their feet.  So if you strap a piece of buttered toast
>> to the back of the cat, and hoist it out the window, you should see
>> antigravity appear.
>>
>>
>> http://www.butteredcat.com/index.php?module=pagemaster&PAGE_user_op=view_page&PAGE_id=2&MMN_position=30:30
>>
>>
>>
>> david
>>
>>
>>
>>
>>
>> *From:* [email protected] [mailto:[email protected]] *On Behalf Of *Bram
>> Luyten
>> *Sent:* Saturday, December 11, 2010 9:09 PM
>> *To:* David Palmer
>> *Cc:* [email protected]
>> *Subject:* Re: [Dspace-general] webometrics
>>
>>
>>
>> Without a full answer to your question (apologies in advance), here's one
>> consideration:
>> the repository ranking only measures exposure through search engines. The
>> data is being gathered by launching certain queries in google, yahoo, ...
>>
>> The reason they chose such a generic approach is that it works
>> independently of the platform. It doesn't matter which platform you run:
>> as long as you have a URL (or subdomain), your repository (or website for
>> that matter) can be measured. (And they do: similar metrics are being used
>> to measure the exposure of university websites:
>> http://www.webometrics.info/ ).
>>
>> In my opinion, USAGE of repositories would be a much more valuable metric.
>> Sure, it's good to have thousands of pages indexed, but are people actively
>> downloading the files that are hosted there ?
>>
>> With the Solr statistics work in 1.6, and now that institutions have been
>> using it for a considerable amount of time, we would have the "common
>> ground" to compare usage statistics.
>>
>> I have proposed an automated OAI interface, in order to enable harvesting
>> of your usage data, based on an internationally supported standard:
>>
>> https://jira.duraspace.org/browse/DS-626 (if you think this is important,
>> please voice your support in this request ;)
>>
>> If this could make it into DSpace, I see no reason why usage data couldn't
>> be included in the ranking (at least, for DSpace repositories).
>>
>> *Somewhat related: Annual repository cost per file vs cost per download*
>>
>> From a financial management perspective, you could calculate the annual
>> cost of a repository as a cost per file. Let's say you have 1,000 files,
>> and your internal staff time plus some consultancy cost you $5,000 per
>> year (just example figures, not a real case): this would be a rather
>> high cost of $5 per file. However, if you know that the number of
>> downloads is 50,000 (so 50 downloads per file on average), you can do cost
>> accounting per download instead. That would be $0.10 per download.
>>
>> best regards,
>>
>> Bram
>>
>> @mire - http://www.atmire.com
>>
>> Technologielaan 9 - 3001 Heverlee - Belgium
>> 533 2nd Street - Encinitas, CA 92024 - USA
>>
>> http://www.togather.eu - Before getting together, get t...@ther
>>
>> On Fri, Dec 10, 2010 at 5:03 PM, David Palmer <[email protected]> wrote:
>>
>>
>> I remain intrigued by the idea of metrics for IRs.  I have read the papers
>> on webometrics, and found questions.  I have asked and have not been
>> answered.
>>
>> Will we as a community accept this ranking without any input into its
>> formulation?  Or even without proper understanding of the methodology?
>>
>> David Palmer
>> Scholarly Communications Team Leader
>> The University of Hong Kong Libraries
>> Pokfulam Road
>> Hong Kong
>> tel. +852 2859 7004
>> http://hub.hku.hk
>>
>>
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Oracle to DB2 Conversion Guide: Learn learn about native support for
>> PL/SQL,
>> new data types, scalar functions, improved concurrency, built-in packages,
>> OCI, SQL*Plus, data movement tools, best practices and more.
>> http://p.sf.net/sfu/oracle-sfdev2dev
>> _______________________________________________
>> Dspace-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dspace-general
>>
>>
>>
>
>