Isidro Aguillo from the Cybermetrics Lab was kind enough to reply to the normalization question. *The basic principle is that all measurements for a given metric are normalized against the maximum value of that metric across all repositories.* In the simple example of the size metric, normalization works as follows:
According to the July ranking, CERN's repository (http://cdsweb.cern.ch) is the largest in size, with 2,590,000 pages indexed. Since the number of indexed pages for K.U. Leuven's Lirias is 253,000, the normalized figure (253,000 / 2,590,000) is 0.0976, or 9.76%.

For the Google Scholar metric it is a little more complicated, because the average of two normalized totals is taken. To elaborate on the example:

site:lirias.kuleuven.be query in Google Scholar (all results): 21,400
site:lirias.kuleuven.be query in Google Scholar (only results from 2001-2008): 658

Imagine digital.csic.es has the maximum among all the world's repositories with 42,800 (all results); Lirias is then 0.5, or 50%. The maximum value for recent results (2001-2008) is repository.usp.br with 6,580; Lirias is then 0.1, or 10%. The final Scholar value for Lirias would then be (50 + 10) / 2 = 30%, or 0.3 (rank 145th, for example).
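In code, the calculation could look like this minimal Python sketch (the figures are the example numbers above; the `normalize` helper is illustrative, not part of the official methodology):

```python
def normalize(value, max_value):
    """Normalize a raw count against the maximum observed across all repositories."""
    return value / max_value

# Size metric: indexed pages, normalized against the largest repository (CERN).
size_score = normalize(253_000, 2_590_000)       # Lirias -> 0.0976..., i.e. 9.76%

# Scholar metric: the mean of two normalized Google Scholar counts.
all_results_score = normalize(21_400, 42_800)    # all years -> 0.5
recent_results_score = normalize(658, 6_580)     # 2001-2008 -> 0.1
scholar_score = (all_results_score + recent_results_score) / 2   # -> 0.3
```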
with kindest regards,

Bram Luyten

@mire - http://www.atmire.com

On Mon, Dec 13, 2010 at 11:12 AM, Bram Luyten <[email protected]> wrote:

> Hi David,
>
> JIRA does not allow anonymous interaction, so I'm afraid you'll have to take a minute to register an account. After you're logged in, it's really easy: a "Comment" button appears on the top left.
>
> Small demo: http://screencast.com/t/vygNWXdT
>
> About the methodology & the indicated points:
>
> *Different results based on the search engine localization*
>
> I didn't realize this, but even for something like the Size index, it's true that different localized pages of Google give different results:
>
> site:hub.hku.hk on Google.com -> 726,000
> site:hub.hku.hk on Google.es -> 729,000
> site:hub.hku.hk on Google.hk -> 725,000
>
> So this must indicate that a different index is being used for each of the localized Google pages. As Baidu is the largest search engine in Asia, the fact that Baidu coverage is not included might disadvantage Asian institutions in the ranking.
>
> *Normalization*
>
> I only know about normalization in the case of the Scholar metric, as described on the methodology page:
>
> *Scholar (Sc)*. Using Google Scholar database we calculate the mean of the normalised total number of papers and those (recent papers) published between 2001 and 2008.
>
> I'm unsure as well what "normalised" means in this context. It would be great if anyone could enlighten us.
>
> best regards,
>
> Bram
>
> @mire - http://www.atmire.com
>
> Technologielaan 9 - 3001 Heverlee - Belgium
> 533 2nd Street - Encinitas, CA 92024 - USA
>
> http://www.togather.eu - Before getting together, get t...@ther
>
> On Mon, Dec 13, 2010 at 8:13 AM, David Palmer <[email protected]> wrote:
>
>> Thanks Bram,
>>
>> Yes, I would support harvestable usage stats. I did not see how to add my support on the page you gave?
>>
>> Webometrics: I see I must be more specific. I have followed the papers written in the Webometrics project for both universities and repositories. I tried to reproduce the results on a few sites. I could not. The methodology is not specific enough in some cases. In others, I wonder whether the search engines give different results in Spain as opposed to Hong Kong. In some cases, I know this is true. Also, I remember that part of the methodology was that certain results in certain cases were "normalized," but nothing was written to explain which specific results were normalized.
>>
>> Well, you might just conclude, as others have done, that I am dumb. Hmm, that is a possibility. Better vitamins? On the other hand, The Journal of Irreproducible Results comes to mind:
>>
>> http://www.jir.com/
>>
>> Serious types could stop reading here, but apropos of nothing, my favourite irreproducible result is "the buttered cat paradox," which goes like this: buttered toast will always fall face down on the ground; cats will always land on their feet. So if you strap a piece of buttered toast to the back of a cat and hoist it out the window, you should see antigravity appear.
>>
>> http://www.butteredcat.com/index.php?module=pagemaster&PAGE_user_op=view_page&PAGE_id=2&MMN_position=30:30
>>
>> david
>>
>> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Bram Luyten
>> *Sent:* Saturday, December 11, 2010 9:09 PM
>> *To:* David Palmer
>> *Cc:* [email protected]
>> *Subject:* Re: [Dspace-general] webometrics
>>
>> Without a full answer to your question (apologies in advance), here's one consideration: the repository ranking only measures exposure through search engines. The data is gathered by launching certain queries in Google, Yahoo, ...
>>
>> The reason they chose such a generic approach is that it works independently of the platform. It doesn't matter which platform you run; as long as you have a URL (or subdomain), your repository (or website, for that matter) can be measured. (And they do: similar metrics are used to measure the exposure of university websites: http://www.webometrics.info/.)
>>
>> In my opinion, USAGE of repositories would be a much more valuable metric. Sure, it's good to have thousands of pages indexed, but are people actively downloading the files that are hosted there?
>>
>> With the SOLR statistics work in 1.6, now that institutions have already been using it over a considerable amount of time, we would have the "common ground" to compare usage statistics.
>>
>> I have proposed an automated OAI interface, in order to enable harvesting of your usage data, based on an internationally supported standard (a rough sketch of such a harvest follows after this message):
>>
>> https://jira.duraspace.org/browse/DS-626 (if you think this is important, please voice your support in this request ;)
>>
>> If this could make it into DSpace, I see no reason why usage data couldn't be included in the ranking (at least, for DSpace repositories).
>>
>> *Somewhat related: annual repository cost per file vs. cost per download*
>>
>> From a financial management perspective, you could calculate the annual cost of a repository as a cost per file. Let's say you have 1,000 files, and your internal staff time and some consultancy cost you $5,000 per year (just example figures, not a real case); this would be a rather high cost of $5 per file. However, if you knew that the number of downloads was 50,000 (so 50 downloads per file on average), you could do cost accounting per download: that would be $0.10 per download. (A worked sketch of this arithmetic follows after the quoted thread below.)
>>
>> best regards,
>>
>> Bram
>>
>> @mire - http://www.atmire.com
>>
>> Technologielaan 9 - 3001 Heverlee - Belgium
>> 533 2nd Street - Encinitas, CA 92024 - USA
>>
>> http://www.togather.eu - Before getting together, get t...@ther
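To make the DS-626 proposal above concrete: if the proposed interface spoke plain OAI-PMH, a harvest of usage data could look roughly like the sketch below. This is purely illustrative; the endpoint path and the "usage" metadataPrefix are hypothetical, since DS-626 describes a proposal rather than a shipped interface.

```python
import urllib.parse
import urllib.request

def harvest_usage(base_url, from_date):
    """Issue a standard OAI-PMH ListRecords request for usage events.

    The metadataPrefix "usage" is hypothetical: DS-626 only proposes
    exposing usage data over OAI; no official prefix exists.
    """
    params = urllib.parse.urlencode({
        "verb": "ListRecords",          # standard OAI-PMH verb
        "metadataPrefix": "usage",      # hypothetical prefix for usage events
        "from": from_date,              # standard OAI-PMH selective-harvest argument
    })
    with urllib.request.urlopen(f"{base_url}?{params}") as response:
        return response.read()          # raw OAI-PMH XML, to be parsed downstream

# Hypothetical endpoint on a DSpace instance:
usage_xml = harvest_usage("http://demo.dspace.org/oai/request", "2010-01-01")
```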
>> On Fri, Dec 10, 2010 at 5:03 PM, David Palmer <[email protected]> wrote:
>>
>> I remain intrigued by the idea of metrics for IRs. I have read the papers on webometrics, and found questions. I have asked and have not been answered.
>>
>> Will we as a community accept this ranking without any input into its formulation? Or even without proper understanding of the methodology?
>>
>> David Palmer
>> Scholarly Communications Team Leader
>> The University of Hong Kong Libraries
>> Pokfulam Road
>> Hong Kong
>> tel. +852 2859 7004
>> http://hub.hku.hk
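The cost accounting from Bram's message above, as a worked sketch (all figures are his example numbers, not real data):

```python
annual_cost = 5_000        # USD per year: staff time + consultancy (example figure)
file_count = 1_000         # files hosted in the repository
download_count = 50_000    # downloads per year, i.e. 50 per file on average

cost_per_file = annual_cost / file_count          # -> 5.0, a rather high $5 per file
cost_per_download = annual_cost / download_count  # -> 0.1, i.e. $0.10 per download
```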
