Hi Eric,

I agree with Ludovic that rank-sensitive measures are the preferable choice in this setting. The problem with Spearman-style correlation measures is that they penalize differences (and reward similarities) at the top of the list exactly as much as those at the bottom, whereas the assumption here is that higher-ranked elements matter more.


We proposed another rank-sensitive measure, called Weighted Overlap, and showed that it can improve over Rank-Biased Overlap (as well as Spearman's correlation and cosine distance) in settings where you have a pair of ranked lists (or vectors) to compare.

From senses to texts: An all-in-one graph-based approach for measuring semantic similarity
http://wwwusers.di.uniroma1.it/~navigli/pubs/AIJ_2015_Pilehvar_Navigli.pdf (page 107)

-Taher

On Fri, Oct 14, 2016 at 10:40 AM, Ludovic Tanguy <ludovic.tan...@univ-tlse2.fr> wrote:

> Eric,
>
> For the exact problem of comparing word lists ordered by distributional
> similarity across different methods I used Rank-Biased Overlap, which is
> described here:
>
> W. Webber, A. Moffat, J. Zobel, "A similarity measure for indefinite
> rankings," ACM Transactions on Information Systems, Volume 28(4),
> p. 20, 2010
>
> In a nutshell, it consists in measuring the overlap between the two lists
> at each position, while introducing a bias based on the rank so that a
> difference further in the lists is less important than at the top.
>
> I have some R code available if needed.
>
> Ludovic Tanguy
>
> On 14/10/2016 11:01, Eric Atwell wrote:
>
>> Thanks Olga, BUT I don't see how Spearman's (or other) rank correlation
>> formula measures overlap between 2 ranked lists containing some
>> different words.
>>
>> For example, what is the similarity between 2 "distributional semantics"
>> representations of God?:
>>
>> (1 move, 2 bless, 3 and, 4 forbid, 5 have, 6 be, 7 create, 8 do, 9 know,
>> 10 want) and
>>
>> (1 bless, 2 blesses, 3 delusion, 4 incarnate, 5 hates, 6 himself, 7
>> exists, 8 almighty, 9 forbid, 10 rest)
>>
>> A simple measure involves count of overlapping words (bless, forbid)
>> i.e. this scores 2 (or 2/10, or 2/20)
>>
>> BUT I also want to take into account ranks: (bless 2,1), (forbid 4,9)
>>
>> Does God shed light on my problems?
>>
>> Eric
>>
>> Eric Atwell, Asst Prof, Language@Leeds and Artificial Intelligence
>> groups,
>> School of Computing, University of Leeds, Times University of the Year
>> 2017
>>
>> ------------------------------------------------------------------------
>> *From:* Ольга Ляшевская <ole...@yandex.ru>
>> *Sent:* 14 October 2016 08:09:36
>> *To:* Eric Atwell; CORPORA discussion forum
>> *Cc:* AbdulRahman AlOsaimy; Claire Brierley
>> *Subject:* Re: [Corpora-List] metric of overlap between two ranked lists?
>>
>> Dear Eric,
>>
>> Spearman's rank correlation coefficient will work in this case, I think.
>> https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
>>
>> Best,
>> Olga
>>
>> 14.10.2016, 09:54, "Eric Atwell" <e.s.atw...@leeds.ac.uk>:
>>
>>> Is there a standard metric of overlap between two ranked lists?
>>> e.g. to measure/score the similarity between top 10 keywords extracted
>>> using 2 different formulae, such as LL v MI?
>>> OR e.g. to measure/score the similarity between top 10 hits from Google
>>> v top 10 hits from Bing for a given search phrase?
>>> OR e.g. to measure/score the similarity between ranked lists of PoS-tags
>>> predicted for a word by two rival PoS-taggers in an ensemble tagger?
>>>
>>> If these were unranked sets of keywords, I could simply count the
>>> intersection. But I want to take rank into account in some sensible way.
>>>
>>> thanks for expert pointers to proven metrics ...
>>>
>>> Eric Atwell, Asst Prof, Language@Leeds and Artificial Intelligence
>>> groups,
>>> School of Computing, University of Leeds, Times University of the Year
>>> 2017
>>>
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora@uib.no
>>> http://mailman.uib.no/listinfo/corpora
>>
>> Olga Lyashevskaya
>>
>> School of Linguistics, Faculty of Humanities
>> Higher School of Economics, Moscow
>
> --
> Département des Sciences du Langage & Laboratoire CLLE-ERSS (UMR 5263)
> Université de Toulouse 2
> 5, allées Antonio Machado F-31058 Toulouse CEDEX 9
> Tél : (+33) 5 61 50 36 03 - Fax : (+33) 5 61 50 46 77
> http://w3.erss.univ-tlse2.fr/membre/tanguy/
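
For readers who want to try the rank-biased idea described in the quoted message above (overlap measured at each depth, with deeper positions weighted less), here is a minimal Python sketch. It is a truncated form of Rank-Biased Overlap, summing only up to the length of the shorter list; it is not the extrapolated variant from the Webber et al. paper, not Ludovic's R code, and not the Weighted Overlap measure. The function name `truncated_rbo` and the default persistence `p=0.9` are assumptions chosen for illustration, and the lists are assumed to contain no duplicate items.

```python
def truncated_rbo(list_a, list_b, p=0.9):
    """Truncated rank-biased overlap of two ranked lists (illustrative sketch).

    Computes (1 - p) * sum_{d=1..k} p**(d-1) * A_d, where A_d is the
    fraction of items shared by the top-d prefixes of both lists and
    k is the length of the shorter list. Assumes no duplicates per list.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(list_a), len(list_b))
    score = 0.0
    for d in range(1, depth + 1):
        # Agreement at depth d: size of the prefix intersection over d.
        agreement = len(set(list_a[:d]) & set(list_b[:d])) / d
        # Deeper positions get geometrically smaller weight.
        score += p ** (d - 1) * agreement
    return (1 - p) * score


# The two ranked word lists from Eric's example in the thread.
god_a = ["move", "bless", "and", "forbid", "have",
         "be", "create", "do", "know", "want"]
god_b = ["bless", "blesses", "delusion", "incarnate", "hates",
         "himself", "exists", "almighty", "forbid", "rest"]

print(truncated_rbo(god_a, god_b))
```

Because the sum is truncated at depth k, two identical lists score 1 - p**k rather than 1, so scores are best compared between lists of the same length. A smaller `p` concentrates more of the weight on the top ranks.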
