Eric,

For exactly this problem, comparing word lists ranked by distributional similarity under different methods, I used Rank-Biased Overlap (RBO), which is described here:


W. Webber, A. Moffat, J. Zobel, "A similarity measure for indefinite
rankings," ACM Transactions on Information Systems, Volume 28(4),
Article 20, 2010

In a nutshell, it measures the overlap between the two lists at each depth and weights that overlap by rank, so that a difference further down the lists counts for less than one at the top.
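
To make the idea concrete, here is a minimal Python sketch, for illustration only (it is not the R code mentioned in this message). It computes the RBO sum truncated at the depth of the shorter list, i.e. (1 - p) * sum over d of p^(d-1) * (overlap at depth d) / d. That is a lower bound on the full score, not the extrapolated variant defined in the paper. The function name rbo_truncated and the default p = 0.9 are assumptions chosen here, and each list is assumed to contain no duplicates.

```python
def rbo_truncated(list_a, list_b, p=0.9):
    """Truncated Rank-Biased Overlap of two ranked lists.

    Sums the rank-weighted agreement down to the depth of the shorter
    list. Because the tail of the infinite sum is dropped, the result
    is a lower bound on the full RBO score. Assumes no duplicates
    within either list (an assumption of this sketch).
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    depth = min(len(list_a), len(list_b))
    score = 0.0
    for d in range(1, depth + 1):
        # Number of items shared by the two top-d prefixes.
        overlap = len(set(list_a[:d]) & set(list_b[:d]))
        # Agreement at depth d, weighted by p^(d-1): deeper ranks count less.
        score += p ** (d - 1) * overlap / d
    return (1 - p) * score


# The two "God" lists from the quoted message below.
god_a = ["move", "bless", "and", "forbid", "have",
         "be", "create", "do", "know", "want"]
god_b = ["bless", "blesses", "delusion", "incarnate", "hates",
         "himself", "exists", "almighty", "forbid", "rest"]

print(rbo_truncated(god_a, god_b, p=0.9))
```

A smaller p puts more of the weight on the top ranks. Two identical lists of length k score 1 - p^k under this truncation rather than 1, which is one reason the paper goes on to define an extrapolated score.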

I have some R code available if needed.

Ludovic Tanguy





On 14/10/2016 11:01, Eric Atwell wrote:
Thanks Olga, BUT I don't see how Spearman's (or other) rank correlation
formula measures overlap between 2 ranked lists containing some
different words.


For example, what is the similarity between 2 "distributional semantics"
representations of God?:

(1 move, 2 bless, 3 and, 4 forbid, 5 have, 6 be, 7 create, 8 do, 9 know,
10 want) and

(1 bless, 2 blesses, 3 delusion, 4 incarnate, 5 hates, 6 himself, 7
exists, 8 almighty, 9 forbid, 10 rest)


A simple measure is a count of the overlapping words (bless, forbid),
i.e. this scores 2 (or 2/10, or 2/20)

BUT I also want to take into account ranks: (bless 2,1), (forbid 4,9)


Does God shed light on my problems?


Eric


Eric Atwell, Asst Prof, Language@Leeds and Artificial Intelligence groups,
School of Computing, University of Leeds, Times University of the Year 2017

------------------------------------------------------------------------
*From:* Ольга Ляшевская <ole...@yandex.ru>
*Sent:* 14 October 2016 08:09:36
*To:* Eric Atwell; CORPORA discussion forum
*Cc:* AbdulRahman AlOsaimy; Claire Brierley
*Subject:* Re: [Corpora-List] metric of overlap between two ranked lists?

Dear Eric,

Spearman's rank correlation coefficient will work in this case, I think.
https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

Best,
Olga

14.10.2016, 09:54, "Eric Atwell" <e.s.atw...@leeds.ac.uk>:
Is there a standard metric of overlap between two ranked lists?
e.g. to measure/score the similarity between top 10 keywords extracted
using 2 different formulae, such as LL v MI?
OR e.g. to measure/score the similarity between top 10 hits from Google
v top 10 hits from Bing for a given search phrase?
OR e.g. to measure/score the similarity between ranked lists of PoS-tags
predicted for a word by two rival PoS-taggers in an ensemble tagger?

If these were unranked sets of keywords, I could simply count the
intersection. But I want to take rank into account in some sensible way.

thanks for expert pointers to proven metrics ...

Eric Atwell, Asst Prof, Language@Leeds and Artificial Intelligence groups,
School of Computing, University of Leeds, Times University of the Year 2017

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora@uib.no
http://mailman.uib.no/listinfo/corpora

Olga Lyashevskaya

School of Linguistics, Faculty of Humanities
 Higher School of Economics, Moscow




--
Département des Sciences du Langage & Laboratoire CLLE-ERSS (UMR 5263)
Université de Toulouse 2
5, allées Antonio Machado  F-31058 Toulouse CEDEX 9
Tél : (+33) 5 61 50 36 03 - Fax : (+33) 5 61 50 46 77
http://w3.erss.univ-tlse2.fr/membre/tanguy/

