Thanks for the links! This is exactly what I was looking for. After reviewing some of the options, I'm going to do a first try with Krippendorff's Alpha. Its ability to handle missing data from some graders, as well as being applicable down to n=2, seems promising.
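For anyone following along, here is a rough sketch of how that first try could look. This is not Discernatron code: the function name `krippendorff_alpha`, the input layout (one list of judge scores per result, with `None` for a missing score), and the choice of the interval (squared difference) metric are all my own assumptions. The squared difference is only one way to make a 2-vs-0 disagreement count for more than a 3-vs-2 one; an ordinal metric may turn out to fit the 0 to 3 scale better.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha(units, distance=lambda a, b: (a - b) ** 2):
    """Krippendorff's alpha for a list of units (hypothetical helper).

    units: one list per result, holding the score each judge gave it;
           None marks a judge who did not score that result.
    distance: disagreement metric; squared difference is the interval
              metric (an assumption, not a settled choice).
    Returns None when alpha is undefined.
    """
    # Coincidence matrix: every ordered pair of scores within a unit,
    # weighted by 1 / (m - 1) where m is the number of scores in the unit.
    coincidences = Counter()
    for scores in units:
        values = [s for s in scores if s is not None]
        m = len(values)
        if m < 2:
            continue  # a single score cannot be paired with anything
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per score value, and the overall pairable count n.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        return None  # no pairable scores at all

    observed = sum(w * distance(a, b) for (a, b), w in coincidences.items()) / n
    expected = sum(
        totals[a] * totals[b] * distance(a, b) for a in totals for b in totals
    ) / (n * (n - 1))
    if expected == 0:
        return None  # every score is identical, so alpha is undefined
    return 1.0 - observed / expected


# Two judges scoring the results of one query; the third result has only one score.
query_scores = [[3, 2], [0, 0], [1, None], [2, 0]]
alpha = krippendorff_alpha(query_scores)
```

One thing this makes visible: when every score in a query is the same value (for example all 0), the expected disagreement is zero and alpha is undefined, so that case would need its own handling. What cutoff on alpha should trigger two more scores is still an open question.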
On Oct 26, 2016 11:37 AM, "Justin Ormont" <[email protected]> wrote:

> You're in the area of: https://en.wikipedia.org/wiki/Inter-rater_reliability
>
> --justin
>
> On Wed, Oct 26, 2016 at 11:31 AM, Jonathan Morgan <[email protected]>
> wrote:
>
>> Disclaimer: I'm not a math nerd, and I don't know the history of
>> Discernatron very well.
>>
>> ...but re: your second specialized concern, have you considered running
>> some more sophisticated inter-rater reliability statistics to get a better
>> sense of the degree of disagreement (controlling for random chance?). See
>> for example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3402032/
>>
>> - Jonathan
>>
>> On Wed, Oct 26, 2016 at 11:21 AM, Erik Bernhardson <
>> [email protected]> wrote:
>>
>>> For a little backstory, in discernatron multiple judges provide scores
>>> from 0 to 3 for results. Typically we only request a single query to be
>>> reviewed by two judges. We would like to measure the level of disagreement
>>> between these two judges, and if it crosses some threshold get two more
>>> scores, so we can then measure disagreement in the group of 4. Somehow
>>> though, we need to define how to measure that level of disagreement and
>>> what the threshold for needing more scores is.
>>>
>>> Some specialized concerns:
>>> * It is probably important to include not just that the users gave
>>> different values, but also how far apart they are. The difference between a
>>> 3 and a 2 is much smaller than between a 2 and a 0.
>>> * If the users agree that 80% of the results are all 0, but disagree on
>>> the last 20%, even though the average disagreement is low it's probably
>>> still important? Might be worthwhile to take all the agreements about
>>> irrelevant results and remove them before calculating disagreement? Not
>>> sure...
>>>
>>> I know we have a few math nerds here on the list, so hoping someone has
>>> a few ideas.
>>
>> --
>> Jonathan T. Morgan
>> Senior Design Researcher
>> Wikimedia Foundation
>> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
