Hi Erik, I've been using some similar methods to evaluate Related Article recommendations <https://meta.wikimedia.org/wiki/Research:Evaluating_RelatedArticles_recommendations> and the source of the trending article card <https://meta.wikimedia.org/wiki/Research:Comparing_most_read_and_trending_edits_for_Top_Articles_feature> in the Explore feed on Android. Let me know if you'd like to sit down and chat about experimental design sometime.
- J

On Wed, May 3, 2017 at 12:24 PM, Erik Bernhardson <[email protected]> wrote:

> At our weekly relevance meeting an interesting idea came up about how to collect relevance judgements for the long tail of queries, which make up around 60% of search sessions.
>
> We are pondering asking questions on the article pages themselves. Roughly, we would manually curate some list of queries we want to collect relevance judgements for. When a user has spent some threshold of time (60s?) on a page we would, for some % of users, check if we have any queries we want labeled for this page, and then ask them whether the page is a relevant result for that query. In this way the amount of work asked of each individual is relatively low, and hopefully something they can answer without too much work. We know that the average page receives a few thousand page views per day, so even with a relatively low response rate we could probably collect a reasonable number of judgements over some medium-length time period (weeks?).
>
> These labels would almost certainly be noisy; we would need to collect the same judgement many times to get any kind of certainty on the label. Additionally, we would not really be able to explain the nuances of a grading scale with many points, so we would probably have to use either a thumbs up/thumbs down approach, or maybe a happy/sad/indifferent smiley face.
>
> Does this seem reasonable? Are there other ways we could go about collecting the same data? How do we design it in a non-intrusive manner that gets results but doesn't annoy users? Other thoughts?
>
> For some background:
>
> * We are currently generating labeled data using statistical analysis (clickmodels) against historical click data. This analysis requires there to be multiple search sessions with the same query presented with similar results to estimate the relevance of those results. A manual review of the results showed queries with clicks from at least 10 sessions had reasonable but not great labels, queries with 35+ sessions looked pretty good, and queries with hundreds of sessions were labeled really well.
>
> * An analysis of 80 days' worth of search click logs showed that 35 to 40% of search sessions are for queries that are repeated more than 10 times in that 80-day period. Around 20% of search sessions are for queries that are repeated more than 35 times in that 80-day period. (https://phabricator.wikimedia.org/P5371)
>
> * Our privacy policy prevents us from keeping more than 90 days' worth of data from which to run these clickmodels. Practically, 80 days is probably a reasonable cutoff, as we will want to re-use the data multiple times before needing to delete it and generate a new set of labels.
>
> * We currently collect human relevance judgements with Discernatron (https://discernatron.wmflabs.org/). This is useful data for manual evaluation of changes, but the data set is much too small (low hundreds of queries, with an average of 50 documents per query) to integrate into machine learning. The process of judging query/document pairs is quite tedious for the community, and it doesn't seem like a great use of engineer time for us to do this ourselves.
>
> _______________________________________________
> AI mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/ai

-- 
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
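[Editor's note: the prompting flow Erik sketches above (dwell-time threshold, percentage-based user sampling, per-page lookup of queries needing labels) could look roughly like the following. This is an illustrative sketch only; the map, function names, 60-second threshold, and 5% sampling rate are assumptions, not an existing implementation.]

```python
import random

# Hypothetical curated map: page title -> queries we want labeled for it.
QUERIES_NEEDING_LABELS = {
    "Ada Lovelace": ["first computer programmer", "lovelace babbage"],
}

DWELL_THRESHOLD_S = 60   # time on page before prompting is considered (assumed)
SAMPLING_RATE = 0.05     # fraction of eligible users who get asked (assumed)

def pick_prompt(page_title, seconds_on_page, rng=random.random):
    """Return a query to ask the user about ("Is this page a relevant
    result for <query>?"), or None if this user should not be prompted."""
    if seconds_on_page < DWELL_THRESHOLD_S:
        return None                      # user hasn't dwelt long enough
    queries = QUERIES_NEEDING_LABELS.get(page_title)
    if not queries:
        return None                      # no labels wanted for this page
    if rng() >= SAMPLING_RATE:
        return None                      # user not in the sampled fraction
    return random.choice(queries)
```

Keeping the sampling decision last means dwell time and page eligibility are checked cheaply before any randomness is involved.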
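[Editor's note: since each thumbs up/down judgement is noisy, votes for a (query, page) pair would have to be aggregated before a label is trusted. One common approach, offered here purely as an assumption rather than anything the Search team settled on, is to accept a label only once the lower bound of a Wilson score interval on the up-vote proportion clears a threshold.]

```python
import math

def wilson_lower_bound(ups, total, z=1.96):
    """Lower bound of the ~95% Wilson score interval for the true
    up-vote proportion, given `ups` up-votes out of `total` votes."""
    if total == 0:
        return 0.0
    p = ups / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom

def label(ups, total, threshold=0.7):
    """Call the pair 'relevant' only once we are confident enough;
    otherwise keep collecting votes (the threshold is an assumption)."""
    return "relevant" if wilson_lower_bound(ups, total) >= threshold else "undecided"
```

For example, 9 up-votes out of 10 gives a lower bound near 0.6 and stays "undecided", while 90 out of 100 clears 0.7 and flips to "relevant", which matches the observation above that labels sharpen as more sessions (or votes) accumulate.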
_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
