Hi Erik, I've been using some similar methods to evaluate Related Article recommendations <https://meta.wikimedia.org/wiki/Research:Evaluating_RelatedArticles_recommendations> and the source of the trending article card <https://meta.wikimedia.org/wiki/Research:Comparing_most_read_and_trending_edits_for_Top_Articles_feature> in the Explore feed on Android. Let me know if you'd like to sit down and chat about experimental design sometime.
- J

On Wed, May 3, 2017 at 12:24 PM, Erik Bernhardson <[email protected]> wrote:

> At our weekly relevance meeting an interesting idea came up about how to collect relevance judgements for the long tail of queries, which make up around 60% of search sessions.
>
> We are pondering asking questions on the article pages themselves. Roughly, we would manually curate some list of queries we want to collect relevance judgements for. When a user has spent some threshold of time (60s?) on a page we would, for some % of users, check if we have any queries we want labeled for this page, and then ask them whether the page is a relevant result for that query. In this way the amount of work asked of each individual is relatively low, and hopefully something they can answer without too much work. We know that the average page receives a few thousand page views per day, so even with a relatively low response rate we could probably collect a reasonable number of judgements over some medium-length time period (weeks?).
>
> These labels would almost certainly be noisy; we would need to collect the same judgement many times to get any kind of certainty on the label. Additionally, we would not really be able to explain the nuances of a grading scale with many points, so we would probably have to use either a thumbs up/thumbs down approach, or maybe a happy/sad/indifferent smiley face.
>
> Does this seem reasonable? Are there other ways we could go about collecting the same data? How do we design it in a non-intrusive manner that gets results but doesn't annoy users? Other thoughts?
>
> For some background:
>
> * We are currently generating labeled data using statistical analysis (clickmodels) against historical click data. This analysis requires there to be multiple search sessions with the same query presented with similar results to estimate the relevance of those results. A manual review of the results showed queries with clicks from at least 10 sessions had reasonable but not great labels, queries with 35+ sessions looked pretty good, and queries with hundreds of sessions were labeled really well.
>
> * An analysis of 80 days' worth of search click logs showed that 35 to 40% of search sessions are for queries that are repeated more than 10 times in that 80-day period. Around 20% of search sessions are for queries that are repeated more than 35 times in that 80-day period. (https://phabricator.wikimedia.org/P5371)
>
> * Our privacy policy prevents us from keeping more than 90 days' worth of data from which to run these clickmodels. Practically, 80 days is probably a reasonable cutoff, as we will want to re-use the data multiple times before needing to delete it and generate a new set of labels.
>
> * We currently collect human relevance judgements with Discernatron (https://discernatron.wmflabs.org/). This is useful data for manual evaluation of changes, but the data set is much too small (low hundreds of queries, with an average of 50 documents per query) to integrate into machine learning. The process of judging query/document pairs is quite tedious for the community, and it doesn't seem like a great use of engineer time for us to do this ourselves.
>
> _______________________________________________
> AI mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/ai

-- 
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
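[Editor's note: the prompting flow Erik sketches above (dwell-time threshold, percentage-based user sampling, per-page lookup of queries needing labels) could look roughly like the following. This is an illustrative sketch only; the map, function names, 60-second threshold, and 5% sampling rate are assumptions, not an existing implementation.]

```python
import random

# Hypothetical curated map: page title -> queries we want labeled for it.
QUERIES_NEEDING_LABELS = {
    "Ada Lovelace": ["first computer programmer", "lovelace babbage"],
}

DWELL_THRESHOLD_S = 60   # time on page before prompting is considered (assumed)
SAMPLING_RATE = 0.05     # fraction of eligible users who get asked (assumed)

def pick_prompt(page_title, seconds_on_page, rng=random.random):
    """Return a query to ask the user about ("Is this page a relevant
    result for <query>?"), or None if this user should not be prompted."""
    if seconds_on_page < DWELL_THRESHOLD_S:
        return None                      # user hasn't dwelt long enough
    queries = QUERIES_NEEDING_LABELS.get(page_title)
    if not queries:
        return None                      # no labels wanted for this page
    if rng() >= SAMPLING_RATE:
        return None                      # user not in the sampled fraction
    return random.choice(queries)
```

Keeping the sampling decision last means dwell time and page eligibility are checked cheaply before any randomness is involved.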
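[Editor's note: since each thumbs up/down judgement is noisy, votes for a (query, page) pair would have to be aggregated before a label is trusted. One common approach, offered here purely as an assumption rather than anything the Search team settled on, is to accept a label only once the lower bound of a Wilson score interval on the up-vote proportion clears a threshold.]

```python
import math

def wilson_lower_bound(ups, total, z=1.96):
    """Lower bound of the ~95% Wilson score interval for the true
    up-vote proportion, given `ups` up-votes out of `total` votes."""
    if total == 0:
        return 0.0
    p = ups / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom

def label(ups, total, threshold=0.7):
    """Call the pair 'relevant' only once we are confident enough;
    otherwise keep collecting votes (the threshold is an assumption)."""
    return "relevant" if wilson_lower_bound(ups, total) >= threshold else "undecided"
```

For example, 9 up-votes out of 10 gives a lower bound near 0.6 and stays "undecided", while 90 out of 100 clears 0.7 and flips to "relevant", which matches the observation above that labels sharpen as more sessions (or votes) accumulate.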
_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
