Hi Erik,

From my understanding, it looks like you're looking to collect relevance data "in reverse". Typically, for this type of data collection, I would assume that you'd present a query with some search results and ask users "which results are relevant to this query?" (which is what Discernatron does, at a very high effort level).
What I think you're proposing instead is that when a user visits an article, we present them with a question asking "would this search query be relevant to the article you are looking at?". I can see this working, provided that the query is controlled and the question is *not* phrased like it is above. For this to work, the question should be phrased in a way that elicits a simple "top-level" (maybe "yes" or "no") response. For example, the question "*is this page about*: 'hydrostone halifax nova scotia'" can be answered with a thumbs up 👍 or thumbs down 👎, but a question like "is this article relevant to the following query: ..." seems more complicated 🤔.

On Thu, May 4, 2017 at 6:29 PM, Erik Bernhardson <[email protected]> wrote:

> On Wed, May 3, 2017 at 12:44 PM, Jonathan Morgan <[email protected]>
> wrote:
>
>> Hi Erik,
>>
>> I've been using some similar methods to evaluate Related Article
>> recommendations
>> <https://meta.wikimedia.org/wiki/Research:Evaluating_RelatedArticles_recommendations>
>> and the source of the trending article card
>> <https://meta.wikimedia.org/wiki/Research:Comparing_most_read_and_trending_edits_for_Top_Articles_feature>
>> in the Explore feed on Android. Let me know if you'd like to sit down and
>> chat about experimental design sometime.
>>
>> - J
>
> This might be useful; I'll see if I can find a time on both our calendars.
> I should note, though, that this is explicitly not about experimental
> design. The data is not going to be used for experimental purposes, but
> rather to feed into a machine learning pipeline that will re-order search
> results to put the best results at the top of the list. To ensure the long
> tail is represented in the training data for this model, I would like to
> have a few tens of thousands of labels for (query, page) combinations each
> month.
> The relevance of pages to a query does have some temporal aspect, so we
> would likely want to use only the last N months' worth of data (TBD).
>
>> On Wed, May 3, 2017 at 12:24 PM, Erik Bernhardson <
>> [email protected]> wrote:
>>
>>> At our weekly relevance meeting an interesting idea came up about how to
>>> collect relevance judgements for the long tail of queries, which make up
>>> around 60% of search sessions.
>>>
>>> We are pondering asking questions on the article pages themselves.
>>> Roughly, we would manually curate some list of queries we want to collect
>>> relevance judgements for. When a user has spent some threshold of time
>>> (60s?) on a page, we would, for some % of users, check whether we have
>>> any queries we want labeled for that page, and then ask them whether the
>>> page is a relevant result for that query. In this way the amount of work
>>> asked of individuals is relatively low, and hopefully it is something
>>> they can answer without much effort. We know that the average page
>>> receives a few thousand page views per day, so even with a relatively low
>>> response rate we could probably collect a reasonable number of judgements
>>> over some medium-length time period (weeks?).
>>>
>>> These labels would almost certainly be noisy; we would need to collect
>>> the same judgement many times to get any kind of certainty on the label.
>>> Additionally, we would not really be able to explain the nuances of a
>>> grading scale with many points, so we would probably have to use either a
>>> thumbs up/thumbs down approach or maybe a happy/sad/indifferent smiley
>>> face.
>>>
>>> Does this seem reasonable? Are there other ways we could go about
>>> collecting the same data? How do we design it in a non-intrusive manner
>>> that gets results but doesn't annoy users? Other thoughts?
>>>
>>> For some background:
>>>
>>> * We are currently generating labeled data using statistical analysis
>>> (clickmodels) against historical click data.
>>> This analysis requires there to be multiple search sessions with the
>>> same query presented with similar results in order to estimate the
>>> relevance of those results. A manual review of the results showed that
>>> queries with clicks from at least 10 sessions had reasonable but not
>>> great labels, queries with 35+ sessions looked pretty good, and queries
>>> with hundreds of sessions were labeled really well.
>>>
>>> * An analysis of 80 days' worth of search click logs showed that 35-40%
>>> of search sessions are for queries that are repeated more than 10 times
>>> in that 80-day period. Around 20% of search sessions are for queries that
>>> are repeated more than 35 times in that 80-day period.
>>> (https://phabricator.wikimedia.org/P5371)
>>>
>>> * Our privacy policy prevents us from keeping more than 90 days' worth
>>> of data from which to run these clickmodels. Practically, 80 days is
>>> probably a reasonable cutoff, as we will want to re-use the data multiple
>>> times before needing to delete it and generate a new set of labels.
>>>
>>> * We currently collect human relevance judgements with Discernatron
>>> (https://discernatron.wmflabs.org/). This is useful data for manual
>>> evaluation of changes, but the data set is much too small (low hundreds
>>> of queries, with an average of 50 documents per query) to integrate into
>>> machine learning. The process of judging query/document pairs is quite
>>> tedious for the community, and it doesn't seem like a great use of
>>> engineer time for us to do it ourselves.
>>>
>>> _______________________________________________
>>> AI mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/ai
>>
>> --
>> Jonathan T. Morgan
>> Senior Design Researcher
>> Wikimedia Foundation
>> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>>
>> _______________________________________________
>> discovery mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/discovery

--
Jan Drewniak
UX Engineer, Discovery
Wikimedia Foundation
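The prompting flow Erik describes (a dwell-time threshold, a percentage of users sampled, a curated page-to-query mapping) could be sketched roughly as below. This is only an illustration of the idea in the thread: the function name, the in-memory mapping, and the 5% sample rate are assumptions, not anything decided here.

```python
import random

# Hypothetical stand-in for the manually curated (page -> queries) list
# Erik mentions; in practice this would live in some datastore.
CURATED_QUERIES = {
    "Hydrostone": ["hydrostone halifax nova scotia"],
}

DWELL_THRESHOLD_S = 60   # "some threshold of time (60s?)"
SAMPLE_RATE = 0.05       # "some % of users" -- the 5% figure is a guess

def maybe_pick_prompt(page_title, dwell_seconds, rng=random.random):
    """Return a query to ask about ("is this page about: <query>?"),
    or None if this user should not be prompted."""
    if dwell_seconds < DWELL_THRESHOLD_S:
        return None                      # user hasn't read long enough
    if rng() >= SAMPLE_RATE:
        return None                      # user not in the sampled fraction
    queries = CURATED_QUERIES.get(page_title)
    if not queries:
        return None                      # no labels wanted for this page
    return random.choice(queries)
```

Forcing `rng` to a fixed value makes the sampling deterministic for testing; e.g. `maybe_pick_prompt("Hydrostone", 120, rng=lambda: 0.0)` yields the curated query, while a 10-second dwell yields `None`.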
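Erik's point that the labels would be noisy, requiring the same judgement to be collected many times, amounts to aggregating repeated thumbs up/down votes into a single label. One common way to do that (a sketch, not anything agreed in the thread; the vote minimum and threshold are arbitrary) is the lower bound of the Wilson score interval, which discounts proportions backed by few votes:

```python
import math

def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the ~95% Wilson score interval for the thumbs-up
    proportion; low vote counts pull the bound toward zero."""
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

def label(up, down, min_votes=10, threshold=0.5):
    """Collapse repeated judgements into relevant/irrelevant/unknown.
    min_votes and threshold are illustrative assumptions."""
    if up + down < min_votes:
        return "unknown"   # not enough judgements collected yet
    if wilson_lower_bound(up, down) >= threshold:
        return "relevant"
    return "irrelevant"
```

Under this scheme a (query, page) pair stays "unknown" until enough prompts have been answered, which matches the proposal's expectation that each pair is judged many times before being trusted as training data.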
