Hi Shilad,

Thanks for this note.

This is super encouraging to see, given that a lot of my PhD work at
Stanford has been on studying and leveraging users' Web (and in
particular Wikipedia) navigation patterns, and on preaching the value
hidden in navigation logs... ;)

We've found something similar for the task of link recommendation:
here, too, recommendations mined from server logs outperform purely
content-based methods:
https://dlab.epfl.ch/people/west/pub/Paranjape-West-Leskovec-Zia_WSDM-16.pdf
https://dlab.epfl.ch/people/west/pub/West-Paranjape-Leskovec_WWW-15.pdf

Curious to see how your line of research continues!

Bob

On Sun, Jun 18, 2017 at 3:51 PM, Shilad Sen <[email protected]> wrote:
> Hi Everybody,
>
> I just ran an experiment that surprised me and I thought folks on this list
> would find interesting.
>
> tl;dr We found that navigation vector embeddings for articles (as produced
> by Ellery Wulczyn) outperform content-based vector embeddings (word2vec on
> article text) by 62% vs 37% accuracy in a task-based user study.  I've
> volunteered to help with the engineering to productionize navigation
> embedding and this study reinforces my eagerness to get navigation vectors
> out in the world!
>
> More detail:  The maps we use in Cartograph (cartograph.info) are almost
> entirely built on "embedding" vectors for articles. We experimented with two
> word2vec-based embeddings: content vectors mined from article text and link
> structure, and navigation vectors mined from user browsing sessions. For the
> latter, we used Ellery Wulczyn's navigation vectors. By staring at maps, our
> intuition told us that the navigation vectors seemed better in "preference
> spaces" where the human taste space wasn't necessarily easily encoded into
> Wikipedia text.
>
> Last weekend we ran a Mechanical Turk experiment to test this intuition. We
> created two Cartograph maps of movies: one built on navigation vectors and
> one built on content vectors. We identified 40 relatively popular movies
> that were not close neighbors in either map (i.e. cities that were not too
> close to each other) and ran a Mechanical Turk study using the maps.
>
> For each Turker, we randomly selected 5 seen movies (out of the 30), and
> asked them to evaluate maps for each movie. For each movie city, we showed
> the map region around the city, but hid the city and asked them to guess the
> city from a list of 12 movies they had seen (screenshot below). We added in
> trivial validation questions using sequels to ensure Turkers were working in
> good faith (show a map for "Rocky II" that had "Rocky" at the center).
>
> Result: Turkers exhibited 62% accuracy with the navigation vectors and 37%
> accuracy with content vectors. We want to conduct several follow-up studies
> to understand different subject areas and parameter settings and user tasks,
> but the difference in performance was striking.
>
> Our study shows the value of navigation vectors and makes me super excited
> to contribute to the engineering needed to get them out to the world on a
> regular basis. Imagine if every researcher and practitioner who uses
> word2vec now on Wikipedia content switches to navigation vectors. That's a
> huge audience!
>
> Feedback and questions welcome!
>
> -Shilad
>
>
> --
> Shilad W. Sen
>
> Associate Professor
> Mathematics, Statistics, and Computer Science Dept.
> Macalester College
>
> Senior Research Fellow, Target Corporation
>
> [email protected]
> http://www.shilad.com
> https://www.linkedin.com/in/shilad
> 651-696-6273
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>

