Hi Shilad,

Thanks for this note. This is super encouraging to see, given that a lot of my PhD work at Stanford has been on studying and leveraging users' Web (and in particular Wikipedia) navigation patterns, and on preaching the value hidden in navigation logs... ;)

We've found something similar for the task of link recommendation: here, too, recommendations mined from server logs outperform purely content-based methods:

https://dlab.epfl.ch/people/west/pub/Paranjape-West-Leskovec-Zia_WSDM-16.pdf
https://dlab.epfl.ch/people/west/pub/West-Paranjape-Leskovec_WWW-15.pdf

Curious to see how your line of research continues!

Bob

On Sun, Jun 18, 2017 at 3:51 PM, Shilad Sen <[email protected]> wrote:
> Hi Everybody,
>
> I just ran an experiment that surprised me and I thought folks on this list
> would find interesting.
>
> tl;dr We found that navigation vector embeddings for articles (as produced
> by Ellery Wulczyn) outperform content-based vector embeddings (word2vec on
> article text) by 62% vs 37% accuracy in a task-based user study. I've
> volunteered to help with the engineering to productionize navigation
> embedding and this study reinforces my eagerness to get navigation vectors
> out in the world!
>
> More detail: The maps we use in Cartograph (cartograph.info) are almost
> entirely built on "embedding" vectors for articles. We experimented with two
> word2vec-based embeddings: content vectors mined from article text and link
> structure, and navigation vectors mined from user browsing sessions. For the
> latter, we used Ellery Wulczyn's navigation vectors. By staring at maps, our
> intuition told us that the navigation vectors seemed better in "preference
> spaces" where the human taste space wasn't necessarily easily encoded into
> Wikipedia text.
>
> Last weekend we ran a Mechanical Turk experiment to test this intuition. We
> created two Cartograph maps of movies: one built on navigation vectors and
> one built on content vectors.
> We identified 40 relatively popular movies
> that were not close neighbors in either map (i.e. cities that were not too
> close to each other) and ran a Mechanical Turk study using the maps.
>
> For each Turker, we randomly selected 5 seen movies (out of the 30), and
> asked them to evaluate maps for each movie. For each movie city, we showed
> the map region around the city, but hid the city and asked them to guess the
> city from a list of 12 movies they had seen (screenshot below). We added in
> trivial validation questions using sequels to ensure Turkers were working in
> good faith (show a map for "Rocky II" that had "Rocky" at the center).
>
> Result: Turkers exhibited 62% accuracy with the navigation vectors and 37%
> accuracy with content vectors. We want to conduct several follow-up studies
> to understand different subject areas and parameter settings and user tasks,
> but the difference in performance was striking.
>
> Our study shows the value of navigation vectors and makes me super excited
> to contribute to the engineering needed to get them out to the world on a
> regular basis. Imagine if every researcher and practitioner who uses
> word2vec now on Wikipedia content switches to navigation vectors. That's a
> huge audience!
>
> Feedback and questions welcome!
>
> -Shilad
>
>
> --
> Shilad W. Sen
>
> Associate Professor
> Mathematics, Statistics, and Computer Science Dept.
> Macalester College
>
> Senior Research Fellow, Target Corporation
>
> [email protected]
> http://www.shilad.com
> https://www.linkedin.com/in/shilad
> 651-696-6273
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
