> Did I miss something? Is this data available somewhere?

You can find more information about the clickstream datasets here: https://blog.wikimedia.org/2018/01/16/wikipedia-rabbit-hole-clickstream/
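As a rough sketch of how the clickstream dumps could be turned into pair counts: the code below assumes the documented tab-separated layout of four columns (prev, curr, type, n). These are referrer/article pairs, not internal search queries. The special referrer `other-search` marks arrivals from external search engines, so this only approximates the (query-document) pairs asked about. The function name `pair_counts` and the sample rows are illustrative, not part of any Wikimedia tooling.

```python
import csv
import io
from collections import Counter


def pair_counts(lines, wanted_type=None):
    """Sum click counts per (prev, curr) pair from clickstream TSV lines.

    Assumes four tab-separated columns: prev, curr, type, n.
    wanted_type optionally keeps only rows whose 'type' matches
    (e.g. "link" or "external"); None keeps every row.
    Summing (rather than overwriting) lets several monthly dumps be combined.
    """
    counts = Counter()
    # QUOTE_NONE: article titles may contain quote characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip malformed or empty rows
        prev, curr, row_type, n = row
        if wanted_type is not None and row_type != wanted_type:
            continue
        counts[(prev, curr)] += int(n)
    return counts


# Hypothetical sample rows in the assumed format.
sample = io.StringIO(
    "other-search\tLondon\texternal\t120\n"
    "Paris\tLondon\tlink\t35\n"
    "other-search\tLondon\texternal\t5\n"
)
top = pair_counts(sample, wanted_type="external").most_common(1)
print(top)  # [(('other-search', 'London'), 125)]
```

For a real (gzip-compressed) dump, the file object from `gzip.open(path, "rt", encoding="utf-8")` can be passed in place of the `StringIO` sample.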
The datasets do not include Simple Wikipedia. They are calculated for a few wikis, some of which are not very large, so you might be able to use them.

On Wed, Feb 28, 2018 at 3:17 AM, Georg Sorst <[email protected]> wrote:
> Hi list,
>
> as part of a lecture on Information Retrieval I am giving, we work a lot
> with Simple Wikipedia articles. It's a great data set because it's
> comprehensive and not domain specific, so when building search on top of it
> humans can easily judge result quality, and it's still small enough to be
> handled by a regular computer.
>
> This year I want to cover the topic of Machine Learning for search. The
> idea is to look at result clicks from an internal search engine, feed
> that into the Machine Learning and adjust search accordingly so that
> the top-clicked results actually rank best. We will be using Solr LTR for
> this purpose.
>
> I would love to base this on Simple Wikipedia data since it would fit well
> into the rest of the lecture. Unfortunately, I could not find that data.
> The closest I came is https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream
> but this covers neither Simple Wikipedia nor does it specify
> internal search queries.
>
> Did I miss something? Is this data available somewhere? Can I produce it
> myself from raw data? Ideally I would need (query-document) pairs with the
> number of occurrences.
>
> Thank you!
> Georg
> --
> Georg M. Sorst | CTO
> FINDOLOGIC
> Jakob-Haringer-Str. 5a | 5020 Salzburg
> T.: +43 662 456708
> E.: [email protected]
> www.findologic.com
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
