>Did I miss something? Is this data available somewhere?
You can find more information about the clickstream datasets here:
https://blog.wikimedia.org/2018/01/16/wikipedia-rabbit-hole-clickstream/

The datasets do not include Simple Wikipedia; they are calculated for a few
wikis, some of which are not very large, so you might be able to use them.
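
If it helps, here is a minimal sketch of how pairs with occurrence counts
could be pulled out of one of those files. It assumes each line has four
tab-separated fields (prev, curr, type, n) and no header row, which is my
reading of the dataset documentation rather than something I have checked
against every dump. The function name pair_counts and the sample rows are
made up for illustration. Note that these are (source page, target page)
pairs, not (query, document) pairs, so they are only a stand-in for search
click data.

```python
import csv
import io
from collections import Counter


def pair_counts(tsv_file, keep_types=("link",)):
    """Sum n for every (prev, curr) pair whose type is in keep_types.

    Assumes four tab-separated fields per line: prev, curr, type, n.
    """
    counts = Counter()
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip malformed lines
        prev, curr, kind, n = row
        if not n.isdigit():
            continue  # skip a header row or a non-numeric count
        if kind in keep_types:
            counts[(prev, curr)] += int(n)
    return counts


# Hypothetical sample rows, invented for illustration only.
sample = io.StringIO(
    "Information_retrieval\tSearch_engine\tlink\t120\n"
    "other-search\tSearch_engine\texternal\t900\n"
    "Information_retrieval\tSearch_engine\tlink\t5\n"
)

counts = pair_counts(sample)
for (prev, curr), n in counts.most_common():
    print(prev, curr, n)
```

For a real file you would pass an open file handle instead of the StringIO
object, and widen keep_types if you also want external or other referrers.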

On Wed, Feb 28, 2018 at 3:17 AM, Georg Sorst <g.so...@findologic.com> wrote:

> Hi list,
>
> as part of a lecture I am giving on Information Retrieval, we work a lot
> with Simple Wikipedia articles. It's a great data set because it's
> comprehensive and not domain specific so when building search on top of it
> humans can easily judge result quality, and it's still small enough to be
> handled by a regular computer.
>
> This year I want to cover the topic of Machine Learning for search. The
> idea is to look at result clicks from an internal search engine,
> feed them into a machine learning model, and adjust search accordingly so that
> the top-clicked results actually rank best. We will be using Solr LTR for
> this purpose.
>
> I would love to base this on Simple Wikipedia data since it would fit well
> into the rest of the lecture. Unfortunately, I could not find that data.
> The closest I came is
> https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream
> but this neither covers Simple Wikipedia nor specifies internal search
> queries.
>
> Did I miss something? Is this data available somewhere? Can I produce it
> myself from raw data? Ideally I would need (query-document) pairs with the
> number of occurrences.
>
> Thank you!
> Georg
> --
> *Georg M. Sorst I CTO*
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>