Short answer: no. This data is not publicly available in a form that would
let you compute the dataset yourself, as it is private data.

Thanks,

Nuria

On Mon, Mar 5, 2018 at 11:31 AM, Georg Sorst <[email protected]> wrote:

> Hi all,
>
> sorry for this messy post - I forgot to subscribe to the list so I can't
> directly reply to your responses.
>
> Nuria:
>
> > Datasets do not include simple wiki; they are calculated for a few wikis,
> > some of which are not very large, so you might be able to use them.
>
> Is the raw data available? Can I compute the clickstream myself?
>
> Erik:
>
> > This is actually how our production search ranking is built for around
> > the top 20 sites by search volume that we host. Simple wikipedia isn't one
> > of those we currently use machine ranking for though.
>
> Awesome! Is there more info available somewhere? Algorithms used etc.
> maybe even source code?
>
> > Because of that we do have the data you need, but the problem will be
> > that the actual search queries are considered PII (Personally Identifiable
> > Information) and not something I can release publicly. It may be possible
> > to release aggregated data sets that don't include the actual search
> > terms, but at that point I don't think the data will be useful to you
> > anymore.
>
> I think I'm fine with query-document pairs. Isn't that sufficiently
> aggregated to not be considered PII?
>
> Thank you!
> Georg
>
>
> Georg Sorst <[email protected]> wrote on Wed, 28 Feb 2018 at 12:17:
>
>> Hi list,
>>
>> as part of a lecture on Information Retrieval I am giving we work a lot
>> with Simple Wikipedia articles. It's a great data set because it's
>> comprehensive and not domain specific so when building search on top of it
>> humans can easily judge result quality, and it's still small enough to be
>> handled by a regular computer.
>>
>> This year I want to cover the topic of Machine Learning for search. The
>> idea is to look at result clicks from an internal search engine, feed
>> them into a Machine Learning model, and adjust search accordingly so that
>> the top-clicked results actually rank best. We will be using Solr LTR for
>> this purpose.
>>
>> I would love to base this on Simple Wikipedia data since it would fit
>> well into the rest of the lecture. Unfortunately, I could not find that
>> data. The closest I came is
>> https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream
>> but this neither covers Simple Wikipedia nor specifies internal search
>> queries.
>>
>> Did I miss something? Is this data available somewhere? Can I produce it
>> myself from raw data? Ideally I would need (query-document) pairs with the
>> number of occurrences.
>>
>> Thank you!
>> Georg
>> --
>> *Georg M. Sorst | CTO*
>> FINDOLOGIC
>>
>> Jakob-Haringer-Str. 5a | 5020 Salzburg
>> T.: +43 662 456708
>> E.: [email protected]
>> www.findologic.com
>> Follow us on: XING <https://www.xing.com/profile/Georg_Sorst>, Facebook
>> <http://www.facebook.com/Findologic/>, Twitter
>> <https://twitter.com/findologic>
>>
>> See you at *Internet World* on 06.03. & 07.03.2018 in *Hall A6, Booth
>> E130, Munich*! Book an appointment here:
>> <[email protected]?subject=Internet%20World%20M%C3%BCnchen>
>> See you at *SHOPTALK* from 18 to 21 March in *Las Vegas*! Book an
>> appointment here: <[email protected]?subject=SHOPTALK%20Las%20Vegas>
>> See you at *SOM* on 18.04. & 19.04.2018 in *Hall 7, Booth G.17, Zurich*!
>> Book an appointment here: <[email protected]?subject=SOM%20Z%C3%BCrich>
>> Visit our *homepage* here: <http://www.findologic.com>
>>
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>