
Is there any documentation or further reading available on the machine
ranking used for Wikipedia? This sounds very interesting!

And can you elaborate on how the aggregated search queries are PII?
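
For concreteness, here is a minimal sketch of the kind of aggregation I have
in mind when I say "query-document pairs with the number of occurrences".
Everything in it is an assumption made for illustration: the log format (a
list of (query, clicked document) tuples), the function name
aggregate_clicks, the min_count cutoff and the sample rows are all
hypothetical and do not describe how Wikimedia stores or processes its
search logs.

```python
from collections import Counter


def aggregate_clicks(click_log, min_count=1):
    """Count how often each (query, document) pair was clicked.

    click_log: iterable of (query, clicked_document_title) tuples.
               This format is hypothetical, chosen only for illustration.
    min_count: pairs clicked fewer times than this are dropped
               (an assumed cutoff, not a documented privacy rule).
    """
    counts = Counter()
    for query, doc in click_log:
        # Light normalisation so trivially different spellings are merged.
        normalised_query = query.strip().lower()
        counts[(normalised_query, doc)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Made-up sample rows, not real search data.
click_log = [
    ("solar system", "Solar System"),
    ("Solar System ", "Solar System"),
    ("solar system", "Sun"),
    ("mozart", "Wolfgang Amadeus Mozart"),
]

all_pairs = aggregate_clicks(click_log)
frequent_pairs = aggregate_clicks(click_log, min_count=2)
```

With the sample rows above, all_pairs holds three distinct pairs and
frequent_pairs keeps only the one pair that was clicked twice. The output
still contains the query text itself, only without any per-user or
per-session information.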

Thank you!

Georg Sorst <> wrote on Mon, 5 March 2018 at
20:31:

> Hi all,
> sorry for this messy post - I forgot to subscribe to the list so I can't
> directly reply to your responses.
> Nuria:
> > Datasets do not include simple wiki, they are calculated for a few wikis
> > some of which are not very large so you might be able to use them.
> Is the raw data available? Can I compute the clickstream myself?
> Erik:
> > This is actually how our production search ranking is built for around
> > the top 20 sites by search volume that we host. Simple wikipedia isn't
> > one of those we currently use machine ranking for though.
> Awesome! Is there more info available somewhere? Algorithms used etc.
> maybe even source code?
> > Because of that we do have the data you need, but the problem will be
> > that the actual search queries are considered PII (Personally
> > Identifiable Information) and not something I can release publicly. It
> > may be possible to release aggregated data sets that don't include the
> > actual search terms, but at that point I don't think the data will be
> > useful to you anymore.
> I think I'm fine with query-document pairs. Isn't that sufficiently
> aggregated to not be considered PII?
> Thank you!
> Georg
> Georg Sorst <> wrote on Wed, 28 Feb 2018 at
> 12:17:
>> Hi list,
>> as part of a lecture on Information Retrieval I am giving we work a lot
>> with Simple Wikipedia articles. It's a great data set because it's
>> comprehensive and not domain specific so when building search on top of it
>> humans can easily judge result quality, and it's still small enough to be
>> handled by a regular computer.
>> This year I want to cover the topic of Machine Learning for search. The
>> idea is to look at result clicks from an internal search engine, feed
>> them into a Machine Learning model and adjust the ranking accordingly so
>> that the top-clicked results actually rank best. We will be using Solr
>> LTR for this purpose.
>> I would love to base this on Simple Wikipedia data since it would fit
>> well into the rest of the lecture. Unfortunately, I could not find that
>> data. The closest I came is
>> but this neither covers Simple Wikipedia nor specifies internal search
>> queries.
>> Did I miss something? Is this data available somewhere? Can I produce it
>> myself from raw data? Ideally I would need (query-document) pairs with the
>> number of occurrences.
>> Thank you!
>> Georg
--
Georg M. Sorst | CTO
FINDOLOGIC
Jakob-Haringer-Str. 5a | 5020 Salzburg | T.: +43 662 456708

Analytics mailing list
