It was great to meet you at IA yesterday; thanks for following up with this
link to your work. It is very interesting and coincides with our own work on
using the completion suggester to replace the current prefix search used
on-wiki.

Have you put any thought into normalizing page view data? One thing we have
been trying to figure out (but on the back burner as we focus on current
quarterly goals) is how best to integrate page views (
https://phabricator.wikimedia.org/T112681). Because we have to do this
across many wikis with a wide variance in page views, and we want to use
the data not only for the completion suggester but also within our full
text search results, we are thinking about normalizing the data down to a
percentage of page views for that wiki over a time period, possibly taking
in a larger time period of page views and weighting newer page views as more
important than older page views. Additionally, we are looking into whether
we should batch load page view information on a weekly basis, or whether we
can load page view data only when pages are edited (or some combination of
the two). I've pinged David and Trey with this and they might have some
questions for you :)
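
To make the normalization idea a bit more concrete, here is a rough sketch in
Python. It is only an illustration: the function name, the half-life
parameter, and the input shape (one dict of title -> views per day, oldest
day first) are assumptions made up for this example, not anything that
exists in our code, and exponential decay is just one possible way to weight
newer page views more heavily.

```python
from collections import defaultdict


def normalized_scores(daily_views, half_life_days=7.0):
    """Return title -> share of the wiki's decay-weighted page views.

    daily_views: list of dicts, oldest day first, each mapping a page
    title to its view count for that day (hypothetical input shape).
    half_life_days: assumed tuning knob; views this many days old count
    half as much as views from the most recent day.
    """
    weighted = defaultdict(float)
    last = len(daily_views) - 1
    for day_index, views in enumerate(daily_views):
        age_in_days = last - day_index
        weight = 0.5 ** (age_in_days / half_life_days)
        for title, count in views.items():
            weighted[title] += weight * count

    # Dividing by the wiki-wide total makes scores comparable across
    # wikis with very different traffic levels.
    total = sum(weighted.values())
    if total == 0:
        return {}
    return {title: value / total for title, value in weighted.items()}


if __name__ == "__main__":
    two_days = [
        {"Quebec": 100, "Quetiapine": 100},  # older day
        {"Quebec": 300, "Quetiapine": 100},  # most recent day
    ]
    for title, share in normalized_scores(two_days, half_life_days=1.0).items():
        print(f"{title}: {share:.1%}")
```

Because every score is a fraction of the wiki's own total, a small wiki and
enwiki would both produce values between 0 and 1, which is the property we
would want before feeding the number into either the suggester or full text
rescoring.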

For comparison, here is similar data but with a different scoring algorithm
that David worked up, which reuses the same data we use for rescoring full
text searches: https://en.wikipedia.org/w/api.php?action=cirrus-suggest&text=Que

We haven't yet put this into production because we wanted to integrate page
view data into the scoring before running more tests. It looks quite
promising based on your initial

On Fri, Nov 13, 2015 at 11:07 AM, Greg Lindahl <[email protected]> wrote:

> I've been working on book search at the Internet Archive, and I've
> been using Wikipedia article titles and redirects as entities and
> synonyms. I wanted to build autocomplete for this gizmo, so I
> downloaded 7 days of pageviews for the en Wikipedia, and wrote
> a tiny script to sum them up. It worked great!
>
> Here's the demo (currently live, will disappear eventually).
> "number" is the pageviews count.
>
> curl http://researcher3.fnf.archive.org:8080/autocomplete?q=Que | json_pp
> {
>    "autocomplete" : [
>       {
>          "number" : 68310,
>          "label" : "Queen Victoria"
>       },
>       {
>          "number" : 53283,
>          "label" : "Quentin Tarantino"
>       },
>       {
>          "number" : 29192,
>          "label" : "Quebec"
>       },
>       {
>          "number" : 23717,
>          "label" : "Queen Elizabeth The Queen Mother"
>       },
>       {
>          "number" : 20500,
>          "label" : "Quetiapine"
>       }
>    ]
> }
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>