On Fri, Nov 13, 2015 at 01:45:57PM -0800, Erik Bernhardson wrote:

> Have you put any thought into normalizing page view data?

I haven't studied it, but I think you've got a good start: normalizing
each page's views by the total # of pageviews of its community. So if
someone types an entire French phrase into the English wikipedia, and
you wanted to show both En and Fr options in the autocomplete, a simple
normalization would give you something to sort by. Ditto for search.
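
As a rough sketch of that normalization in Python: the function names,
the tuple layout and the counts below are all made up for illustration,
not taken from any real pageview API. Each article's views are divided
by the total pageviews of its wiki, so candidates from a small wiki and
a large wiki end up on a comparable scale.

```python
def normalized_score(article_views, wiki_total_views):
    """Share of a wiki's total pageviews that went to one article."""
    if wiki_total_views <= 0:
        return 0.0  # no data for this wiki, so rank it last
    return article_views / wiki_total_views


def rank_candidates(candidates, wiki_totals):
    """Sort (wiki, title, views) tuples by normalized share, highest first."""
    return sorted(
        candidates,
        key=lambda c: normalized_score(c[2], wiki_totals[c[0]]),
        reverse=True,
    )


# Hypothetical counts, for illustration only.
wiki_totals = {"en": 1_000_000, "fr": 100_000}
candidates = [
    ("en", "Paris", 5_000),  # 0.5% of en pageviews
    ("fr", "Paris", 2_000),  # 2% of fr pageviews
]
ranked = rank_candidates(candidates, wiki_totals)
```

With these numbers the Fr entry sorts first even though its raw count
is lower, which is the point of normalizing.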

Your next question, about weighting over time, is really a question
about how much data you have. It's nice to be able to push up current
events, so that someone searching for Paris today could see (alas) the
brand new article about today's attacks. But it's the amount of
pageview data that really dictates how well you can do that. For the
English wikipedia, there are so many pageviews that you probably have
enough data over 24 hours to produce good, not-noisy counts. With a
window shorter than 24 hours, you'll probably end up magnifying Europe's
favorites as America wakes up, and America's favorites as Asia wakes
up. Probably not a good thing!

For a less-used wiki, 24 hours alone might produce pretty sparse and
noisy counts. So you will need to look back farther, which reduces
your ability to react to current events.
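
One way to picture that trade-off, as a sketch only: grow the lookback
window one day at a time until the wiki has accumulated some minimum
number of views. The name choose_window and the min_views threshold are
hypothetical, and the daily counts are invented.

```python
def choose_window(daily_views, min_views):
    """Smallest number of most-recent days whose total reaches min_views.

    daily_views is ordered newest first. If the threshold is never
    reached, the whole available history is used.
    """
    total = 0
    for days, views in enumerate(daily_views, start=1):
        total += views
        if total >= min_views:
            return days
    return len(daily_views)


# Hypothetical daily totals, newest first.
busy_wiki = [50_000, 48_000, 51_000]
quiet_wiki = [300, 400, 500, 200]

busy_days = choose_window(busy_wiki, 1_000)    # one day is already enough
quiet_days = choose_window(quiet_wiki, 1_000)  # needs to look back farther
```

The threshold itself is a judgment call. If you assume the counts behave
roughly like Poisson counts, the relative noise on a count n is about
1/sqrt(n), so 1,000 views is noisy at around the 3% level.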

If you'd like to experiment with exponential decay, you can look at the
statistics to try to figure out whether you're just magnifying noise, or
just surfacing Europe's preferences as Americans wake up.
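
For reference, a minimal sketch of an exponentially decayed count in
Python. The half-life parameterization and the name decayed_count are
my assumptions, not something from the original discussion; other
decay schedules would work just as well.

```python
def decayed_count(daily_views, half_life_days):
    """Weighted sum of daily views, where weights halve every half_life_days.

    daily_views is ordered newest first, so index 0 has age 0 and
    full weight.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return sum(
        views * 0.5 ** (age / half_life_days)
        for age, views in enumerate(daily_views)
    )


# Hypothetical counts: a steady article vs. one that spiked today.
steady = [100, 100, 100, 100]
spiking = [400, 0, 0, 0]

steady_score = decayed_count(steady, half_life_days=1)    # 100 + 50 + 25 + 12.5
spiking_score = decayed_count(spiking, half_life_days=1)  # 400
```

Both articles have the same raw total over four days, but the decayed
score favors the one whose views are recent. A short half-life reacts
faster to current events and also leans on fewer effective pageviews,
which is where the noise question comes back in.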

(And if you're really interested in geography, you could divide the
data up so that Europe, America, ANZ, Asia, etc have separate
autocompletes... if you have enough pageview data.)

-- greg


_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
