What a long, strange trip it's been. Full write-up here:

https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/How_Wrong_Would_Using_Out_of_Date_Page_View_Data_Be%3F

Summary:

   - We can't reliably catch day-by-day outliers by using the page view
   information that comes along with edits because not enough edits happen.
   - Weekly averages (rather than day-by-day counts) don't usually move
   that much (i.e., by more than a factor of 2). If we can capture daily or
   weekly page view stats, that should keep us reasonably up-to-date overall,
   esp. if these moderate swings don't affect scoring much.
   - We could gather daily statistics from the page view API and store the
   high mark over the last 3-7 days for the top 1K to 50K most-viewed
   articles. The ranking algorithm could use either the rolling daily average
   or the high mark (whichever is higher).
   - For "trending" topics, checking the top 1K page views every hour
   (unfortunately not currently available through the Pageview API) would be
   the best way to catch sudden spikes if we want to be more responsive, but
   it isn't clear that it's worth it.
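
To make the "rolling average or high mark" idea concrete, here is a rough Python sketch. It is only an illustration under assumptions, not something from the write-up: the PageViewTracker class, its method names, and the 30-day averaging window are all made up for the example, and the daily counts are assumed to come from whatever job pulls them from the page view API.

```python
from collections import deque
from statistics import mean


class PageViewTracker:
    """Recent daily view counts for one article (hypothetical helper)."""

    def __init__(self, high_window=7, avg_window=30):
        # high_window: days covered by the high mark (3-7 in the summary).
        # avg_window: days covered by the rolling average; 30 is an assumption.
        self.high_window = high_window
        self.avg_window = avg_window
        # Keep only as many days as the longer of the two windows needs.
        self.views = deque(maxlen=max(high_window, avg_window))

    def add_day(self, count):
        """Record one more day of page views, dropping the oldest if full."""
        self.views.append(count)

    def popularity(self):
        """Return the higher of the rolling daily average and the high mark."""
        if not self.views:
            return 0.0
        recent = list(self.views)
        high_mark = max(recent[-self.high_window:])
        rolling_avg = mean(recent[-self.avg_window:])
        return float(max(high_mark, rolling_avg))


tracker = PageViewTracker(high_window=3, avg_window=5)
for count in [1000, 1000, 10, 10, 10]:
    tracker.add_day(count)
quiet = tracker.popularity()   # average wins: recent days are all low

tracker.add_day(5000)
spiking = tracker.popularity() # high mark wins: yesterday's spike
```

Note that the two windows have to differ for the "whichever is higher" rule to matter: over the same set of days the high mark is never below the average, so the average can only win when it covers a longer period than the high mark.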


—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation