Hi Srdjan,

The data pipeline behind the API can't handle arbitrary skip or limit parameters, but there's a better way to answer the kind of question you have. We publish all the pageview data at https://dumps.wikimedia.org/other/pagecounts-ez/; look at the "Hourly page views per article" section. I would imagine one month of data is enough for your use case, and from it you can compute the top N articles for every wiki, for whatever N you want. The files are compressed, and once you expand and process them you'll see why we can't do this dynamically: the data is huge and our cluster is limited.
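In case it helps, here is a rough sketch of the ranking step in Python. It rests on an assumption you should check against a real file before relying on it: that each line is whitespace-separated and starts with a project code, an article title, and a total view count, with any hourly breakdown in later fields. The function name `top_articles` and the project code in the usage note are only illustrative.

```python
import heapq
from collections import Counter


def top_articles(lines, project, n):
    """Return the n most viewed (title, views) pairs for one project.

    ASSUMPTION: each line looks like "<project> <title> <total> [...]".
    Lines that do not fit that shape are skipped rather than guessed at.
    """
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[0] != project:
            continue
        try:
            views = int(parts[2])
        except ValueError:
            continue  # third field is not a plain integer, ignore the line
        # The same title can appear in several files (one per day, say),
        # so counts are summed before ranking.
        totals[parts[1]] += views
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])
```

To run it over a downloaded file you could open the dump with `bz2.open(path, "rt", encoding="utf-8", errors="replace")` and pass the file object as `lines`, for example `top_articles(f, "en.z", 100000)`, where both the bz2 compression and the `en.z` project code are assumptions to verify against the files you fetch. Note that the counter keeps every title of the chosen project in memory, which is part of why this is better done offline than on our side.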
On Sun, Apr 1, 2018 at 11:51 AM, Marko Obrovac <[email protected]> wrote:

> (+Analytics-l)
>
> Hello Srdjan,
>
> The 1k limit is a hard one: only the top 1000 articles for a given day get
> loaded into the database. I added the folks from the Analytics team to this
> thread, they may be able to help you, as they generate and expose the data
> in question.
>
>
> Cheers,
> Marko Obrovac, PhD
> Senior Services Engineer
> Wikimedia Foundation
>
>
> On 30 March 2018 at 16:59, Srdjan Grubor <[email protected]> wrote:
>
>> Heya,
>> I asked this on IRC but didn't get any replies so I'm following it up
>> this way.
>> I have a question about the newer metrics REST v1 API: is there a way to
>> specify how many top articles to pull from
>> https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_top_project_access_year_month_day
>> or is 1k hardcoded? Old metrics data was available
>> that had the most viewed pages but that disappeared with the change to the
>> new API.
>>
>> The reason I ask is because we (https://endlessos.com) are trying to
>> rebuild our stale encyclopedia apps for offline usage but are space-limited
>> and would only like to include the most likely pages that would be looked
>> at that can fit within a size envelope that varies with the device in
>> question (up to 100k article limit probably) but the new API doesn't
>> provide us with the tools to figure out the rankings cleanly (other than
>> rate-limiting on our side and checking every single article's metric
>> endpoint for counts).
>>
>> So the main question is: do we have a way to get this data out with the
>> current API? If this data is not available, can the
>> "metrics/pageviews/top" API be augmented to maybe have a `skip` and/or
>> `limit` params like other similar services that have this type of
>> filtering?
>>
>> Thanks,
>>
>> Srdjan Grubor | +1.314.540.8328 | Endless <http://endlessm.com/>
>>
>> _______________________________________________
>> Services mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/services
>>
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
