Hi Dan / Marko,

Let me take a look and see if this will be good enough, but it looks promising! If you don't hear from me again, all is well :)
Thanks!

..........................................................................

Srdjan Grubor | +1.314.540.8328 | Endless <http://endlessm.com/>

On Mon, Apr 2, 2018 at 9:47 AM, Dan Andreescu <[email protected]> wrote:

> Hi Srdjan,
>
> The data pipeline behind the API can't handle arbitrary skip or limit
> parameters, but there's a better way to answer the kind of question you
> have. We publish all the pageviews at
> https://dumps.wikimedia.org/other/pagecounts-ez/; look at the "Hourly
> page views per article" section. I would imagine one month of data is
> enough for your use case, and you can get the top N articles for all
> wikis this way, where N is anything you want. These files are
> compressed, so when you process and expand the data you'll see why we
> can't do this dynamically: the data is huge and our cluster is limited.
>
> On Sun, Apr 1, 2018 at 11:51 AM, Marko Obrovac <[email protected]> wrote:
>
>> (+Analytics-l)
>>
>> Hello Srdjan,
>>
>> The 1k limit is a hard one: only the top 1000 articles for a given day
>> get loaded into the database. I have added the folks from the Analytics
>> team to this thread; they may be able to help you, as they generate and
>> expose the data in question.
>>
>> Cheers,
>> Marko Obrovac, PhD
>> Senior Services Engineer
>> Wikimedia Foundation
>>
>> On 30 March 2018 at 16:59, Srdjan Grubor <[email protected]> wrote:
>>
>>> Heya,
>>> I asked this on IRC but didn't get any replies, so I'm following up
>>> this way. I have a question about the newer metrics REST v1 API: is
>>> there a way to specify how many top articles to pull from
>>> https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_top_project_access_year_month_day
>>> or is 1k hardcoded? The old metrics data included the most-viewed
>>> pages, but that disappeared with the change to the new API.
>>>
>>> The reason I ask is that we (https://endlessos.com) are trying to
>>> rebuild our stale encyclopedia apps for offline usage. We are
>>> space-limited and would like to include only the pages most likely to
>>> be viewed, within a size envelope that varies with the device in
>>> question (probably up to a 100k-article limit). The new API doesn't
>>> give us the tools to figure out the rankings cleanly, other than
>>> rate-limiting on our side and checking every single article's metrics
>>> endpoint for counts.
>>>
>>> So the main question is: do we have a way to get this data out with
>>> the current API? If this data is not available, can the
>>> "metrics/pageviews/top" API be augmented with `skip` and/or `limit`
>>> params, like other similar services that have this type of filtering?
>>>
>>> Thanks,
>>>
>>> ..........................................................................
>>>
>>> Srdjan Grubor | +1.314.540.8328 | Endless <http://endlessm.com/>
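[For reference, a minimal sketch of how the monthly pagecounts-ez files Dan points to could be reduced to a top-N list. It assumes each line is whitespace-separated as `project title total_views hourly_counts`; the file name, that field layout, and the `en.z` project code are assumptions to verify against the actual dump, not a confirmed spec.]

```python
import bz2
import heapq

def top_articles(dump_path, project="en.z", n=100000):
    """Stream a pagecounts-ez monthly dump and return the n most-viewed
    (views, title) pairs for one project, highest first.

    Assumed line layout (check against the real file):
        <project> <article_title> <total_views> <encoded_hourly_counts>
    """
    def rows():
        with bz2.open(dump_path, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                parts = line.split(" ")
                if len(parts) < 3 or parts[0] != project:
                    continue
                try:
                    yield int(parts[2]), parts[1]
                except ValueError:
                    continue  # header or malformed line

    # heapq.nlargest streams the generator and keeps only n entries in
    # memory, so the expanded dump never has to fit in RAM at once.
    return heapq.nlargest(n, rows())

# Hypothetical usage, with an assumed file name for March 2018:
# for views, title in top_articles("pagecounts-2018-03-views-ge-5.bz2"):
#     print(views, title)
```

[Streaming plus a bounded heap is the point here: sorting the full expanded file would need all rows in memory, while this keeps only the current top n.]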
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
