Hi Dan / Marko,
Let me take a look and see if this will be good enough, but it looks
promising! If you don't hear from me again, all is well :)

Thanks!

..........................................................................

Srdjan Grubor  |  +1.314.540.8328  |  Endless <http://endlessm.com/>

On Mon, Apr 2, 2018 at 9:47 AM, Dan Andreescu <[email protected]>
wrote:

> Hi Srdjan,
>
> The data pipeline behind the API can't handle arbitrary skip or limit
> parameters, but there's a better way to answer the kind of question you
> have.  We publish all the pageviews at
> https://dumps.wikimedia.org/other/pagecounts-ez/; look at the "Hourly
> page views per article" section.  I would imagine one month of data is
> enough for your use case, and you can get the top N articles for all
> wikis this way, where N is anything you want.  These files are
> compressed, so when you process and expand the data you'll see why we
> can't do this dynamically: the data is huge and our cluster is limited.
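>
> In case it helps, here's a rough sketch of how you might pull the top N
> articles for one wiki out of a monthly file (untested; it assumes the
> whitespace-delimited "project page_title monthly_total ..." line layout
> described on the dumps page, and the "en.z" project code and file name
> below are just examples, so double-check both before relying on it):
>
>     # Top N articles for one wiki from a monthly pagecounts-ez dump.
>     # Assumes whitespace-delimited "project page_title monthly_total ..."
>     # lines; verify against the format notes on the dumps page.
>     import bz2
>     import heapq
>
>     def top_articles(path, project="en.z", n=100_000):
>         top = []  # min-heap of (count, title), keeps the n largest
>         with bz2.open(path, "rt", encoding="utf-8", errors="replace") as f:
>             for line in f:
>                 parts = line.split()
>                 if len(parts) < 3 or parts[0] != project:
>                     continue
>                 try:
>                     count = int(parts[2])
>                 except ValueError:
>                     continue  # skip header or malformed lines
>                 if len(top) < n:
>                     heapq.heappush(top, (count, parts[1]))
>                 elif count > top[0][0]:
>                     heapq.heapreplace(top, (count, parts[1]))
>         return sorted(top, reverse=True)
>
>     for views, title in top_articles("pagecounts-2018-03-views-ge-5.bz2")[:20]:
>         print(views, title)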
>
> On Sun, Apr 1, 2018 at 11:51 AM, Marko Obrovac <[email protected]>
> wrote:
>
>> (+Analytics-l)
>>
>> Hello Srdjan,
>>
>> The 1k limit is a hard one: only the top 1000 articles for a given day
>> get loaded into the database. I've added the folks from the Analytics
>> team to this thread; they may be able to help you, as they generate
>> and expose the data in question.
>>
>>
>> Cheers,
>> Marko Obrovac, PhD
>> Senior Services Engineer
>> Wikimedia Foundation
>>
>>
>> On 30 March 2018 at 16:59, Srdjan Grubor <[email protected]> wrote:
>>
>>> Heya,
>>> I asked this on IRC but didn't get any replies, so I'm following up
>>> this way.
>>> I have a question about the newer metrics REST v1 API: is there a way to
>>> specify how many top articles to pull from
>>> https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_top_project_access_year_month_day
>>> or is the 1k limit hardcoded? The old metrics data included the most
>>> viewed pages, but that disappeared with the change to the new API.
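>>>
>>> For reference, this is how we call it today (Python sketch; we get at
>>> most 1000 entries back and I don't see any paging parameters):
>>>
>>>     # Fetch the current top list for one day; the API caps the
>>>     # result at 1000 articles.
>>>     import requests
>>>
>>>     url = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/top/"
>>>            "en.wikipedia/all-access/2018/03/01")
>>>     resp = requests.get(url, headers={"User-Agent": "endless-apps"})
>>>     resp.raise_for_status()
>>>     articles = resp.json()["items"][0]["articles"]
>>>     print(len(articles))  # 1000, the limit in question
>>>     print(articles[0])    # {"article": ..., "views": ..., "rank": 1}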
>>>
>>> The reason I ask is that we (https://endlessos.com) are trying to
>>> rebuild our stale encyclopedia apps for offline usage, but we are
>>> space-limited. We only want to include the pages most likely to be
>>> viewed, as many as fit within a size envelope that varies with the
>>> device in question (probably up to a 100k article limit). The new API
>>> doesn't give us the tools to figure out the rankings cleanly, other
>>> than rate-limiting on our side and checking every single article's
>>> metric endpoint for counts.
>>>
>>> So the main question is: do we have a way to get this data out with
>>> the current API? If not, could the "metrics/pageviews/top" API be
>>> augmented with `skip` and/or `limit` params, like other similar
>>> services that offer this type of filtering?
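>>>
>>> To make the ask concrete, here's a purely hypothetical sketch of how
>>> we'd page through the top 100k if such params existed (to be clear,
>>> neither `skip` nor `limit` exists today; this is just the shape we're
>>> asking for):
>>>
>>>     # Hypothetical: page through the top list via skip/limit params
>>>     # that the API does not currently support.
>>>     import requests
>>>
>>>     BASE = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/top/"
>>>             "en.wikipedia/all-access/2018/03/01")
>>>     wanted, page = 100_000, 1000
>>>     ranked = []
>>>     for skip in range(0, wanted, page):
>>>         resp = requests.get(BASE, params={"limit": page, "skip": skip})
>>>         resp.raise_for_status()
>>>         batch = resp.json()["items"][0]["articles"]
>>>         ranked.extend(batch)
>>>         if len(batch) < page:
>>>             break  # ran out of articles before hitting the envelope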
>>>
>>> Thanks,
>>>
>>> ..........................................................................
>>>
>>> Srdjan Grubor  |  +1.314.540.8328  |  Endless <http://endlessm.com/>
>>>