It should be updated soon; the jobs have all finished successfully.  But
for now we do expect this kind of lag, and I'll explain why.

When we started we were sqooping at the beginning of the month and the
processing takes something like 4 days total, most of it sqooping.  But
this put too much load on the database servers too close to the beginning of
the month when a bunch of other stuff is running.  So we had to move it
back to the 5th of the month [1].  Add 4 days onto that and we end up
finishing around the 9th of the month.  We don't like this at all and we're
trying to figure out a better way to import the data incrementally so we
can just start processing when we have all of it.  It's unfortunate, but we
couldn't foresee the infrastructure limitation; too much was up in the air
about even where we would sqoop from when we started this work.  Joseph and
I have a weekly meeting to discuss moving towards a more incremental
approach, and this task is the parent task to watch for now:
https://phabricator.wikimedia.org/T193650 (priority is low because we have
too many other commitments, but it's something I'd love to see before we
call Wikistats 2 "production" quality).

[1]
https://github.com/wikimedia/puppet/blob/28b78985d3612a6e19720be1fe8eef5f0dfc2ed7/modules/profile/manifests/analytics/refinery/job/sqoop_mediawiki.pp#L43
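
To make the schedule arithmetic above concrete, here is a rough sketch in
Python.  The function name and constants are made up for illustration; the
start day of the 5th comes from [1] and the 4 days is only the rough
processing estimate mentioned above, so treat the result as approximate
rather than a guarantee.

```python
from datetime import date, timedelta

# Assumptions taken from the explanation above, not from any official API:
SQOOP_START_DAY = 5   # day of the month the sqoop job starts, per [1]
PROCESSING_DAYS = 4   # rough total processing time, mostly sqooping


def expected_availability(data_month: date) -> date:
    """Approximate date when monthly data for `data_month` should be ready."""
    # Find the first day of the month after the data month.
    if data_month.month == 12:
        next_month = date(data_month.year + 1, 1, 1)
    else:
        next_month = date(data_month.year, data_month.month + 1, 1)
    # The import starts on the 5th of that month and runs for about 4 days.
    start = next_month.replace(day=SQOOP_START_DAY)
    return start + timedelta(days=PROCESSING_DAYS)


# September 2018 data: import starts Oct 5, finishes around Oct 9.
print(expected_availability(date(2018, 9, 1)))  # 2018-10-09
```

A script that pulls monthly metrics could compare today's date against this
estimate before deciding whether a missing month is a problem or just the
expected lag.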

On Wed, Oct 10, 2018 at 10:00 PM Neil Patel Quinn <[email protected]>
wrote:

> Hey there!
>
> I just wrote a script that fetches data from the AQS new pages endpoint
> <https://wikimedia.org/api/rest_v1/#!/Edited_pages_data/get_metrics_edited_pages_new_project_editor_type_page_type_granularity_start_end>
> in order to prepare our monthly health metrics (T199459
> <https://phabricator.wikimedia.org/T199459>).
>
> However, it seems like that endpoint doesn't yet have monthly data for
> September. For example, a query for Commons with a start of July 1 and
> an end of October 1
> <https://wikimedia.org/api/rest_v1/metrics/edited-pages/new/commons.wikimedia.org/all-editor-types/content/monthly/20180701/20181001>
> returns only data for July and August. What's the schedule for updating
> this data?
>
> To be honest, I feel pretty frustrated by this. Wikistats 1 generates data
> on content pages with a delay of 10-15 days after the end of the month,
> which has made it difficult for us to provide timely metrics to executives
> and the board. I had assumed (to a degree that I didn't even check) that by
> switching to this API, we would instead only have to deal with the delay in
> generating the mediawiki_history snapshot (5-7 days after the end of the
> month). But that doesn't seem to be the case.
> --
> Neil Patel Quinn <https://meta.wikimedia.org/wiki/User:Neil_P._Quinn-WMF>
> (he/him/his)
> product analyst, Wikimedia Foundation
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
