We're still working out how to publish the dataset; I should have been more precise. We estimate the whole set would be on the order of tens of GB, so one option is to split it by wiki, which is why it's helpful to have a use case like yours. Would a particular split (by wiki, by year, etc.) help with your analysis? Any other details you can share about your work would also be useful for us.
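In the meantime, if you keep using the API, requesting consecutive shorter ranges (as Marcel suggests in the thread below) might look roughly like the sketch that follows. It is only an illustration: `split_range` and `edits_url` are hypothetical helper names, the 365-day window is an assumption chosen to stay under the limit of about 367 days, and the URL layout is copied from the example URLs in the thread. The thread does not say whether the end date is inclusive, so adjacent windows share a boundary date here and the results may need de-duplicating by timestamp.

```python
from datetime import date, timedelta
from urllib.parse import quote

# Base path taken from the example URLs quoted in this thread.
BASE = "https://wikimedia.org/api/rest_v1/metrics/edits/per-page"


def split_range(start, end, max_days=365):
    """Yield (chunk_start, chunk_end) pairs covering start..end.

    Each chunk spans at most max_days days. Adjacent chunks share a
    boundary date (an assumption; see the note above the code).
    """
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    if start > end:
        raise ValueError("start must not be after end")
    step = timedelta(days=max_days)
    current = start
    while current < end:
        chunk_end = min(current + step, end)
        yield current, chunk_end
        current = chunk_end


def edits_url(project, page, start, end,
              editor_type="all-editor-types", granularity="monthly"):
    """Build one request URL in the same shape as the examples in the thread."""
    page_title = quote(page, safe="")  # percent-encode the page title
    return (f"{BASE}/{project}/{page_title}/{editor_type}/{granularity}/"
            f"{start:%Y%m%d}/{end:%Y%m%d}")


# The two-year range from the original question, split into shorter requests.
urls = [
    edits_url("en.wikipedia", "Europe", chunk_start, chunk_end)
    for chunk_start, chunk_end in split_range(date(2017, 5, 2), date(2019, 5, 2))
]
for url in urls:
    print(url)
```

Each URL can then be fetched in turn with whatever HTTP client the project already uses, and the per-window results concatenated.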
On Mon, May 6, 2019 at 12:37 PM Celeste A Manughian-Peter <[email protected]> wrote:

> Hi Dan,
>
> Thanks for reaching out. I would definitely be interested in the full data set. How big is the file set and how would we obtain it?
>
> Thanks,
> Celeste
>
> *From:* Dan Andreescu <[email protected]>
> *Sent:* Monday, May 06, 2019 8:26 AM
> *To:* A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. <[email protected]>
> *Cc:* Celeste A Manughian-Peter <[email protected]>
> *Subject:* Re: [Analytics] WMF API update
>
> Celeste, thanks for writing to the list. Would you prefer a full dataset instead of querying the API? We are planning on releasing the data behind the API as a set of flat files, and it seems like that would be more useful for consumers like you. Let us know if that's the case, and if not we'd appreciate more detail about how you're accessing the data.
>
> Thank you!
>
> On Mon, May 6, 2019 at 10:13 AM Marcel Ruiz Forns <[email protected]> wrote:
>
> Hi Celeste,
>
> Thanks for pinging us about that!
>
> I noticed the edits endpoint has been updated to limit the date range to about 367 days’ worth of data per request.
>
> The limit on the time range length was placed on purpose to avoid receiving very long requests.
> Requests of several years worth of data can acquire too much of the API server power and block other requests.
>
> will I just have to request sequences of the shorter date ranges?
>
> Yes, please, sorry for the inconvenience!
>
> Cheers!
>
> On Sat, May 4, 2019 at 7:01 PM Celeste A Manughian-Peter <[email protected]> wrote:
>
> Hello!
>
> I had set up some endpoints from the wikimedia API a while back and it has been running smoothly in my project since then. I noticed the edits endpoint has been updated to limit the date range to about 367 days’ worth of data per request.
>
> For example:
>
> https://wikimedia.org/api/rest_v1/metrics/edits/per-page/en.wikipedia/Europe/all-editor-types/monthly/20170502/20190502
>
> vs.
>
> https://wikimedia.org/api/rest_v1/metrics/edits/per-page/en.wikipedia/Europe/all-editor-types/monthly/20180502/20190502
>
> Is there still a way to get longer periods of data through this endpoint or will I just have to request sequences of the shorter date ranges?
>
> Thanks a bunch!
> Celeste
>
> Celeste Manughian-Peter
> Data Science and Artificial Intelligence Department
> The Aerospace Corporation
> 310.336.6928
> [email protected]
>
> --
> *Marcel Ruiz Forns* (he/him)
> Analytics Developer @ Wikimedia Foundation
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
