Ok, so if I wanted to get the entire revision history for a page, it would actually be cheaper to run parallel requests against the REST API than to do multi-revision requests on the action API? I can get up to 50 revisions per request on the action API, and I had kind of assumed that would be cheaper than getting only one revision per request.
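Concretely, the hybrid approach under discussion might look like the following minimal sketch (assuming Python's requests library; the article title, User-Agent string, and concurrency level are placeholders): enumerate revision IDs 50 at a time via the action API, then fetch each revision's HTML from the REST API with modest parallelism.

    # Sketch: list all revision IDs via the action API (50 per request),
    # then fetch each revision's HTML from the REST API in parallel.
    from concurrent.futures import ThreadPoolExecutor
    from urllib.parse import quote

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    REST = "https://en.wikipedia.org/api/rest_v1/page/html"
    HEADERS = {"User-Agent": "revision-research/0.1 ([email protected])"}
    TITLE = "Example"  # placeholder article title

    def list_revision_ids(title):
        # Enumerate every revision ID, 50 per action API request.
        params = {"action": "query", "format": "json", "prop": "revisions",
                  "titles": title, "rvprop": "ids", "rvlimit": 50}
        while True:
            data = requests.get(API, params=params, headers=HEADERS).json()
            for page in data["query"]["pages"].values():
                for rev in page.get("revisions", []):
                    yield rev["revid"]
            if "continue" not in data:
                break
            params.update(data["continue"])  # rvcontinue etc.

    def fetch_html(revid):
        # One revision's parsed HTML per REST request.
        url = "%s/%s/%d" % (REST, quote(TITLE, safe=""), revid)
        return requests.get(url, headers=HEADERS).text

    revids = list(list_revision_ids(TITLE))
    # Keep parallelism modest rather than maxing it out (see below).
    with ThreadPoolExecutor(max_workers=4) as pool:
        htmls = list(pool.map(fetch_html, revids))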
Thanks for the cc! Looking forward to hearing from them!

Best,
Bertel

2017-01-05 17:36 GMT+01:00 Gabriel Wicke <[email protected]>:

> On Thu, Jan 5, 2017 at 6:20 AM, Bertel Teilfeldt Hansen <[email protected]> wrote:
>
>> Hi Gabriel,
>>
>> Oh yeah, I see now that the REST API doesn't mind parallel requests. I was going off of the etiquette section in the documentation for the other API (https://www.mediawiki.org/wiki/API:Etiquette). That one prefers requests in series.
>>
>> Ah, ok - that caveat is actually quite relevant for my project. It requires all revisions of certain pages along with all revisions of their talk pages (along with a bunch of other stuff). So perhaps the REST API is not for me. I am not targeting especially frequently edited articles specifically; rather, I'm looking at articles related to particular real-world conflicts (international and civil wars). I am a postdoc at Copenhagen University funded by the Danish government (grant information at the bottom of this page: http://ufm.dk/en/research-and-innovation/funding-programmes-for-research-and-innovation/who-has-received-funding/2015/postdoc-grants-from-the-danish-council-for-independent-research-social-sciences-february-2015?set_language=en&cl=en). Let me know if you want more identification or anything.
>
> My concern was mainly about the overall volume of uncached requests. It sounds like what you are interested in is a fairly small subset of overall pages, so I think this should be fine. Perhaps don't max out the parallelism in this case. In any case, making the same requests to the action API will result in even more on-demand parses, as only the very latest revision is cached in that case.
>
>> I actually have another question about the REST API, if that's ok. I'm using it to get page views over time for the pages that I'm interested in. However, the data don't seem to stretch very far back in time - is that correct? And if so, is there a better way of getting page views (short of using the raw files at https://dumps.wikimedia.org/other/pagecounts-raw/)?
>
> Yes, the pageview API is relatively new, and only has recent data at this point. I am not certain if the analytics team plans to back-fill more historic data over time. I vaguely remember that there might be difficulties with changes in what is considered a pageview, so the numbers might not be completely comparable. I cc'ed Nuria and Dan from the analytics team, who should be able to speak to this.
>
>> Thanks for your help so far!
>>
>> Bertel
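The pageview question above maps to the pageview API's per-article endpoint; a minimal sketch (assuming Python's requests library; the project, article title, and date range are placeholders, and data only reaches back to around mid-2015, consistent with "only has recent data"):

    # Sketch: fetch daily pageviews for one article from the pageview API.
    import requests

    HEADERS = {"User-Agent": "pageview-research/0.1 ([email protected])"}
    url = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
           "en.wikipedia/all-access/all-agents/Example/daily/20160101/20161231")
    for item in requests.get(url, headers=HEADERS).json().get("items", []):
        print(item["timestamp"], item["views"])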
>> 2017-01-03 19:25 GMT+01:00 Gabriel Wicke <[email protected]>:
>>
>>> Bertel,
>>>
>>> On Mon, Jan 2, 2017 at 7:40 AM, Bertel Teilfeldt Hansen <[email protected]> wrote:
>>>
>>>> Hi Gabriel,
>>>>
>>>> The REST API looks promising - thank you!
>>>>
>>>> Having played around with it a bit, I seem to only be able to get one revision per request. Is that correct, or am I doing something wrong?
>>>
>>> This is correct. The requests themselves are quite cheap, and can be parallelized up to the rate limit set out in the API documentation.
>>>
>>>> My project requires every revision and its references from a large number of articles, so that would make a lot of requests. The regular API allows for multiple revisions per request (only with action=query, though).
>>>
>>> There is a caveat here in that we currently don't store all revisions for all articles. This means that requests for really old revisions will trigger a more expensive on-demand parse, just as with the action API. Can you say more about the number of articles you are targeting, and how this list is selected? Regarding the selection, I am mainly wondering if you are targeting especially frequently edited articles.
>>>
>>> Thanks,
>>>
>>> Gabriel
>>>
>>>> Thanks!
>>>>
>>>> Bertel
>>>>
>>>> 2016-12-21 17:01 GMT+01:00 Gabriel Wicke <[email protected]>:
>>>>
>>>>> Bertel, another option is to use the REST API:
>>>>>
>>>>> - HTML for a specific revision: https://en.wikipedia.org/api/rest_v1/#!/Page_content/getFormatRevision
>>>>> - Within this HTML, references are marked up like this: https://www.mediawiki.org/wiki/Specs/HTML/1.3.0/Extensions/Cite. Any HTML or XML DOM parser can be used to extract this information.
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> Gabriel
>>>>>
>>>>> On Wed, Dec 21, 2016 at 3:20 AM, Bertel Teilfeldt Hansen <[email protected]> wrote:
>>>>>
>>>>>> Hi Brad and Gergo,
>>>>>>
>>>>>> Thanks for your responses!
>>>>>>
>>>>>> @Brad: Yeah, that was also my impression, but I wasn't sure. It seemed strange that the example in the official docs would point to a place where the feature was disabled. Thank you for clearing that up!
>>>>>>
>>>>>> @Gergo: I've been looking at action=parse, but as far as I understand it, it is limited to one revision per API request, which makes it quite slow to get a bunch of older revisions from a large number of articles. action=query&prop=revisions&rvprop=content omits the references from the output (it just gives the string "{{reflist}}" after "References"). "mwrefs" sounds very promising, though! I will definitely check that out - thank you!
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Bertel
>>>>>>
>>>>>> 2016-12-20 19:51 GMT+01:00 Gergo Tisza <[email protected]>:
>>>>>>
>>>>>>> On Tue, Dec 20, 2016 at 10:18 AM, Bertel Teilfeldt Hansen <[email protected]> wrote:
>>>>>>>
>>>>>>>> And is there no way of getting references through the API?
>>>>>>>
>>>>>>> There is no nice way, but you can always get the HTML (or the parse tree, depending on whether you want parsed or raw refs) and process it; references are not hard to extract.
>>>>>>> For the wikitext version, there is a python tool: https://github.com/mediawiki-utilities/python-mwrefs
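Putting Gabriel's Cite-spec pointer and Gergo's suggestion together, extracting references from the REST HTML might look like this minimal sketch (assuming lxml; the typeof selectors follow the Specs/HTML Cite page linked above, and the article title is a placeholder):

    # Sketch: extract Cite references from REST/Parsoid HTML.
    # Per the Specs/HTML Cite page, <ref> markers carry
    # typeof="mw:Extension/ref" and the reference list carries
    # typeof="mw:Extension/references".
    import lxml.html
    import requests

    HEADERS = {"User-Agent": "ref-research/0.1 ([email protected])"}
    html = requests.get(
        "https://en.wikipedia.org/api/rest_v1/page/html/Example",
        headers=HEADERS).text
    doc = lxml.html.fromstring(html)

    # In-text citation markers (<ref> in the wikitext):
    markers = doc.xpath('//*[@typeof="mw:Extension/ref"]')
    # Rendered entries in the reference list ({{reflist}} output):
    for li in doc.xpath('//*[@typeof="mw:Extension/references"]//li'):
        print(li.text_content().strip())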
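And for the wikitext route, a rough regex-based stand-in (this is not python-mwrefs' API, just an illustration under the same assumptions as above; it pulls raw wikitext for up to 50 revisions in one action API request, as mentioned at the top of the thread, and crudely matches <ref> tags):

    # Sketch: fetch wikitext for up to 50 revisions in one action API
    # request and crudely extract <ref>...</ref> tags with a regex.
    # (python-mwrefs handles the edge cases this regex misses.)
    import re

    import requests

    HEADERS = {"User-Agent": "ref-research/0.1 ([email protected])"}
    REF_RE = re.compile(r"<ref[^>/]*>(.*?)</ref>", re.IGNORECASE | re.DOTALL)

    params = {"action": "query", "format": "json", "prop": "revisions",
              "titles": "Example", "rvprop": "ids|content", "rvlimit": 50}
    data = requests.get("https://en.wikipedia.org/w/api.php",
                        params=params, headers=HEADERS).json()
    for page in data["query"]["pages"].values():
        for rev in page.get("revisions", []):
            wikitext = rev.get("*", "")  # content lives under "*" here
            print(rev["revid"], len(REF_RE.findall(wikitext)), "refs")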
