Ok, so if I wanted to get the entire revision history for a page, it would
actually be cheaper to run parallel requests on the REST api than to do
multi-revision requests on the action api? I can get up to 50 revisions per
request on the action api, and I had kind of assumed that would be cheaper
than getting only one revision per request.
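
For concreteness, here is roughly what I have been doing on the action api
side - paging through the full history 50 revisions at a time (an untested
sketch; the title is just an example):

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def all_revisions(title):
        # Page through the full history, up to 50 revisions per request.
        params = {
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvlimit": 50,
            "rvprop": "ids|timestamp|content",
            "format": "json",
        }
        while True:
            data = requests.get(API, params=params).json()
            for page in data["query"]["pages"].values():
                for rev in page.get("revisions", []):
                    yield rev
            if "continue" not in data:
                break
            params.update(data["continue"])  # carries rvcontinue forward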

Thanks for the cc! Looking forward to hearing from them!

Best,

Bertel

2017-01-05 17:36 GMT+01:00 Gabriel Wicke <[email protected]>:

> On Thu, Jan 5, 2017 at 6:20 AM, Bertel Teilfeldt Hansen <
> [email protected]> wrote:
>
>> Hi Gabriel,
>>
>> Oh yeah, I see now that the REST api doesn't mind parallel requests. I
>> was going off of the etiquette section in the documentation for the other
>> api (https://www.mediawiki.org/wiki/API:Etiquette). That one prefers
>> requests in series.
>>
>> Ah, ok - that caveat is actually quite relevant for my project. It
>> requires all revisions of certain pages along with all revisions of their
>> talk pages (along with a bunch of other stuff). So perhaps the REST api is
>> not for me. I am not targeting especially frequently edited articles
>> specifically; rather, I'm looking at articles related to particular
>> real-world conflicts (international and civil wars). I am a postdoc at
>> Copenhagen University funded by the Danish government (grant information
>> at the bottom of this page:
>> http://ufm.dk/en/research-and-innovation/funding-programmes-for-research-and-innovation/who-has-received-funding/2015/postdoc-grants-from-the-danish-council-for-independent-research-social-sciences-february-2015?set_language=en&cl=en).
>> Let me know if you want more identification or anything.
>>
>
>
> My concern was mainly about the overall volume of uncached requests. It
> sounds like what you are interested in is a fairly small subset of all
> pages, so I think this should be fine. Perhaps don't max out the
> parallelism in this case. In any case, making the same requests to the
> action API will result in even more on-demand parses, as only the very
> latest revision is cached there.
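>
> For example, a small fixed worker pool keeps the concurrency modest. A
> sketch (untested; the worker count, title and revision ids are just
> placeholders):
>
>     from concurrent.futures import ThreadPoolExecutor
>     import requests
>
>     REST = "https://en.wikipedia.org/api/rest_v1/page/html/{title}/{rev}"
>
>     def fetch_html(title, rev):
>         # One revision's Parsoid HTML from the REST API.
>         return requests.get(REST.format(title=title, rev=rev)).text
>
>     rev_ids = [759431810, 758990000]  # placeholder revision ids
>
>     # The pool size caps how many requests are in flight at once.
>     with ThreadPoolExecutor(max_workers=4) as pool:
>         pages = list(pool.map(lambda r: fetch_html("Crimean_War", r),
>                               rev_ids))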
>
>
>>
>> I actually have another question about the REST api, if that's ok. I'm
>> using it to get page views over time for the pages that I'm interested in.
>> However, the data don't seem to stretch very far back in time - is that
>> correct? And if so, is there a better way of getting page views (short of
>> using the raw files at https://dumps.wikimedia.org/other/pagecounts-raw/
>> )?
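>>
>> For reference, this is roughly the call I'm making (the article and date
>> range are just examples):
>>
>>     import requests
>>
>>     # Daily per-article pageviews; timestamps are YYYYMMDDHH.
>>     url = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/"
>>            "per-article/en.wikipedia/all-access/all-agents/"
>>            "Crimean_War/daily/2015100100/2017010100")
>>     items = requests.get(url).json()["items"]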
>>
>
> Yes, the pageview API is relatively new, and only has recent data at this
> point. I am not certain whether the analytics team plans to back-fill more
> historical data over time. I vaguely remember that there might be
> difficulties with changes in what is considered a pageview, so the numbers
> might not be completely comparable. I have cc'ed Nuria and Dan from the
> analytics team, who should be able to speak to this.
>
>> Thanks for your help so far!
>>
>> Bertel
>>
>> 2017-01-03 19:25 GMT+01:00 Gabriel Wicke <[email protected]>:
>>
>>> Bertel,
>>>
>>> On Mon, Jan 2, 2017 at 7:40 AM, Bertel Teilfeldt Hansen <
>>> [email protected]> wrote:
>>>
>>>> Hi Gabriel,
>>>>
>>>> The REST API looks promising - thank you!
>>>>
>>>> Having played around with it a bit, I seem to only be able to get one
>>>> revision per request. Is that correct, or am I doing something wrong?
>>>>
>>>
>>>
>>> This is correct. The requests themselves are quite cheap, and can be
>>> parallelized up to the rate limit set out in the API documentation.
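>>>
>>> For example, one revision's HTML is a single GET (the title and revision
>>> id below are just placeholders):
>>>
>>>     import requests
>>>
>>>     # Parsoid HTML for one specific revision of a page.
>>>     url = ("https://en.wikipedia.org/api/rest_v1/"
>>>            "page/html/Crimean_War/759431810")
>>>     page_html = requests.get(url).text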
>>>
>>>> My project requires every revision and its references from a large
>>>> number of articles, so that would make a lot of requests. The regular API
>>>> allows for multiple revisions per request (only with action=query, though).
>>>>
>>>
>>>
>>> There is a caveat here in that we currently don't store all revisions
>>> for all articles. This means that requests for really old revisions will
>>> trigger a more expensive on-demand parse, just as with the action API. Can
>>> you say more about the number of articles you are targeting, and how this
>>> list is selected? Regarding the selection, I am mainly wondering if you are
>>> targeting especially frequently edited articles.
>>>
>>> Thanks,
>>>
>>> Gabriel
>>>
>>>
>>>>
>>>> Thanks!
>>>>
>>>> Bertel
>>>>
>>>> 2016-12-21 17:01 GMT+01:00 Gabriel Wicke <[email protected]>:
>>>>
>>>>> Bertel, another option is to use the REST API:
>>>>>
>>>>>
>>>>>    - HTML for a specific revision:
>>>>>      https://en.wikipedia.org/api/rest_v1/#!/Page_content/getFormatRevision
>>>>>    - Within this HTML, references are marked up like this:
>>>>>      https://www.mediawiki.org/wiki/Specs/HTML/1.3.0/Extensions/Cite.
>>>>>      Any HTML or XML DOM parser can be used to extract this information
>>>>>      (see the sketch below).
>>>>>
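>>>>> A minimal sketch with lxml, matching on the typeof values from the spec
>>>>> above (untested as written; the page title is just an example):
>>>>>
>>>>>     import requests
>>>>>     from lxml import html
>>>>>
>>>>>     url = "https://en.wikipedia.org/api/rest_v1/page/html/Crimean_War"
>>>>>     doc = html.fromstring(requests.get(url).content)
>>>>>
>>>>>     # Inline markers carry typeof="mw:Extension/ref"; the generated
>>>>>     # reference list carries typeof="mw:Extension/references".
>>>>>     def with_type(doc, value):
>>>>>         return [e for e in doc.xpath('//*[@typeof]')
>>>>>                 if value in e.get("typeof", "").split()]
>>>>>
>>>>>     refs = with_type(doc, "mw:Extension/ref")
>>>>>     ref_lists = with_type(doc, "mw:Extension/references")
>>>>>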
>>>>> Hope this helps,
>>>>>
>>>>> Gabriel
>>>>>
>>>>> On Wed, Dec 21, 2016 at 3:20 AM, Bertel Teilfeldt Hansen <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Brad and Gergo,
>>>>>>
>>>>>> Thanks for your responses!
>>>>>>
>>>>>> @Brad: Yeah, that was also my impression, but I wasn't sure. Seemed
>>>>>> strange that the example in the official docs would point to a place 
>>>>>> where
>>>>>> the feature was disabled. Thank you for clearing that up!
>>>>>>
>>>>>> @Gergo: I've been looking at action=parse, but as far as I understand
>>>>>> it, it is limited to one revision per API request, which makes it quite
>>>>>> slow for getting a bunch of older revisions from a large number of
>>>>>> articles. action=query&prop=revisions&rvprop=content omits the
>>>>>> references from the output (it just gives the string "{{reflist}}"
>>>>>> after "References"). "mwrefs" sounds very promising, though! I will
>>>>>> definitely check that out - thank you!
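>>>>>>
>>>>>> For context, this is the kind of call I have been testing - rendered
>>>>>> HTML (references included) for a single old revision (the revision id
>>>>>> is just an example):
>>>>>>
>>>>>>     import requests
>>>>>>
>>>>>>     params = {
>>>>>>         "action": "parse",
>>>>>>         "oldid": 759431810,  # one revision id per request
>>>>>>         "prop": "text",
>>>>>>         "format": "json",
>>>>>>     }
>>>>>>     r = requests.get("https://en.wikipedia.org/w/api.php",
>>>>>>                      params=params)
>>>>>>     page_html = r.json()["parse"]["text"]["*"]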
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Bertel
>>>>>>
>>>>>> 2016-12-20 19:51 GMT+01:00 Gergo Tisza <[email protected]>:
>>>>>>
>>>>>>> On Tue, Dec 20, 2016 at 10:18 AM, Bertel Teilfeldt Hansen <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> And is there no way of getting references through the API?
>>>>>>>>
>>>>>>>
>>>>>>> There is no nice way, but you can always get the HTML (or the parse
>>>>>>> tree, depending on whether you want parsed or raw refs) and process it;
>>>>>>> references are not hard to extract. For the wikitext version, there is a
>>>>>>> python tool: https://github.com/mediawiki-utilities/python-mwrefs
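>>>>>>>
>>>>>>> If you do roll your own for the wikitext side, even a crude first pass
>>>>>>> gets most refs - a sketch only; python-mwrefs handles the edge cases
>>>>>>> (nesting, stray markup) that this regex will miss:
>>>>>>>
>>>>>>>     import re
>>>>>>>
>>>>>>>     # Rough first pass: <ref ...>...</ref> pairs plus self-closing
>>>>>>>     # <ref name="..."/> reuses.
>>>>>>>     REF = re.compile(r'<ref[^>/]*>(.*?)</ref>|<ref[^>]*/>',
>>>>>>>                      re.DOTALL | re.IGNORECASE)
>>>>>>>
>>>>>>>     def extract_refs(wikitext):
>>>>>>>         return [m.group(0) for m in REF.finditer(wikitext)]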
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Gabriel Wicke
>>>>> Principal Engineer, Wikimedia Foundation
>>>>>
>>>>
>>>
>>>
>>> --
>>> Gabriel Wicke
>>> Principal Engineer, Wikimedia Foundation
>>>
>>
>
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation
>
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
