It sounds like you might be better off working from a database dump
(https://dumps.wikimedia.org/) than from the API.
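
In particular, the stub-meta-history dumps carry revision metadata without
page text, so a per-page revision count can be streamed out of them in one
pass. A minimal sketch, assuming the file has already been decompressed
(the real dumps are bz2/7z and very large; the function and variable names
here are my own, not part of any dump tooling):

```python
# Minimal sketch: stream a decompressed stub-meta-history XML dump and
# count revisions per page, keeping only pages in a given revision range.
import xml.etree.ElementTree as ET

def pages_in_range(dump_path, lo=500, hi=5000):
    """Yield (page_id, title, revision_count) for pages with lo..hi revisions."""
    page_id, title, rev_count = None, None, 0
    for _, elem in ET.iterparse(dump_path, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the export XML namespace
        if tag == "title":
            title = elem.text
        elif tag == "id" and page_id is None:
            page_id = elem.text  # first <id> under <page> is the page id
        elif tag == "revision":
            rev_count += 1
            elem.clear()  # free memory as we go; dumps are huge
        elif tag == "page":
            if lo <= rev_count <= hi:
                yield page_id, title, rev_count
            page_id, title, rev_count = None, None, 0
            elem.clear()
```

That gives you the page IDs and counts locally, so you can select the
500-5000-revision subset before downloading any full revision XML.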

-- 
Mark

On Wed, Feb 4, 2015 at 4:55 PM, Stefan Kasberger
<[email protected]> wrote:
>
> I am looking for a dataset with some specific characteristics. Revision count is 
> one of them, because articles with few revisions don't produce enough metrics for our 
> algorithm, and ones with too many take a very long time to process (network effects). So it 
> would save time not to have to download lots of XML, compute the needed 
> metrics, and select articles locally.
>
> Anyway, I would suggest generating this metadata whenever a new revision is 
> created: it is just one counter variable, and much easier to offer afterwards.
>
> The main point I want to make is: this is central metadata of an article, 
> like size, number of characters, creation date, URLs, page IDs, and the human-readable 
> and URL-form titles of both the article and its talk page.
>
> Another point I am having trouble with: when you output a page in a query, 
> the human-readable title is used as the identifier for the article. The 
> page ID or the URL-form title (I don't know what to call it; the one used in 
> the URL, i.e. Barack_Obama, not Barack Obama) would be a better key. For 
> example, I ran into a problem creating files named after the display title: 
> with the HIV/AIDS page, Python looked for a folder HIV in which to create a 
> file AIDS. Addressing other APIs or services programmatically is also more 
> direct with a stable key. I use the API to select my data and then fetch it 
> from the Special:Export page.
>
> thanks for your answers!
>
> cheers, stefan
>
>
> On 2015-02-04 23:10, John wrote:
>
> This type of data is very expensive to generate. If you can provide some 
> more context about what you are trying to do, I might be able to help.
>
> On Wednesday, February 4, 2015, Stefan Kasberger <[email protected]> 
> wrote:
>>
>> Hello,
>>
>> I am trying to get the number of revisions for some articles, but I can't 
>> find any query that offers this through the API. I only found this answer 
>> on Stack Overflow:
>> http://stackoverflow.com/questions/7136343/wikipedia-api-how-to-get-the-number-of-revisions-of-a-page
>>
>> Is this still unsolved? A direct count would save me a lot of time, and I 
>> think it is one of the most important pieces of metadata about an article. I 
>> will use it to download only articles with between 500 and 5000 revisions, 
>> because fewer is useless for our research and more is too expensive to compute.
>>
>> thanks for your answer.
>>
>> cheers, Stefan
>>
>> --
>> Stefan Kasberger
>> E [email protected]
>> W www.openscienceASAP.org
>
>
>
> _______________________________________________
> Mediawiki-api mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>
>
> --
> Stefan Kasberger
> E [email protected]
> W www.openscienceASAP.org
>
>
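
For reference, until a count is exposed directly, the Stack Overflow
workaround Stefan links to can be scripted: page through prop=revisions
with rvlimit=max and count the revision ids client-side. A rough
stdlib-only sketch (the count_revisions helper is my own; the request
parameters are the standard action API ones):

```python
# Sketch: count a page's revisions by paging the MediaWiki action API.
# prop=revisions returns at most rvlimit revisions per request, with a
# "continue" block to fetch the next batch.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://en.wikipedia.org/w/api.php"

def _fetch(params):
    with urlopen(API + "?" + urlencode(params)) as resp:
        return json.load(resp)

def count_revisions(pageid, fetch=_fetch):
    """Count revisions of one page, following API continuation."""
    params = {
        "action": "query", "format": "json", "continue": "",
        "prop": "revisions", "rvprop": "ids", "rvlimit": "max",
        "pageids": str(pageid),
    }
    total = 0
    while True:
        data = fetch(dict(params))
        page = data["query"]["pages"][str(pageid)]
        total += len(page.get("revisions", []))
        if "continue" not in data:
            return total
        params.update(data["continue"])  # carry rvcontinue into next request
```

Note this costs one request per 500 revisions (5000 with a bot flag),
which is exactly why a precomputed count, or working from a dump, helps
for large selections.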
