It sounds like you might be better off working from a database dump (https://dumps.wikimedia.org/) than from the API.
-- Mark

On Wed, Feb 4, 2015 at 4:55 PM, Stefan Kasberger <[email protected]> wrote:
>
> I'm looking for a dataset with some specific characteristics. Revision count
> is one of them: articles with few revisions don't produce enough metrics for
> our algorithm, and ones with too many take a very long time (network
> effects). So it would save time not to have to download lots of XML, compute
> the needed metrics, and select articles locally.
>
> Anyway, I would suggest generating this metadata whenever a new revision is
> created. It is just one counter variable, and much easier to offer
> afterwards.
>
> The main point I want to make is: this is central metadata about the article,
> like size, number of characters, creation date, URLs, page IDs, and the
> human-readable and computable titles of both the article and its talk page.
>
> Another issue I'm running into: when you output a page in a query, the
> human-readable title is used as the identifier for the article. The page ID
> or the computable title (I don't know what to call it; the one used in the
> URL, i.e. Barack_Obama, not Barack Obama) would be a better key. For example,
> I had a problem creating files named after the human-readable title: with the
> HIV/AIDS page, Python looked for a folder HIV in which to create a file AIDS.
> Addressing other APIs or services programmatically is also more direct with
> the computable form. I use the API, for example, to select my data and then
> fetch it from the export special page.
>
> Thanks for your answers!
>
> Cheers, Stefan
>
>
> On 2015-02-04 23:10, John wrote:
>
> This type of data is very expensive to generate. If you can provide some
> more context on what you are trying to do, I might be able to help.
>
> On Wednesday, February 4, 2015, Stefan Kasberger <[email protected]>
> wrote:
>>
>> Hello,
>>
>> I'm trying to get the number of revisions for some articles, but I can't
>> find any query that offers this over the API.
>> I only found this answer on Stack Overflow:
>> http://stackoverflow.com/questions/7136343/wikipedia-api-how-to-get-the-number-of-revisions-of-a-page
>>
>> Is this still unsolved? It would save me a lot of time, and I think this is
>> one of the most important pieces of metadata about an article. I will use
>> it to download only articles with between 500 and 5000 revisions, because
>> fewer is useless for our research and more is too expensive to compute.
>>
>> Thanks for your answer.
>>
>> Cheers, Stefan
>>
>> --
>> Stefan Kasberger
>> E [email protected]
>> W www.openscienceASAP.org
>
> _______________________________________________
> Mediawiki-api mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
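As discussed above, the API offers no direct revision-count field, so the Stack Overflow approach amounts to paging through prop=revisions and counting. A hedged sketch of that loop; the `fetch` parameter is injectable so the paging logic can be exercised without network access, and a real call would hit the standard action API endpoint:

```python
# Sketch: count a page's revisions via the MediaWiki action API by paging
# through prop=revisions with rvlimit=max and following the continuation.
import json
import urllib.parse
import urllib.request

def api_fetch(params, endpoint="https://en.wikipedia.org/w/api.php"):
    """Perform one API request and return the decoded JSON response."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    with urllib.request.urlopen(f"{endpoint}?{query}") as resp:
        return json.load(resp)

def count_revisions(title, fetch=api_fetch):
    """Count revisions of `title`, following API continuation tokens."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids",    # ids only, to keep responses small
        "rvlimit": "max",
    }
    total = 0
    while True:
        data = fetch(params)
        for page in data["query"]["pages"].values():
            total += len(page.get("revisions", []))
        if "continue" not in data:
            return total
        params = {**params, **data["continue"]}
```

Keying results by page ID (the keys of `data["query"]["pages"]`) rather than by the human-readable title also sidesteps the HIV/AIDS-style filesystem problem mentioned earlier in the thread.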
