Hi Robert, one solution may be to query Wikidata for the name of the stub category in each language. You could then use a tool like PetScan to retrieve all the pages in those categories, or write your own tool using either a database query or the MediaWiki API. You can find a sample solution here: http://paws-public.wmflabs.org/paws-public/3270/Stub%20categories.ipynb
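Separately from the linked notebook, here is a minimal Python sketch of the two steps above: ask the Wikidata Query Service for every sitelink of the stub category item, then list the members of each local category through the MediaWiki API. It is an illustration under assumptions, not the notebook's code. The function names and the User-Agent string are made up for this example, and the Wikidata item ID of the stub category is deliberately left as an argument that you have to look up and supply yourself.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical User-Agent; replace it with something that identifies you.
USER_AGENT = "stub-category-sketch/0.1 (example only)"


def build_sitelink_query(qid):
    """Return a SPARQL query listing each sitelink (wiki URL, page title) of an item."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError("expected a Wikidata item ID such as 'Q42', got %r" % qid)
    return (
        "SELECT ?site ?title WHERE {\n"
        "  ?page schema:about wd:%s ;\n"
        "        schema:isPartOf ?site ;\n"
        "        schema:name ?title .\n"
        "}" % qid
    )


def parse_sitelinks(sparql_json):
    """Map each wiki's base URL to the local category title in a WDQS JSON result."""
    return {
        row["site"]["value"]: row["title"]["value"]
        for row in sparql_json["results"]["bindings"]
    }


def api_url_for(site):
    """Build the API endpoint from a base URL like 'https://it.wikipedia.org/'."""
    return site.rstrip("/") + "/w/api.php"


def category_members_params(category_title, continue_token=None):
    """Build the MediaWiki API parameters for one page of category members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category_title,
        "cmlimit": "max",
        "format": "json",
    }
    if continue_token:
        params["cmcontinue"] = continue_token
    return params


def get_json(url, params):
    """Send a GET request and decode the JSON response."""
    full_url = url + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(full_url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def stub_categories(qid):
    """Fetch {wiki base URL: local category title} for the given Wikidata item."""
    params = {"query": build_sitelink_query(qid), "format": "json"}
    return parse_sitelinks(get_json(WDQS_ENDPOINT, params))


def category_members(site, category_title):
    """Yield the titles of the direct members of a category, following continuation."""
    token = None
    while True:
        params = category_members_params(category_title, token)
        data = get_json(api_url_for(site), params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        token = data.get("continue", {}).get("cmcontinue")
        if token is None:
            break
```

To use it, call `stub_categories(qid)` with the item ID of the stub category, then loop over the result and call `category_members(site, title)` for each wiki. Note that this only lists direct members; on wikis where stubs are sorted into subcategories you would also have to descend into those, which is something PetScan can do for you.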
I wrote that notebook while on a train, so it may be messy and/or sub-optimal.

I would like to thank Alex Monk and Yuvi Panda for their help with SQL on PAWS today.

Best,
Giuseppe

2016-09-20 11:26 GMT+02:00 Robert West <w...@cs.stanford.edu>:
> Hi everyone,
>
> Does anyone know if there's a straightforward (ideally language-independent)
> way of identifying stub articles in Wikipedia?
>
> Whatever works is ok, whether it's publicly available data or data
> accessible only on the WMF cluster.
>
> I've found lists for various languages (e.g., Italian or English), but the
> lists are in different formats, so separate code is required for each
> language, which doesn't scale.
>
> I guess in the worst case, I'll have to grep for the respective stub
> templates in the respective wikitext dumps, but even this requires to know
> for each language what the respective template is. So if anyone could point
> me to a list of stub templates in different languages, that would also be
> appreciated.
>
> Thanks!
> Bob
>
> --
> Up for a little language game? -- http://www.unfun.me
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics