Hi all,

I'm trying to scrape some data from en.wiki about the outlinks from the
body of articles. However, the API also returns outlinks contained
within templates. While I can write a routine to get a list of all the
templates and identify the article links inside these templates to remove
from the outlinks, this is problematic if a link appears in both the body
and a template. For example, if article X links to Y in the body and to
Y and Z in templates, I want to capture Y but not Z.

Ideally, I'd like to either (1) count the number of times an
article links out to another article (e.g. X links to Y twice) and then
decrement this count for each appearance in a template, or (2) count only
the links occurring in the body and skip the links in templates.
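
To make option (2) concrete, here is a rough sketch of what I have in mind.
It assumes I already have the raw wikitext of the article as a string (for
instance fetched with prop=revisions and rvprop=content), and the function
names strip_templates and count_body_links are just placeholders of mine, not
anything the API provides. It drops every {{...}} span by tracking brace
depth and then counts the [[...]] targets that remain. It is a simplification:
it does not treat {{{parameter}}} syntax, <nowiki> sections, or File: and
Category: links specially.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the target page title without the section or label.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def strip_templates(wikitext):
    """Remove {{...}} spans, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep characters only when we are outside every template.
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def count_body_links(wikitext):
    """Count how often each page is linked from text outside templates."""
    body = strip_templates(wikitext)
    targets = (m.group(1).strip() for m in LINK_RE.finditer(body))
    return Counter(t for t in targets if t)
```

With X's wikitext containing [[Y]] in the body and [[Y]] and [[Z]] inside a
navbox template, this would count Y once and ignore Z, which is the
behaviour I am after. What I don't know is whether the API can give me the
same thing directly.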

Thank you in advance for your suggestions!

Best,

Brian
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
