I'm working with the Wikipedia data dumps to do some data processing.

I know that a page that contains "{{Disambig}}" is considered a
disambiguation page.
But apparently there are many other tags that can also be used for
marking disambiguation pages, such as {{disambig-cleanup}}, {{airport
disambig}}, {{Geodis}} etc.
I found these examples on http://en.wikipedia.org/wiki/Template:Disambig

Does anyone here know the full list of these disambiguation
templates?
Or how can I generate a full list of disambiguation pages?
Also, if I work with the international data dumps, they use other
tags (in the respective language). So just looking for "{{Disambig}}"
would not work; I would need the equivalent tag for each language.
How can I solve this if I want to write a script that detects all
disambiguation pages for other languages?
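
To make the question concrete, this is roughly the check I have in
mind, as a minimal Python sketch. The function name and the
DISAMBIG_TEMPLATES table are placeholders I made up for illustration.
The English set only contains the examples mentioned above and is
certainly incomplete, and the entries for other languages are exactly
what I am missing.

```python
import re

# Hypothetical, incomplete table of template names per language (lowercased).
# Only the English examples from this mail are filled in.
DISAMBIG_TEMPLATES = {
    "en": {"disambig", "disambig-cleanup", "airport disambig", "geodis"},
}

# Captures the template name at the start of {{...}}, up to "|" or "}}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")

def is_disambiguation(wikitext, lang="en"):
    """Return True if the page text uses a known disambiguation template."""
    known = DISAMBIG_TEMPLATES.get(lang, set())
    for match in TEMPLATE_RE.finditer(wikitext):
        # Normalise the name: underscores to spaces, lowercase, and
        # collapse whitespace (lowercasing everything is an approximation).
        name = " ".join(match.group(1).replace("_", " ").lower().split())
        if name in known:
            return True
    return False
```

With this, a page containing "{{Geodis}}" would be detected for "en",
but nothing would be detected for a language that has no entry in the
table.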


Also, some pages start with a prefix such as "Template:", "User:",
"List_of_", "Wikipedia:", "Image:" etc.
I would like to avoid processing pages with these types of prefixes,
and I would like to do so for all languages.
Is there a list of all these prefixes (both for English and
foreign languages)?
If not, I can always write a script that detects which prefixes
occur frequently in each language, but I thought there might be a
more formal way of getting a full list of these types of prefixes.
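
The fallback script I mean would be something like the following
Python sketch. It assumes I already have an iterable of page titles
for one language; count_prefixes and the min_count threshold are names
I picked for illustration. It only handles colon-style prefixes (so
not "List_of_"), and it would also pick up ordinary titles that happen
to contain a colon, so the output would still need manual checking.

```python
from collections import Counter

def count_prefixes(titles, min_count=2):
    """Count the text before the first ':' in each title.

    Returns (prefix, count) pairs seen at least min_count times,
    most frequent first.
    """
    counts = Counter()
    for title in titles:
        prefix, sep, rest = title.partition(":")
        # Skip titles with no colon, or with nothing on either side of it.
        if sep and prefix and rest:
            counts[prefix] += 1
    return [(p, n) for p, n in counts.most_common() if n >= min_count]
```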

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion