[mwlib] Templates from wikipedia

Peter W Sun, 19 Sep 2010 02:30:45 -0700

Hi there,

=== Background ===


I'm doing some research involving parsing many revisions of some
articles in Wikipedia across namespaces (i.e. parsing the English
version and the German version and even the be-x-old version). I have
all of the revisions stored initially in XML dumps from Wikipedia, but
I've already parsed those dumps into a Django database.

=== Actual Question ===
I set up mwlib to get an XHTML version of the raw text of the revision
I send it; I mocked up one of the XHTML tests to do this. I don't have
MediaWiki installed, so I sub-classed DictDB and added methods to
"getURL". I also downloaded all of the siteinfos using
mwlib.siteinfo.fetch_siteinfo and made the DictDB "get" the
appropriate one.
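
For illustration, the kind of setup described above could be sketched
roughly as follows. This is a standalone stand-in and does not use
mwlib: the class name RevisionStore, its methods, and the
"template_prefix" key are all hypothetical, and real siteinfo data has
a different shape. It only shows the idea of a dict-backed store that
picks the siteinfo for one language and looks pages up by title.

```python
# Hypothetical sketch: NOT mwlib's DictDB and not mwlib's API.
# A dict-backed page store that serves one language edition at a time.

class RevisionStore:
    def __init__(self, pages, siteinfos, lang):
        self.pages = pages          # {(lang, title): raw wikitext}
        self.siteinfos = siteinfos  # {lang: siteinfo-like dict (assumed shape)}
        self.lang = lang

    def get_siteinfo(self):
        # Return the siteinfo that belongs to the current language.
        return self.siteinfos[self.lang]

    def get_raw(self, title):
        # Return the stored wikitext, or None if the page is unknown.
        return self.pages.get((self.lang, title))

    def get_template(self, name):
        # Build the full template title from the language's prefix
        # ("template_prefix" is an assumed key, not a real siteinfo field)
        # and upper-case the first letter of the template name.
        prefix = self.get_siteinfo()["template_prefix"]
        name = name.strip()
        name = name[:1].upper() + name[1:]
        return self.get_raw("%s:%s" % (prefix, name))


store = RevisionStore(
    pages={("en", "Template:Neutrality"): "placeholder wikitext"},
    siteinfos={"en": {"template_prefix": "Template"}},
    lang="en",
)
found = store.get_template("neutrality")
missing = store.get_template("too long")
```

In this sketch a template whose page is not in the store simply comes
back as None, so nothing would be available to expand for it.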

All of that done, I can't figure out how to make mwlib aware of
namespace-specific templates, for example:

{{too long}}
{{neutrality}}
{{Infobox Military Conflict [...] }}

When I run the parser currently, those templates are simply missing
from the XHTML output.


Is there a way to get mwlib to parse those templates into xhtml?
If so, what do I need to do?

Thanks so much for the help,

Peter
