[mwlib] Re: Templates from wikipedia

Peter W, Sun, 19 Sep 2010 13:13:02 -0700

Hi Joel,

Thanks for the quick reply; I had been using the latest release, so I
deleted that and installed from the git repository following the
instructions here (http://code.pediapress.com/wiki/wiki/mwlib-install).
To make sure it worked, I checked that mwlib/templ/magics.py had the
changes indicated here:


http://code.pediapress.com/git/mwlib/?p=mwlib;a=blobdiff;f=mwlib/templ/magics.py;h=b64c00a7d7aa5b243b9177aa4ff8123c446e6b16;hp=ca284d437eb250bbdb6a7e591fdf6cd47b831b67;hb=2e72ccdfd085a3fa69f51c1bc28a767bc25d89f3;hpb=b84fcb106b1dc6f3f55e2c6f9bde6419128e9fba

(which it does).
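One quick way to confirm that Python is importing the git checkout rather than a leftover copy of the release is to ask where it finds the module. A minimal sketch using only the standard library; `module_origin` is a hypothetical helper name:

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for module `name`, or None.

    For a dotted name such as "mwlib.templ.magics", find_spec imports the
    parent packages first, so a missing parent is reported as None too.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be found.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Print where the magics module would be loaded from (None if absent).
    print(module_origin("mwlib.templ.magics"))
```

If the printed path points into the release's site-packages directory instead of the git checkout, the old copy is still shadowing the new one.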

I re-ran the operation, but the result was exactly the same as
before.

Do I need to instruct the expander to be aware of the templates that
are specific to each namespace (i.e. to be aware of all the templates
defined in each namespace's equivalent of
http://en.wikipedia.org/wiki/Category:Wikipedia_template_categories )?
Is mwlib designed to be able to access that massive list of
templates?
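As a rough illustration of why an unresolved template can vanish from the output: an expander has to look each template up by its normalized title, and if the lookup fails, the call may simply expand to nothing. This toy sketch is not mwlib's implementation; `TemplateStore`, `expand`, the normalization rules and the empty-string fallback are all assumptions, and it only handles non-nested calls with positional arguments such as `{{Main article|Article A|Article B}}`.

```python
import re


class TemplateStore:
    """Hypothetical in-memory template store keyed by normalized title."""

    def __init__(self, templates, namespace="Template"):
        self.namespace = namespace
        self.pages = {self.normalize(k): v for k, v in templates.items()}

    def normalize(self, name):
        # MediaWiki-style normalization: underscores become spaces,
        # surrounding whitespace is dropped, first letter is upper-cased.
        name = name.replace("_", " ").strip()
        return f"{self.namespace}:{name[:1].upper()}{name[1:]}"

    def get(self, name):
        return self.pages.get(self.normalize(name))


# {{{1}}} or {{{1|default}}} inside a template body.
PARAM = re.compile(r"\{\{\{(\d+)(?:\|([^{}]*))?\}\}\}")
# A simple, non-nested call: {{Name|arg1|arg2...}}.
CALL = re.compile(r"\{\{([^{}|]+)((?:\|[^{}]*)?)\}\}")


def expand(text, store):
    """Expand simple template calls using positional arguments only."""

    def substitute(call):
        body = store.get(call.group(1))
        if body is None:
            return ""  # unresolved template: nothing is emitted
        args = call.group(2).split("|")[1:]

        def param(m):
            index = int(m.group(1)) - 1
            if index < len(args):
                return args[index]
            # Missing argument: use the default if one is given,
            # otherwise leave the placeholder text untouched.
            return m.group(2) if m.group(2) is not None else m.group(0)

        return PARAM.sub(param, body)

    return CALL.sub(substitute, text)


store = TemplateStore({"Main_article": "Main articles: [[{{{1}}}]]{{{2|}}}"})
print(expand("{{Main article|Article A}} {{Too long}}", store))
```

Nested calls such as `{{#switch:{{FULLPAGENAME:...}}}}`, named parameters and parser functions are deliberately left out; a real expander has to handle those as well.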

I'm not 100% sure that I'm asking the right question, but the symptoms
are that:

1. [links] become <a>elements</a> (this is good/what I'd expect)
2a. <ref>references</ref> become super/subscripts. (this is good/what
I'd expect)
2b. However, the references section doesn't have any of the
corresponding references (this is bad/not what I'd expect)
3. Wikitext like {{Main article|Article A|Article B|Article C|Article D}}
doesn't have any representation in the generated XHTML (this is
bad/not what I'd expect)
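For symptom 2b, it may help to recall what a references section is expected to contain: each `<ref>` body is moved out of the running text, replaced with a numbered marker, and listed again where `<references/>` appears. On English Wikipedia that list is commonly produced by the `{{Reflist}}` template, which wraps `<references/>`, so if that template cannot be resolved the section may come out empty for the same reason other templates vanish. A simplified sketch of the bookkeeping; `collect_refs` is hypothetical, not mwlib's code, and named refs, `group=` attributes and nesting are ignored:

```python
import re

REF = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)
REFERENCES = re.compile(r"<references\s*/>")


def collect_refs(wikitext):
    """Replace <ref>...</ref> with [n] markers and fill in <references/>."""
    notes = []

    def marker(m):
        notes.append(m.group(1).strip())
        return f"<sup>[{len(notes)}]</sup>"

    body = REF.sub(marker, wikitext)
    listing = "\n".join(f"{i}. {note}" for i, note in enumerate(notes, 1))
    # Without a <references/> tag, the collected notes are not rendered.
    return REFERENCES.sub(lambda _: listing, body)


print(collect_refs("Fact.<ref>Source A</ref>\n==Notes==\n<references/>"))
```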

If it would help, I can post the code and the wikitext that I'm
testing against.

Thanks again,

Peter


On Sep 19, 6:00 am, "Joel Nothman" <[email protected]> wrote:
> Are you using the Git repository HEAD, or the latest release?
>
> Some recent changes fixed some of the "magic templates", including ones  
> that would cause namespace sensitivity.
>
> In particular, I committed the patches after code like this (appearing in  
> Wikipedia's {{asbox}}) didn't work:
>
> {{#switch:{{FULLPAGENAME:{{{name|}}}}} ...
>
> So try getting the Git HEAD first...
>
> ~J
>
> On Sun, 19 Sep 2010 12:01:49 +1000, Peter W <[email protected]> wrote:
> > Hi there,
>
> > === Background ===
>
> > I'm doing some research involving parsing many revisions of some
> > Wikipedia articles across namespaces (i.e. parsing the English
> > version, the German version and even the be-x-old version). I have
> > all of the revisions stored initially in xml dumps from Wikipedia, but
> > I've already parsed those dumps into a Django database.
>
> > === Actual Question ===
> > I set up mwlib to produce an XHTML version of the raw text of each
> > revision I send it; I adapted one of the XHTML tests to do this. I
> > don't have MediaWiki installed, so I subclassed DictDB and added
> > methods for "getURL". I also downloaded all of the siteinfos using
> > mwlib.siteinfo.fetch_siteinfo and made the DictDB "get" the
> > appropriate one.
>
> > Having done all that, I can't figure out how to make mwlib aware of
> > namespace-specific templates, for example:
>
> > {{too long}}
> > {{neutrality}}
> > {{Infobox Military Conflict [...] }}
>
> > etc. When I currently run the parser, the XHTML output simply omits
> > all of those templates.
>
> > Is there a way to get mwlib to parse those templates into xhtml?
> > If so, what do I need to do?
>
> > Thanks so much for the help,
>
> > Peter
>
>
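The namespace sensitivity discussed in this thread matters because template pages live in a namespace whose name is localized per wiki (for example `Vorlage:` on the German Wikipedia rather than `Template:`), and each wiki's siteinfo records that name. A minimal sketch, assuming the siteinfo dict follows the MediaWiki API layout in which `namespaces` maps namespace ids (as strings) to entries whose `"*"` key holds the local name; `template_title` is a hypothetical helper, and the sample dicts are abbreviated stand-ins, not downloaded data:

```python
TEMPLATE_NS = "10"  # MediaWiki's namespace id for templates


def template_title(siteinfo, name):
    """Build the full page title of template `name` for one wiki.

    Assumes `siteinfo` uses the MediaWiki API layout:
    {"namespaces": {"10": {"*": "<local name>", ...}, ...}}.
    """
    local_ns = siteinfo["namespaces"][TEMPLATE_NS]["*"]
    name = name.replace("_", " ").strip()
    # Drop an explicit prefix: the canonical "Template:" or the local name.
    prefix, sep, rest = name.partition(":")
    if sep and prefix.strip().lower() in (local_ns.lower(), "template"):
        name = rest.strip()
    return f"{local_ns}:{name[:1].upper()}{name[1:]}"


de = {"namespaces": {"10": {"id": 10, "*": "Vorlage"}}}
print(template_title(de, "Template:infobox"))
```

The same template name therefore has to be looked up under a different page title on each wiki, which is one reason a lookup keyed on English titles can miss templates on other language editions.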

-- 
You received this message because you are subscribed to the Google Groups 
"mwlib" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/mwlib?hl=en.
