Nazarius Kappertaal <[email protected]> writes:
>
> I mostly just need to see how to parse and extract data from the tree
> that is created when one parses a mediawiki article - links/images/
> tables -> currently doing it with a variety of regexes (multi-pass is
> bad).
>
> A concrete tutorial/walkthrough would be akin to dive into python,
> centred around some type of complex, featured article -> en.wikipedia ->
> DNA/World War II.
>
> And just run down the most used aspects of mwlib, using the featured
> article.
>
> Nothing long, just some code that works, and shows (via comments/
> descriptions) how one can use mwlib in real life. Make it easier for
> outsiders.
>
The following is a basic example.

#! /usr/bin/env python
# Build the input zip first with:
#   mw-zip -x -c :en -o acdc.zip AC/DC
from mwlib import wiki
from mwlib.refine import core

# Open the zip and parse the article into a tree.
env = wiki.makewiki("acdc.zip")
a = env.wiki.getParsedArticle("AC/DC")

# Collect all level-2 sections and show the first child of each.
sections = core.walknodel([a], lambda x: x.tagname == "@section" and x.level == 2)
for s in sections:
    core.show(s.children[:1])
    print("------------")

# Show every complex link node in the article.
for k in core.walknodel([a], lambda x: x.type == core.T.t_complex_link):
    core.show(k)

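
For anyone reading along without mwlib installed, the idea behind the walknodel calls above (one pass over the tree, keeping the nodes that match a predicate) can be sketched with a toy tree. Node and walk_nodes below are hypothetical stand-ins made up for illustration. They are not mwlib's classes and may differ from how mwlib walks its tree.

```python
# Toy stand-in for a parse tree. Node and walk_nodes are hypothetical,
# not part of mwlib.
class Node(object):
    def __init__(self, tagname, level=None, children=None, text=None):
        self.tagname = tagname
        self.level = level
        self.children = children or []
        self.text = text


def walk_nodes(nodes, predicate):
    """Return, in document order, every node for which predicate is true."""
    found = []
    for node in nodes:
        if predicate(node):
            found.append(node)
        # Descend into the children whether or not the parent matched.
        found.extend(walk_nodes(node.children, predicate))
    return found


# A tiny hand-built "article" with two sections and two links.
article = Node("article", children=[
    Node("@section", level=2, children=[
        Node("text", text="History"),
        Node("link", text="Angus Young"),
    ]),
    Node("@section", level=3, children=[
        Node("link", text="Bon Scott"),
    ]),
])

# Same shape of query as in the mwlib example: filter by tag name and level.
sections = walk_nodes([article], lambda n: n.tagname == "@section" and n.level == 2)
links = walk_nodes([article], lambda n: n.tagname == "link")
link_texts = [n.text for n in links]
```

Compared with several regex passes over the raw wikitext, each query here is a single traversal with a different predicate, so links, images, and tables can be pulled out the same way.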
- Ralf