Hi.

What you are trying to do is probably fairly complicated, and unfortunately the lack of documentation does not make it any easier.

I hope the following sample code will get you started:

import sys
from mwlib import wiki
from mwlib.parser import show

zip_fn = '/home/volker/test/test.zip'

wiki_obj = wiki.makewiki(zip_fn)

# walk the metabook and print the parse tree of every article
for item in wiki_obj.metabook.walk():
    if item.type == 'article':
        parse_tree = item.wiki.getParsedArticle(title=item.title, revision=item.revision)
        show(sys.stdout, parse_tree)

If you want to manipulate the parse tree, do yourself a favor and transform it with "buildAdvancedTree" from mwlib.advtree.
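
To give a rough idea of what working with the tree looks like, here is a minimal sketch of a generic walker. It only assumes that each node carries a "children" list, which I believe holds for mwlib parse tree nodes but is worth checking against your version. The helper names iter_nodes and find_by_type and the stand-in node classes are made up for illustration and are not part of mwlib:

```python
def iter_nodes(node):
    """Yield node and all of its descendants, depth first."""
    yield node
    # getattr keeps the walker tolerant of objects without a children list
    for child in getattr(node, 'children', []):
        for descendant in iter_nodes(child):
            yield descendant


def find_by_type(root, type_name):
    """Return every node whose class name equals type_name."""
    return [n for n in iter_nodes(root) if n.__class__.__name__ == type_name]


# Hypothetical stand-in classes so the sketch runs without mwlib installed.
class Node(object):
    def __init__(self, *children):
        self.children = list(children)


class Article(Node):
    pass


class Paragraph(Node):
    pass


class Text(Node):
    pass


tree = Article(Paragraph(Text(), Text()), Paragraph(Text()))
texts = find_by_type(tree, 'Text')
```

With a real archive you would pass the parse_tree from the loop above instead of the stand-in tree, and match on whatever node class names your mwlib version actually produces.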

I hope this helps,
Volker



On 05/08/2011 09:23 AM, Matthew Honnibal wrote:
Hi,
What's the recommended way to get parse-trees in Python for a list of
articles?

I'm having trouble using mw-zip and mwlib.zipwiki. I'm trying to do
mwlib.zipwiki.Wiki('tmp.zip') with a zip created on the commandline
using "mw-zip -m -c :en -o tmp.zip Sun". However, mwlib.zipwiki.Wiki
expects a content.json file in tmp.zip, which mw-zip hasn't created:

$ mw-zip -c :en -o tmp.zip Sun Moon Stars
creating nuwiki in u'tmpuMGaLk/nuwiki'
2011-05-08T17:13:23 mwlib.utils.info>>  fetching 'http://en.wikipedia.org/w/index.php?title=Help:Books/License&action=raw&templates=expand'
256/256 80.65 48.39s

import mwlib.zipwiki
w = mwlib.zipwiki.Wiki('tmp.zip')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/local/lib/python2.6/dist-packages/mwlib-0.12.14-py2.6-linux-x86_64.egg/mwlib/zipwiki.py", line 48, in __init__
     content = json.loads(unicode(self.zf.read('content.json'), 'utf-8'))
   File "/usr/lib/python2.6/zipfile.py", line 834, in read
     return self.open(name, "r", pwd).read()
   File "/usr/lib/python2.6/zipfile.py", line 857, in open
     zinfo = self.getinfo(name)
   File "/usr/lib/python2.6/zipfile.py", line 824, in getinfo
     'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'content.json' in the archive"

$ ls -la
total 20260
drwxr-xr-x 3 matt matt     4096 2011-05-08 17:16 .
drwxr-xr-x 3 matt matt     4096 2011-05-08 17:16 ..
-rw-r--r-- 1 matt matt  4926482 2011-05-08 17:14 edits.json
-rw-r--r-- 1 matt matt     1276 2011-05-08 17:14 excluded.json
-rw-r--r-- 1 matt matt    53979 2011-05-08 17:14 imageinfo.json
drwxr-xr-x 2 matt matt     4096 2011-05-08 17:16 images
-rw-r--r-- 1 matt matt    25543 2011-05-08 17:14 licenses.json
-rw-r--r-- 1 matt matt      919 2011-05-08 17:13 metabook.json
-rw-r--r-- 1 matt matt      149 2011-05-08 17:14 nfo.json
-rw-r--r-- 1 matt matt  1632927 2011-05-08 17:14 parsed_html.json
-rw-r--r-- 1 matt matt      452 2011-05-08 17:14 redirects.json
-rw-r--r-- 1 matt matt  1492682 2011-05-08 17:14 revisions-1.txt
-rw-r--r-- 1 matt matt   126884 2011-05-08 17:13 siteinfo.json
-rw------- 1 matt matt 12402072 2011-05-08 17:14 tmp.zip


--
volker haas                 brainbot technologies ag
fon +49 6131 2116394        boppstraße 64
fax +49 6131 2116392        55118 mainz
[email protected]    http://www.brainbot.com/
