Hi,
What's the recommended way to get parse trees in Python for a list of
articles?
I'm having trouble using mw-zip and mwlib.zipwiki. I'm trying to call
mwlib.zipwiki.Wiki('tmp.zip') on a zip created on the command line
with "mw-zip -m -c :en -o tmp.zip Sun". However, mwlib.zipwiki.Wiki
expects a content.json file in tmp.zip, which mw-zip hasn't created:
$ mw-zip -c :en -o tmp.zip Sun Moon Stars
creating nuwiki in u'tmpuMGaLk/nuwiki'
2011-05-08T17:13:23 mwlib.utils.info >> fetching 'http://en.wikipedia.org/w/index.php?title=Help:Books/License&action=raw&templates=expand'
256/256 80.65 48.39s
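Since the goal is a list of articles, the same mw-zip call can be built from a Python list of titles. A minimal sketch, assuming mw-zip is on the PATH and reusing only the "-c :en" and "-o" options from the session above (what -m does is not covered here); build_mw_zip_command and run_mw_zip are hypothetical helper names, not part of mwlib.

```python
import subprocess


def build_mw_zip_command(titles, output="tmp.zip", config=":en"):
    """Build the argv list for one mw-zip run over several articles.

    Hypothetical helper, not part of mwlib. Reuses only the -c and -o
    options from the session above; titles are appended as positional
    arguments, as in "mw-zip -c :en -o tmp.zip Sun Moon Stars".
    """
    titles = list(titles)
    if not titles:
        raise ValueError("at least one article title is required")
    return ["mw-zip", "-c", config, "-o", output] + titles


def run_mw_zip(titles, output="tmp.zip", config=":en"):
    """Run mw-zip (assumed to be on the PATH); raise if it exits non-zero."""
    # A list argument (no shell=True) means titles need no shell quoting.
    subprocess.check_call(build_mw_zip_command(titles, output, config))
```

For example, run_mw_zip(["Sun", "Moon", "Stars"]) should correspond to the shell command above. Passing a list rather than a shell string keeps titles containing spaces intact.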
>>> import mwlib.zipwiki
>>> w = mwlib.zipwiki.Wiki('tmp.zip')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/mwlib-0.12.14-py2.6-linux-x86_64.egg/mwlib/zipwiki.py", line 48, in __init__
    content = json.loads(unicode(self.zf.read('content.json'), 'utf-8'))
  File "/usr/lib/python2.6/zipfile.py", line 834, in read
    return self.open(name, "r", pwd).read()
  File "/usr/lib/python2.6/zipfile.py", line 857, in open
    zinfo = self.getinfo(name)
  File "/usr/lib/python2.6/zipfile.py", line 824, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'content.json' in the archive"
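The traceback pins down the failure: line 48 of zipwiki.py reads the member 'content.json' unconditionally, and zipfile raises KeyError because the archive has no such member. The same check can be done up front with the standard library alone. A minimal sketch; load_content_json and archive_members are hypothetical helpers that only inspect the zip, they do not open it as a wiki.

```python
import json
import zipfile


def load_content_json(archive):
    """Read content.json the way zipwiki.py line 48 does, minus the KeyError.

    archive may be a filename or a file-like object. Returns the decoded
    JSON, or None if the archive has no content.json member.
    Hypothetical helper, not part of mwlib.
    """
    zf = zipfile.ZipFile(archive)
    try:  # not 'with': ZipFile is not a context manager on Python 2.6
        if "content.json" not in zf.namelist():
            return None
        return json.loads(zf.read("content.json").decode("utf-8"))
    finally:
        zf.close()


def archive_members(archive):
    """Return the sorted member names, i.e. what mw-zip actually wrote."""
    zf = zipfile.ZipFile(archive)
    try:
        return sorted(zf.namelist())
    finally:
        zf.close()
```

If load_content_json('tmp.zip') returns None, zipwiki.Wiki('tmp.zip') will hit the KeyError above; archive_members('tmp.zip') then shows what the archive contains instead.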
$ ls -la
total 20260
drwxr-xr-x 3 matt matt 4096 2011-05-08 17:16 .
drwxr-xr-x 3 matt matt 4096 2011-05-08 17:16 ..
-rw-r--r-- 1 matt matt 4926482 2011-05-08 17:14 edits.json
-rw-r--r-- 1 matt matt 1276 2011-05-08 17:14 excluded.json
-rw-r--r-- 1 matt matt 53979 2011-05-08 17:14 imageinfo.json
drwxr-xr-x 2 matt matt 4096 2011-05-08 17:16 images
-rw-r--r-- 1 matt matt 25543 2011-05-08 17:14 licenses.json
-rw-r--r-- 1 matt matt 919 2011-05-08 17:13 metabook.json
-rw-r--r-- 1 matt matt 149 2011-05-08 17:14 nfo.json
-rw-r--r-- 1 matt matt 1632927 2011-05-08 17:14 parsed_html.json
-rw-r--r-- 1 matt matt 452 2011-05-08 17:14 redirects.json
-rw-r--r-- 1 matt matt 1492682 2011-05-08 17:14 revisions-1.txt
-rw-r--r-- 1 matt matt 126884 2011-05-08 17:13 siteinfo.json
-rw------- 1 matt matt 12402072 2011-05-08 17:14 tmp.zip
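The listing suggests mw-zip wrote a different layout: JSON files such as metabook.json and siteinfo.json, plus revisions-1.txt and an images directory, which fits the "creating nuwiki" line in the log. Assuming these files are the unpacked members of tmp.zip, a schema-free way to see what each JSON file holds is to load it and list its top-level keys. json_summary is a hypothetical helper; it makes no assumption about what the keys mean.

```python
import json
import zipfile


def json_summary(archive):
    """Describe every *.json member of a zip without assuming a schema.

    Dict payloads are described by their sorted top-level keys, anything
    else (lists, strings, ...) by its Python type name.
    Hypothetical helper, not part of mwlib.
    """
    summary = {}
    zf = zipfile.ZipFile(archive)
    try:  # try/finally so this also runs on Python 2.6
        for name in zf.namelist():
            if not name.endswith(".json"):
                continue  # skips revisions-1.txt, files under images/, etc.
            data = json.loads(zf.read(name).decode("utf-8"))
            if isinstance(data, dict):
                summary[name] = sorted(data)
            else:
                summary[name] = type(data).__name__
    finally:
        zf.close()
    return summary
```

Usage: for name, info in sorted(json_summary('tmp.zip').items()): print(name, info)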
--
You received this message because you are subscribed to the Google Groups
"mwlib" group.