Hi Volker,

Thanks for that, I think I see what to do.

> the lack of documentation does not make it easier unfortunately.

Yes, it's a pity, as mwlib is a nice project. I'll try to be a good citizen
and write up a tutorial when I'm done with this, as I think what I want to
do might be a common use case.
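
In case it helps anyone who hits the same KeyError as in my original message
(quoted below), here is a rough sketch for checking which members a zip
actually contains before handing it to a loader. It only uses the standard
library zipfile module; the helper names list_json_members and has_member
are my own and not part of mwlib, and it assumes Python 2.7 or later (for
the "with" support on ZipFile).

```python
import sys
import zipfile


def list_json_members(zip_path):
    """Return the sorted names of all .json members in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(".json"))


def has_member(zip_path, name):
    """Return True if the archive has a member with exactly this name."""
    with zipfile.ZipFile(zip_path) as zf:
        return name in zf.namelist()


if __name__ == "__main__":
    # Usage: python check_zip.py tmp.zip
    path = sys.argv[1]
    print(list_json_members(path))
    # 'content.json' is the member name from the traceback below.
    print(has_member(path, "content.json"))
```

If has_member reports False for 'content.json', the archive is not in the
layout that mwlib.zipwiki.Wiki is looking for, which matches the traceback.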

On Fri, May 20, 2011 at 8:31 PM, Volker Haas <[email protected]> wrote:

> Hi.
>
> What you are trying to do is probably pretty complicated - the lack of
> documentation does not make it easier unfortunately.
>
> I hope the following sample code will get you started:
>
> import sys
> from mwlib import wiki
> from mwlib.parser import show
>
> zip_fn = '/home/volker/test/test.zip'
>
> wiki_obj = wiki.makewiki(zip_fn)
>
> for item in wiki_obj.metabook.walk():
>    if item.type == 'article':
>        parse_tree = item.wiki.getParsedArticle(title=item.title,
>                                                revision=item.revision)
>        show(sys.stdout, parse_tree)
>
> If you want to manipulate the parsetree you should do yourself a favor and
> transform the parsetree with "buildAdvancedTree" in mwlib.advtree
>
> I hope this helps,
> Volker
>
>
>
>
> On 05/08/2011 09:23 AM, Matthew Honnibal wrote:
>
>> Hi,
>> What's the recommended way to get parse-trees in Python for a list of
>> articles?
>>
>> I'm having trouble using mw-zip and mwlib.zipwiki. I'm trying to do
>> mwlib.zipwiki.Wiki('tmp.zip') with a zip created on the commandline
>> using "mw-zip -m -c :en -o tmp.zip Sun". However, mwlib.zipwiki.Wiki
>> expects a content.json file in tmp.zip, which mw-zip hasn't created:
>>
>> $ mw-zip -c :en -o tmp.zip Sun Moon Stars
>> creating nuwiki in u'tmpuMGaLk/nuwiki'
>> 2011-05-08T17:13:23 mwlib.utils.info>>  fetching 'http://en.wikipedia.org/w/index.php?title=Help:Books/License&action=raw&templates=expand'
>> 256/256 80.65 48.39s
>>
>> >>> import mwlib.zipwiki
>> >>> w = mwlib.zipwiki.Wiki('tmp.zip')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/local/lib/python2.6/dist-packages/mwlib-0.12.14-py2.6-linux-x86_64.egg/mwlib/zipwiki.py", line 48, in __init__
>>     content = json.loads(unicode(self.zf.read('content.json'), 'utf-8'))
>>   File "/usr/lib/python2.6/zipfile.py", line 834, in read
>>     return self.open(name, "r", pwd).read()
>>   File "/usr/lib/python2.6/zipfile.py", line 857, in open
>>     zinfo = self.getinfo(name)
>>   File "/usr/lib/python2.6/zipfile.py", line 824, in getinfo
>>     'There is no item named %r in the archive' % name)
>> KeyError: "There is no item named 'content.json' in the archive"
>>
>> $ ls -la
>> total 20260
>> drwxr-xr-x 3 matt matt     4096 2011-05-08 17:16 .
>> drwxr-xr-x 3 matt matt     4096 2011-05-08 17:16 ..
>> -rw-r--r-- 1 matt matt  4926482 2011-05-08 17:14 edits.json
>> -rw-r--r-- 1 matt matt     1276 2011-05-08 17:14 excluded.json
>> -rw-r--r-- 1 matt matt    53979 2011-05-08 17:14 imageinfo.json
>> drwxr-xr-x 2 matt matt     4096 2011-05-08 17:16 images
>> -rw-r--r-- 1 matt matt    25543 2011-05-08 17:14 licenses.json
>> -rw-r--r-- 1 matt matt      919 2011-05-08 17:13 metabook.json
>> -rw-r--r-- 1 matt matt      149 2011-05-08 17:14 nfo.json
>> -rw-r--r-- 1 matt matt  1632927 2011-05-08 17:14 parsed_html.json
>> -rw-r--r-- 1 matt matt      452 2011-05-08 17:14 redirects.json
>> -rw-r--r-- 1 matt matt  1492682 2011-05-08 17:14 revisions-1.txt
>> -rw-r--r-- 1 matt matt   126884 2011-05-08 17:13 siteinfo.json
>> -rw------- 1 matt matt 12402072 2011-05-08 17:14 tmp.zip
>>
>>
> --
> volker haas                 brainbot technologies ag
> fon +49 6131 2116394        boppstraße 64
> fax +49 6131 2116392        55118 mainz
> [email protected]    http://www.brainbot.com/
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "mwlib" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/mwlib?hl=en.
>
>

