On Feb 5, 20:02, Alex Rades <[email protected]> wrote:
> Hi,
> I'm trying to do a very basic parsing of wikipedia articles but i
> don't find an entry point in mwlib documentation :) The only thing
> that i'm finding is the help of the various mw-* commands. Am I
> missing something?
>
> Basically, i need to extract the Infobox (the first table on the right
> of article) of a given article, and the first paragraph of the
> article. Is it possible to mwlib?
>
> Thank you very much
If nothing has changed in the last 3 months, the only documentation is the source code of mwlib, which (on Linux, if you installed it with easy_install) can be found at /usr/lib/python*/site-packages/mwlib*/mwlib/. See old_uparser.py, which is the source code of the various mw-* commands.

I think that to extract the Infobox you need to parse the article in Python:

    from mwlib import uparser

    # title:  the article title
    # wikidb: a wiki database object, used to expand templates
    # raw:    the article's source code (wikitext)
    a = uparser.parseString(title, wikidb=wikidb, raw=raw)

This function returns a tree representation of the article, in which you can find what you want. The wikidb argument is used to substitute (expand) templates, and raw is the article's source code.

You received this message because you are subscribed to the Google Groups "mwlib" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/mwlib?hl=en
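To illustrate what "finding what you want" in the returned tree could look like, here is a minimal sketch in plain Python that does not call mwlib at all. The classes Node, Article, Table, Paragraph and Text are hypothetical stand-ins, assuming (not checked here against mwlib's source) that the real parse tree is made of nodes carrying a children list; the text attribute and the helper names find_first and plain_text are likewise made up for illustration. Treating the first Table as the Infobox follows the question's own description and is only a heuristic: other tables may precede it on some pages.

```python
# Hypothetical stand-in node types: they only mimic the assumed shape of
# the parse tree (every node has a list of children). They are NOT mwlib's
# real classes, and the "text" attribute is an assumption for this sketch.
class Node:
    def __init__(self, *children, text=""):
        self.children = list(children)
        self.text = text

class Article(Node): pass
class Table(Node): pass
class Paragraph(Node): pass
class Text(Node): pass

def find_first(node, node_type):
    """Depth-first, pre-order search: return the first node of node_type, or None."""
    if isinstance(node, node_type):
        return node
    for child in node.children:
        found = find_first(child, node_type)
        if found is not None:
            return found
    return None

def plain_text(node):
    """Concatenate the (stand-in) text of a node and all of its descendants."""
    return node.text + "".join(plain_text(child) for child in node.children)

# Usage with a hand-built tree standing in for the result of parseString.
tree = Article(
    Table(Text(text="Infobox row")),
    Paragraph(Text(text="First paragraph "), Text(text="of the article.")),
    Paragraph(Text(text="Second paragraph.")),
)
infobox = find_first(tree, Table)
first_para = find_first(tree, Paragraph)
print(plain_text(infobox))     # Infobox row
print(plain_text(first_para))  # First paragraph of the article.
```

A depth-first search is used because the Infobox and the first paragraph may be nested inside other nodes (for example a section), so scanning only the top-level children could miss them.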
