I want to write a tool that processes the entire English Wikipedia dump in order to generate a packaged Wikipedia in a new format (specifically .epub, for ebook readers). Currently, the mwlib tools appear to be oriented around using APIs to fetch small segments of Wikipedia directly, not around processing entire dumps.
It looks like my best option is to devise my own DB format (for example, extracting the articles from the XML dump and writing them to a BDB database), write a tool that converts a dump into this format, and write a WikiDBBase subclass that implements that DB format. I could then iterate over all the articles I want to include, parse and render them, and write them out in my eventual format. Am I correct in this, or is there an easier way?
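For the first step, here is a rough sketch of what I have in mind for the dump-conversion tool. I'm using Python's stdlib `dbm` module in place of BDB just to keep the example self-contained, and the element names (`page`, `title`, `text`) are taken from the MediaWiki XML export schema; this is only an illustration of the approach, not something I've run against a full dump:

```python
import dbm
import xml.etree.ElementTree as ET

def localname(tag):
    # Strip the XML namespace: "{http://...}page" -> "page".
    return tag.rsplit('}', 1)[-1]

def dump_to_db(xml_source, db_path):
    """Stream a MediaWiki XML dump and store title -> wikitext in a dbm file.

    iterparse plus elem.clear() keeps memory bounded, which matters for a
    multi-gigabyte dump.
    """
    db = dbm.open(db_path, 'n')
    title = None
    for event, elem in ET.iterparse(xml_source, events=('end',)):
        name = localname(elem.tag)
        if name == 'title':
            title = elem.text
        elif name == 'text':
            # <text> sits inside <revision>; the page's <title> has
            # already been seen by the time we get here.
            if title is not None and elem.text is not None:
                db[title.encode('utf-8')] = elem.text.encode('utf-8')
        elif name == 'page':
            title = None
            elem.clear()  # free the subtree for this finished page
    db.close()
```

A WikiDBBase subclass would then just look articles up in that dbm file by title.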
