On Jun 10, 6:21 pm, Ralf Schmitt <[email protected]> wrote:
> On Wed, Jun 10, 2009 at 6:55 PM, Nick Johnson<[email protected]> wrote:
>
> > It looks like my best option is to devise my own db format (such as
> > extracting the articles from the XML dump and writing them to a BDB
> > database), write a tool to turn a dump into this format, and write a
> > WikiDBBase subclass that implements that DB format. I can then iterate
> > through all the articles I want to include, parse and render them, and
> > output them to my eventual format. Am I correct in this, or is there
> > an easier way?
>
> mw-buildcdb takes such an xml dump and converts it to a cdb format.

Thanks! I saw mw-buildcdb listed, but there were no docs, and I assumed it worked the same way as mw-zip.

I see that the cdb format is documented as being limited to 4GB per CDB database. Will this be a problem when processing the Wikipedia dump? If the dump isn't 4GB yet, it certainly will be soon.

-Nick Johnson
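For a rough sense of whether the 4GB limit would bite, here is a minimal sketch that streams a MediaWiki XML dump and adds up the UTF-8 size of every title and article text. It is not how mw-buildcdb works internally (I have not checked that); the names iter_articles, payload_bytes and fits_in_one_cdb are made up for illustration. It assumes the classic cdb layout with 32-bit file offsets and that the articles would be stored uncompressed, and it ignores cdb's own per-record overhead, so the total is only a lower bound.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: classic cdb stores file offsets as 32-bit unsigned integers,
# so a single database file cannot grow past 2**32 bytes (4 GiB).
CDB_MAX_BYTES = 2**32


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(source):
    """Yield (title, text) for each <page> in a MediaWiki XML export stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the page's children so memory use stays flat


def payload_bytes(source):
    """Sum the UTF-8 size of all titles and texts (lower bound on db size)."""
    total = 0
    for title, text in iter_articles(source):
        total += len((title or "").encode("utf-8")) + len(text.encode("utf-8"))
    return total


def fits_in_one_cdb(total_bytes, limit=CDB_MAX_BYTES):
    """True if the raw payload alone stays under the assumed cdb limit."""
    return total_bytes < limit
```

To try it, open the uncompressed dump in binary mode and call fits_in_one_cdb(payload_bytes(f)). If that comes back False, a single uncompressed cdb file would not be enough under these assumptions.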
