On Jun 10, 6:21 pm, Ralf Schmitt <[email protected]> wrote:
> On Wed, Jun 10, 2009 at 6:55 PM, Nick Johnson<[email protected]> wrote:
>
> > It looks like my best option is to devise my own db format (such as
> > extracting the articles from the XML dump and writing them to a BDB
> > database), write a tool to turn a dump into this format, and write a
> > WikiDBBase subclass that implements that DB format. I can then iterate
> > through all the articles I want to include, parse and render them, and
> > output them to my eventual format. Am I correct in this, or is there
> > an easier way?
>
> mw-buildcdb takes such an xml dump and converts it to a cdb format.

Thanks! I saw mw-buildcdb listed, but there were no docs, and I
assumed it worked the same as mw-zip.

I see that the cdb format is documented as having a limit of 4GB per
CDB database. Will this be a problem when processing the Wikipedia
dump? If the dump isn't 4GB yet, it certainly will be soon.
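
For a rough sense of whether a given set of articles would fit, the
size of a cdb file can be estimated from its layout. Below is a minimal
sketch in Python, assuming the standard cdb layout (a 2048-byte header,
an 8-byte length prefix per record, and two 8-byte hash slots per
record, with all offsets stored as 32-bit integers). The function names
are hypothetical and not part of mwlib, and the output of mw-buildcdb
may differ from this estimate.

```python
# Assumption: standard cdb layout with 32-bit file offsets, so a single
# database file has to stay below 2**32 bytes (4 GiB).
CDB_LIMIT = 2 ** 32


def estimated_cdb_size(records):
    """Estimate the on-disk size of a cdb file holding `records`.

    `records` is an iterable of (key, value) pairs of bytes objects.
    """
    size = 2048  # fixed header: 256 table pointers of 8 bytes each
    for key, value in records:
        size += 8                      # record header: key length + data length
        size += len(key) + len(value)  # the key and value bytes themselves
        size += 16                     # two 8-byte hash table slots per record
    return size


def fits_in_one_cdb(records):
    """Return True if the estimated size stays under the 4 GiB limit."""
    return estimated_cdb_size(records) < CDB_LIMIT
```

Feeding it (title, article text) pairs extracted from the dump would
show whether one database is enough or whether the articles would have
to be split across several files.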

-Nick Johnson
