On Jun 10, 7:13 pm, Ralf Schmitt <[email protected]> wrote:
> On Wed, Jun 10, 2009 at 8:02 PM, Nick Johnson<[email protected]> wrote:
>
> > I see that the cdb format is documented as having a limitation of 4GB
> > per CDB database. Will this be a problem processing the Wikipedia
> > dump? If it's not 4GB yet, it certainly will be soon.
>
> the cdb file is only used as an index and does not store article data,
> so it should be no problem.

Hm. And more reading reveals that the actual limit in CDB appears to
be per-record rather than per-file.

More alarming is the fact that dumpparser uses a DOM-based parser
(ElementTree). I'm fairly sure that I can't store the DOM of the
entirety of Wikipedia in memory.
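For what it's worth, ElementTree can also parse incrementally via
iterparse, which sidesteps building the whole DOM. A minimal sketch of
the idea (the tag names follow the MediaWiki dump schema, but this is
an illustration, not how dumpparser actually works):

```python
import xml.etree.ElementTree as ET

def iter_page_titles(source):
    """Yield page titles from a MediaWiki-style XML dump, streaming.

    `source` is a filename or file object. Memory stays bounded because
    each <page> subtree is discarded as soon as it has been processed.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        # Dump tags may carry a namespace like "{http://...}page",
        # so match on the local-name suffix.
        if elem.tag.rsplit("}", 1)[-1] == "page":
            for child in elem:
                if child.tag.rsplit("}", 1)[-1] == "title":
                    yield child.text
            elem.clear()  # free the subtree we just handled
```

This keeps only one <page> element in memory at a time instead of the
whole tree, which should scale to a full dump.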

-Nick Johnson
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mwlib" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/mwlib?hl=en
-~----------~----~----~----~------~----~------~--~---