On Tue, Feb 24, 2009 at 9:56 AM, Brian <[email protected]> wrote:
> It's not at all clear why the English Wikipedia dump or other large
> dumps need to be compressed. It is far more absurd to spend hundreds
> of days compressing a file than it is to spend tens of days
> downloading one.
Faulty premise. Based on my old-ish hardware and the smaller but still very large ruwiki dump, I'd estimate that actually compressing enwiki would take less than a week of processing time. Since my high-end DSL would need multiple weeks to download ~2 TB uncompressed, compressing first is clearly a net time savings.

Compression does take substantial time, but my impression is that the hundreds of days come mostly from communicating with the data store and assembling the XML, not from compressing the output.

-Robert Rohde
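To make the trade-off concrete, here is a rough back-of-envelope sketch in Python. The link speed, compression ratio, and compression time below are illustrative assumptions, not measurements; only the ~2 TB uncompressed size and the sub-week compression estimate come from the post itself.

```python
# Back-of-envelope comparison: download the dump uncompressed vs.
# compress it first and download the smaller file.
# All constants are assumptions chosen for illustration.

DUMP_SIZE_TB = 2.0        # uncompressed enwiki size (~2 TB, per the post)
LINK_MBIT_S = 8.0         # assumed "high-end DSL" downstream rate
COMPRESSION_RATIO = 15.0  # assumed bzip2-style ratio for wiki XML
COMPRESS_DAYS = 7.0       # assumed compression time (< a week, per the post)

def download_days(size_tb: float, mbit_s: float) -> float:
    """Days needed to download size_tb terabytes at mbit_s megabits/second."""
    seconds = size_tb * 1e12 * 8 / (mbit_s * 1e6)
    return seconds / 86400

uncompressed = download_days(DUMP_SIZE_TB, LINK_MBIT_S)
compressed = COMPRESS_DAYS + download_days(DUMP_SIZE_TB / COMPRESSION_RATIO,
                                           LINK_MBIT_S)

print(f"download uncompressed: {uncompressed:5.1f} days")  # ~23 days
print(f"compress + download:   {compressed:5.1f} days")    # ~8.5 days
```

Under these assumptions the uncompressed download alone takes roughly three weeks, while compressing first finishes in under nine days total, which is the net savings the reply describes.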
