Also, I wonder if these folks have been consulted for their expertise in compressing Wikipedia data: http://prize.hutter1.net/

On Wed, Dec 24, 2008 at 5:09 PM, Brian <[email protected]> wrote:

> Interesting. I realize that the dump is extremely large, but if 7zip is
> really the bottleneck then to me the solutions are straightforward:
>
> 1. Offer an uncompressed version of the dump for download. Bandwidth is
> cheap and downloads can be resumed, unlike this dump process.
> 2. The WMF offers a service whereby they mail the uncompressed dump to you
> on a hard drive. You pay for the drive and a service charge.
>
> Cheers,
>
> On Wed, Dec 24, 2008 at 5:03 PM, Erik Zachte <[email protected]> wrote:
>
>> Hi Brian, Brion once explained to me that the post-processing of the dump
>> is the main bottleneck.
>>
>> Compressing articles with tens of thousands of revisions is a major
>> resource drain.
>> Right now every dump is even compressed twice, into bzip2 (for wider
>> platform compatibility) and 7zip format (for 20 times smaller downloads).
>> This may no longer be needed as 7zip presumably gained better support on
>> major platforms over the years.
>> Apart from that the job could gain from parallelization and better error
>> recovery.
>>
>> Erik Zachte
>>
>> ________________________________________
>>
>> I am still quite shocked at the amount of time the English Wikipedia takes
>> to dump, especially since we seem to have close links to folks who work at
>> MySQL. To me it seems that one of two things must be the case:
>>
>> 1. Wikipedia has outgrown MySQL, in the sense that, while we can put data
>> in, we cannot get it all back out.
>> 2. Despite aggressive hardware purchases over the years, the correct
>> hardware has still not been purchased.
>>
>> I wonder which of these is the case. Presumably #2?
>>
>> Cheers,
>> Brian
>>
>> _______________________________________________
>> foundation-l mailing list
>> [email protected]
>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>
> --
> (Not sent from my iPhone)

--
(Not sent from my iPhone)
