On Fri, Aug 22, 2008 at 11:57 PM, Frederik Ramm <[EMAIL PROTECTED]> wrote:
> Hi,
>
> for those of us fighting against the abysmal performance of bzip2
> when creating planet files and such, in case you hadn't heard of these:
>
> There's "pbzip2", readily available in Debian/Ubuntu repositories, which
> gives near-linear speedup by utilizing as many CPUs as you have, and if
> you are adventurous then even dbzip2 which is able to use all those
> spare machines you have sitting around for distributed bzipping:
> http://www.mediawiki.org/wiki/Dbzip2 (but I haven't managed to find the
> latest source for this and it is flagged "experimental").

svn co http://svn.wikimedia.org/svnroot/mediawiki/trunk/dbzip2

> Further, I have found out that a block size of 200k (-2) actually gives
> better compression than the, much slower, default of 900k; examining our
> planet files more closely I see that this is obviously a known fact
> since they are using 200k block size also.
>
> I was a bit frustrated with pbzip2 because for my setup I need streaming
> operation, and pbzip2 supports writing to stdout but not reading from
> stdin.

dbzip2 supports reading from STDIN and writing to STDOUT

> Good old 7z to the rescue. I don't like 7z, it talks too much and has
> the feel of a DOS program, but it *can* do parallel compression *with*
> piping for bzip2 files:
>
> % 7z a dummy -tbzip2 -si -so < foo.osm > foo.osm.bz2
>
> Don't ask about the strange command line, I already said I don't like it ;-)
>
> The bzip2 files created by 7z/pbzip2 are generally a little bit larger
> than when using non-parallel bzip2, but fully compatible.

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
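
For anyone wondering why the parallel output stays "fully compatible": a rough sketch of the general idea, in Python, is below. It is not the actual code of pbzip2 or 7z, just an illustration of the technique as I understand it: cut the input into chunks, compress each chunk as its own complete bzip2 stream on a worker, and concatenate the results in order. The names parallel_bzip2, compress_chunk and CHUNK_SIZE are made up for this example, and the chunk size of 200,000 bytes is an arbitrary choice loosely mirroring the -2 block size. Threads are used on the assumption that CPython's bz2 module releases the GIL while compressing.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size for this sketch, loosely mirroring the 200k (-2) block.
CHUNK_SIZE = 200_000


def compress_chunk(chunk: bytes, level: int = 2) -> bytes:
    """Compress one chunk into a complete, independent bzip2 stream."""
    # compresslevel 1..9 selects a bzip2 block size of 100k..900k.
    return bz2.compress(chunk, compresslevel=level)


def parallel_bzip2(data: bytes, level: int = 2, workers: int = 4) -> bytes:
    """Compress chunks on several workers and concatenate the streams."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams line up
        # with the original byte order when joined.
        streams = pool.map(lambda c: compress_chunk(c, level), chunks)
        return b"".join(streams)


if __name__ == "__main__":
    sample = b"<node id='1' lat='0.0' lon='0.0'/>\n" * 50_000
    packed = parallel_bzip2(sample)
    # A standard decompressor reads the concatenated streams back to back.
    assert bz2.decompress(packed) == sample
    print(len(sample), "bytes in,", len(packed), "bytes out")
```

Since every chunk carries its own stream header and trailer and cannot share context with its neighbours, the result tends to come out a little larger than a single-stream run, which would fit the size difference mentioned above.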

