On Sat, May 7, 2011 at 4:04 PM, Christian Vetter <[email protected]> wrote:
>> There are about 80k blobs. If 1-byte tags are used for the counts,
>> overhead is 9 bytes each:
>>
>> 2 bytes indexdata tag&length in the BlobHeader
>> 3*1 bytes (tags for 3 fields)
>> 2*1 bytes (varint count for N==0)
>> 1*2 bytes (varint count for N < 2**14)
>>
>> I assume that few blobs contain more than one entity type. Using
>> booleans only saves one byte of overhead compared to this.
>
> I believe we can get away with 4 bytes:
> 2 bytes tag + length
> 1 + 1 byte for one field ( bool )
> We omit all fields that equal zero ( they are optional ) and the
> reader can then treat that as if it were set to zero

I think that this is a bad idea, because then you can't easily
distinguish between a count of zero and a file written by a program
that doesn't set a count.

>
>>> About 312s to compress all blobs for Germany. Changing the dictionary
>>> size does not change much. I lowered it all the way down to 64kb and
>>> the values stayed the same essentially.
>>>
>>
>> And deflate?
>>
>
> 185s

Thank you.

Here are the tradeoffs: LZMA takes about 1.7 times as long as deflate
to compress (312s vs. 185s) and its output is about 10% smaller.
Decompression should be a little slower than deflate.

Is that worth adding an LZMA dependency to any PBF reader? My verdict is
no. Protobufs and deflate have extensive language support; LZMA doesn't,
and it may be superseded by XZ.

Anyone want to make a compelling case for LZMA? Stefan?

Scott

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
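For anyone who wants to check the overhead figures quoted above, here is a rough sketch of the arithmetic in Python. It assumes what the thread assumes: 1-byte field tags and a 2-byte tag+length wrapper for indexdata in the BlobHeader. The function names and the example blob (8000 nodes, no ways, no relations) are made up for illustration and are not part of any PBF library.

```python
def varint_len(n: int) -> int:
    """Bytes needed to encode the unsigned integer n as a protobuf varint."""
    if n < 0:
        raise ValueError("this sketch only handles unsigned values")
    length = 1
    while n >= 0x80:  # each varint byte carries 7 payload bits
        n >>= 7
        length += 1
    return length


TAG = 1      # assumption from the thread: 1-byte field tags
WRAPPER = 2  # assumption from the thread: indexdata tag + length byte


def count_overhead(counts):
    """Per-blob bytes when every count is written, including zeros."""
    return WRAPPER + sum(TAG + varint_len(c) for c in counts)


def bool_overhead():
    """Per-blob bytes when one bool field is written and the rest omitted."""
    return WRAPPER + TAG + 1


# Hypothetical blob holding a single entity type.
per_blob = count_overhead([8000, 0, 0])
print(per_blob, bool_overhead())                    # 9 4
print(80_000 * per_blob, 80_000 * bool_overhead())  # 720000 320000
```

With about 80k blobs that comes to roughly 720 kB for explicit counts against 320 kB for the single-bool variant, across the whole file.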

