There is no TOAST compression on JSON or JSONB data in 9.4 beta 2. I'm not sure about other versions. I'm also not sure if this is a bug or by design, but if it is by design, I think the documentation should be updated.
Here is a summary of my results inserting 10,000 highly compressible JSON docs of 251K each. Column Type - Storage - TOAST table size JSONB - EXTERNAL - 2448 MB JSONB - EXTENDED - 2448 MB JSON - EXTENDED - 2504 MB TEXT - EXTERNAL - 2409 MB TEXT - EXTENDED - 40 MB On Fri, Aug 1, 2014 at 2:36 AM, Larry White <ljw1...@gmail.com> wrote: > > On Fri, Aug 1, 2014 at 2:20 AM, Jeff Janes <jeff.ja...@gmail.com> wrote: > >> On Thursday, July 31, 2014, Larry White <ljw1...@gmail.com> wrote: >> >>> Hi, >>> >>> I'm running an experiment on 9.4 beta 2. >>> >>> I put 275,000 identical JSON files into a table using JSONB (one per >>> row). Each raw text file is 251K in size, so the total uncompressed is >>> 69GB. The column storage is set to EXTENDED. There are other toastable >>> columns in the table, but none have more than 36 bytes of data in them. >>> >>> My Toast table is 66GB. I would have expected to get that much (or more) >>> compression just from JSONB being a binary format. >>> >>> If I compress one of these JSON files outside of Postgres, it goes from >>> 251K to 1K. >>> >> >> That is an astonishing amount of compression. Are you really compressing >> one of the files in isolation, rather than co-compressing many and then >> pro-rating the result? >> > > Yes, I should have explained. These are generated JSON files for testing > and there is massive repetition in them, which is why they compress so well > outside of Postgres. (Basically there is a repeating array of the same > string) I did compress just the one. > > >> >> Can you provide an example of the data, and the command line you used to >> compress it? >> > > Compressed on a Mac with the Compress UI option. Here's a brief sample > from the file. You can see why it compresses so well: > "{\"junk\":[\"124245etweetwet345gwtretwt43 qwrqwq qwre qw > rsdflkas\",\"q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs > sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535 > wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs > sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535 > wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs > sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535 > wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs > sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535 > wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs > sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535 > > and so on. > > So each file should theoretically fit on a single row in the toast >>> table. In total, the amount well under a GB when compressed outside of PG. >>> >>> Any guesses as to why there is so little compression of this data or how >>> I might remedy the situation? >>> >> >> PostgreSQL's built in tuple compression is generally not very good. It >> is good at compressing long strings of identical bytes, but not good at >> compressing the type of thing you are likely to find in JSON (unless your >> JSON had long strings of spaces to reflect indentation of deeply nested >> structures, which JSON probably wouldn't do and which JSONB certainly >> wouldn't). It was designed to be very fast and to be unencumbered with the >> patent issues common at the time it was written. It was not designed to >> give the best possible compression ratios. >> >> It also compresses each row independently. Most of the compression >> opportunities in a column of JSON data would probably be between rows, when >> the same keys show up and over and over again, not within a row. But it >> can't capture those opportunities. >> > > I'm not expecting miracles with real data, but as far as I can tell, there > is zero compression happening. I'm wondering if it is disabled for JSONB > for some reason. > > >> >> Cheers, >> >> Jeff >> > >