There is no TOAST compression on JSON or JSONB data in 9.4 beta 2. I'm not
sure about other versions.  I'm also not sure if this is a bug or by
design, but if it is by design, I think the documentation should be
updated.

Here is a summary of my results inserting 10,000 highly compressible JSON
docs of 251K each.

Column Type - Storage  - TOAST table size
JSONB       - EXTERNAL - 2448 MB
JSONB       - EXTENDED - 2448 MB
JSON        - EXTENDED - 2504 MB
TEXT        - EXTERNAL - 2409 MB
TEXT        - EXTENDED -   40 MB
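For reference, here is a sketch of how a run like this can be set up and measured. The table and column names are hypothetical (my actual test schema isn't shown here); the storage setting and the TOAST-size query are standard:

```sql
-- Hypothetical test table; not the actual schema used above.
CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);

-- Choose the storage strategy under test: EXTENDED allows compression
-- before moving data out of line, EXTERNAL stores out of line uncompressed.
ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTENDED;

-- After loading the documents, measure the TOAST table directly.
SELECT pg_size_pretty(pg_relation_size(reltoastrelid))
FROM pg_class
WHERE relname = 'docs';
```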


On Fri, Aug 1, 2014 at 2:36 AM, Larry White <ljw1...@gmail.com> wrote:

>
> On Fri, Aug 1, 2014 at 2:20 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
>> On Thursday, July 31, 2014, Larry White <ljw1...@gmail.com> wrote:
>>
>>>  Hi,
>>>
>>> I'm running an experiment on 9.4 beta 2.
>>>
>>> I put 275,000 identical JSON files into a table using JSONB (one per
>>> row).  Each raw text file is 251K in size, so the total uncompressed is
>>> 69GB. The column storage is set to EXTENDED.  There are other toastable
>>> columns in the table, but none have more than 36 bytes of data in them.
>>>
>>> My Toast table is 66GB. I would have expected to get that much (or more)
>>> compression just from JSONB being a binary format.
>>>
>>> If I compress one of these JSON files outside of Postgres, it goes from
>>> 251K to 1K.
>>>
>>
>> That is an astonishing amount of compression.  Are you really compressing
>> one of the files in isolation, rather than co-compressing many and then
>> pro-rating the result?
>>
>
> Yes, I should have explained. These are generated JSON files for testing,
> and there is massive repetition in them, which is why they compress so well
> outside of Postgres. (Basically, there is a repeating array of the same
> string.) I did compress just the one file.
>
>
>>
>> Can you provide an example of the data, and the command line you used to
>> compress it?
>>
>
> Compressed on a Mac with the Compress UI option. Here's a brief sample
> from the file. You can see why it compresses so well:
> "{\"junk\":[\"124245etweetwet345gwtretwt43 qwrqwq qwre qw
> rsdflkas\",\"q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>
> and so on.
>
>>> So each file should theoretically fit on a single row in the toast
>>> table. In total, the data is well under a GB when compressed outside of PG.
>>>
>>> Any guesses as to why there is so little compression of this data or how
>>> I might remedy the situation?
>>>
>>
>> PostgreSQL's built in tuple compression is generally not very good.  It
>> is good at compressing long strings of identical bytes, but not good at
>> compressing the type of thing you are likely to find in JSON (unless your
>> JSON had long strings of spaces to reflect indentation of deeply nested
>> structures, which JSON probably wouldn't do and which JSONB certainly
>> wouldn't).  It was designed to be very fast and to be unencumbered by the
>> patent issues common at the time it was written.  It was not designed to
>> give the best possible compression ratios.
>>
>> It also compresses each row independently.  Most of the compression
>> opportunities in a column of JSON data would probably be between rows, when
>> the same keys show up over and over again, not within a row.  But it
>> can't capture those opportunities.
>>
>
> I'm not expecting miracles with real data, but as far as I can tell, there
> is zero compression happening. I'm wondering if it is disabled for JSONB
> for some reason.
>
>
>>
>> Cheers,
>>
>> Jeff
>>
>
>

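To make the per-row compression point concrete outside Postgres, here is a small sketch using Python's zlib as a rough stand-in for pglz (zlib compresses considerably better than pglz, so the numbers are only illustrative, not what TOAST would achieve):

```python
import json
import zlib

# Doc 1: shaped like the test data in this thread -- a pair of strings
# repeated many times.  Its redundancy is *within* the row, so even
# per-row compression should win big.
doc = json.dumps({"junk": ["124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas",
                           "q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs sdfsd"] * 500})
compressed = zlib.compress(doc.encode())
print(f"single doc: {len(doc)} -> {len(compressed)} bytes")

# Doc set 2: many small rows whose redundancy (the repeated keys) lies
# *across* rows.  Compressing each row independently, as TOAST does,
# cannot exploit it; compressing the rows together can.
rows = [json.dumps({"id": i, "name": "user", "active": True}) for i in range(1000)]
per_row = sum(len(zlib.compress(r.encode())) for r in rows)
together = len(zlib.compress("".join(rows).encode()))
print(f"per-row: {per_row} bytes, together: {together} bytes")
```

The second measurement shows why, even with a stronger compressor, per-row compression would leave most of the cross-row redundancy on the table.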