On 08/08/2014 11:18 AM, Tom Lane wrote:
> Andrew Dunstan <and...@dunslane.net> writes:
>> On 08/07/2014 11:17 PM, Tom Lane wrote:
>>> I looked into the issue reported in bug #11109.  The problem appears to be
>>> that jsonb's on-disk format is designed in such a way that the leading
>>> portion of any JSON array or object will be fairly incompressible, because
>>> it consists mostly of a strictly-increasing series of integer offsets.
>>
>> Ouch.
>>
>> Back when this structure was first presented at pgCon 2013, I wondered
>> if we shouldn't extract the strings into a dictionary, because of key
>> repetition, and convinced myself that this shouldn't be necessary
>> because in significant cases TOAST would take care of it.
>
> That's not really the issue here, I think.  The problem is that a
> relatively minor aspect of the representation, namely the choice to store
> a series of offsets rather than a series of lengths, produces
> nonrepetitive data even when the original input is repetitive.

This is only because the input data consisted of exact copies of the same strings over and over again. PGLZ can also compress strings of varying lengths that are merely similar rather than identical; not as well, but well enough. But I suspect such input data would make compression fail again, even with lengths instead of offsets.
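
A rough sketch of the offsets-versus-lengths effect (in Python, with zlib standing in for pglz and a plain array of 4-byte unsigned integers standing in for the header entries; neither matches the real on-disk layout, so treat the numbers as illustrative only):

import struct
import zlib

# Field lengths of one small document, repeated many times, as when the
# same JSON value is stored over and over.
lengths = [3, 7, 5, 9] * 64

# The offset form is the running total of those lengths: strictly
# increasing, so consecutive words never repeat.
offsets = []
total = 0
for n in lengths:
    total += n
    offsets.append(total)

length_bytes = struct.pack("<%dI" % len(lengths), *lengths)
offset_bytes = struct.pack("<%dI" % len(offsets), *offsets)

print("lengths: %d -> %d" % (len(length_bytes), len(zlib.compress(length_bytes))))
print("offsets: %d -> %d" % (len(offset_bytes), len(zlib.compress(offset_bytes))))

The length array collapses to almost nothing, while the offset array stays comparatively large, which is the nonrepetitive-leading-portion problem in miniature.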
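
And a similar sketch for the exact-copies-versus-similar-strings point (same caveats; zlib is only a stand-in for pglz, with a different match window and give-up heuristic):

import random
import zlib

random.seed(0)

base = '{"customer":"Acme Corp","status":"active","balance":1234}'

# Case 1: the identical record repeated.
exact_copies = (base * 500).encode()

# Case 2: records with the same keys but varying values and lengths.
varied = "".join(
    '{"customer":"Cust%05d","status":"%s","balance":%d}'
    % (random.randrange(100000),
       random.choice(["active", "inactive"]),
       random.randrange(1000000))
    for _ in range(500)
).encode()

for name, data in (("exact copies", exact_copies), ("varied records", varied)):
    comp = zlib.compress(data)
    print("%-14s raw=%6d compressed=%6d" % (name, len(data), len(comp)))

The varied records still compress respectably because the keys repeat, just not nearly as well as the exact copies do.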


Regards,
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

