Reported as bug #11109.

On Fri, Aug 1, 2014 at 1:46 PM, Larry White <ljw1...@gmail.com> wrote:

> Jeff,
>
> Thank you for your help. This is a Postgres bug, but I don't think I'd
> have figured it out without your help.
>
> What is happening is that if PG can, after compression, fit the entire
> 'document' into one row/page in the toast table, it compresses it. However,
> if the document is still too big to fit in one row after compression, it
> does no compression at all.  This is why it worked for you, but not for me.
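>
> A rough way to check this per value (just a sketch, not something I ran; the
> table name "docs", the column name "doc", and the connection settings are
> placeholders) is to compare the stored size against the raw text size:
>
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.ResultSet;
>     import java.sql.Statement;
>
>     public class CompressionCheck {
>         public static void main(String[] args) throws Exception {
>             // Placeholder connection settings and table/column names.
>             try (Connection conn = DriverManager.getConnection(
>                     "jdbc:postgresql://localhost/test", "postgres", "postgres");
>                  Statement stmt = conn.createStatement();
>                  // pg_column_size() reports the stored (possibly compressed) size;
>                  // octet_length(doc::text) is the uncompressed text size.
>                  ResultSet rs = stmt.executeQuery(
>                      "SELECT pg_column_size(doc) AS stored_bytes, "
>                          + "octet_length(doc::text) AS text_bytes "
>                          + "FROM docs LIMIT 5")) {
>                 while (rs.next()) {
>                     // If stored_bytes is close to text_bytes, the value was not compressed.
>                     System.out.printf("stored=%d text=%d%n",
>                             rs.getLong("stored_bytes"), rs.getLong("text_bytes"));
>                 }
>             }
>         }
>     }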
>
> I create my test file (in part) with this loop:
>
>         for (int j = 0; j < 110; j++) {
>             mediumPayload.getJunk().add("124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas");
>             mediumPayload.getJunk().add("q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs sdfsd");
>         }
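>
> For completeness, here is a minimal stand-in for the payload class used
> above; the real class isn't shown, so this is an assumed sketch that just
> builds the {"junk":[...]} document (as in the sample quoted further down) by
> hand:
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     // Hypothetical stand-in for the payload object used in the loop above;
>     // only what is needed to produce the {"junk":[...]} document is sketched.
>     public class MediumPayload {
>         private final List<String> junk = new ArrayList<>();
>
>         public List<String> getJunk() {
>             return junk;
>         }
>
>         // Builds the JSON document by hand so no JSON library is assumed.
>         public String toJson() {
>             StringBuilder sb = new StringBuilder("{\"junk\":[");
>             for (int i = 0; i < junk.size(); i++) {
>                 if (i > 0) {
>                     sb.append(',');
>                 }
>                 sb.append('"').append(junk.get(i)).append('"');
>             }
>             return sb.append("]}").toString();
>         }
>     }
>
> The toJson() output is what gets inserted into the jsonb column.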
>
> if the loop runs 110 times as shown, it compresses.
> if the loop runs 111 times, it does not:
>
> With 110 iterations:
>   Extended storage:   8192 bytes (one page)
>   External storage:   66 MB
>
> With 111 iterations:
>   Extended storage:   69 MB
>   External storage:   69 MB
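>
> One way to gather numbers like these (again a sketch, untested, with "docs"
> and "doc" as placeholder names) is to switch the column's storage mode
> before loading the data and then read the size of the table's TOAST
> relation:
>
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.ResultSet;
>     import java.sql.Statement;
>
>     public class ToastSizeCheck {
>         public static void main(String[] args) throws Exception {
>             try (Connection conn = DriverManager.getConnection(
>                     "jdbc:postgresql://localhost/test", "postgres", "postgres");
>                  Statement stmt = conn.createStatement()) {
>                 // EXTENDED allows compression; EXTERNAL stores out of line uncompressed.
>                 // The setting only affects values stored after the change.
>                 stmt.execute("ALTER TABLE docs ALTER COLUMN doc SET STORAGE EXTENDED");
>                 // ... load the generated documents here ...
>                 // Then report the size of the table's TOAST relation.
>                 try (ResultSet rs = stmt.executeQuery(
>                         "SELECT pg_size_pretty(pg_total_relation_size(reltoastrelid::regclass)) "
>                             + "FROM pg_class WHERE relname = 'docs' AND relkind = 'r'")) {
>                     if (rs.next()) {
>                         System.out.println("toast size: " + rs.getString(1));
>                     }
>                 }
>             }
>         }
>     }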
>
> Hopefully they can fix this before the GA release.
>
>
> On Fri, Aug 1, 2014 at 12:38 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
>> On Thu, Jul 31, 2014 at 11:36 PM, Larry White <ljw1...@gmail.com> wrote:
>>
>>>
>>> On Fri, Aug 1, 2014 at 2:20 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>>
>>>> On Thursday, July 31, 2014, Larry White <ljw1...@gmail.com> wrote:
>>>>
>>>>>  Hi,
>>>>>
>>>>> I'm running an experiment on 9.4 beta 2.
>>>>>
>>>>> I put 275,000 identical JSON files into a table using JSONB (one per
>>>>> row).  Each raw text file is 251K in size, so the total uncompressed is
>>>>> 69GB. The column storage is set to EXTENDED.  There are other toastable
>>>>> columns in the table, but none have more than 36 bytes of data in them.
>>>>>
>>>>> My Toast table is 66GB. I would have expected to get that much (or
>>>>> more) compression just from JSONB being a binary format.
>>>>>
>>>>> If I compress one of these JSON files outside of Postgres, it goes
>>>>> from 251K to 1K.
>>>>>
>>>>
>>>> That is an astonishing amount of compression.  Are you really
>>>> compressing one of the files in isolation, rather than co-compressing many
>>>> and then pro-rating the result?
>>>>
>>>
>>> Yes, I should have explained. These are generated JSON files for testing
>>> and there is massive repetition in them, which is why they compress so well
>>> outside of Postgres. (Basically there is a repeating array of the same
>>> string.) I did compress just the one.
>>>
>>>
>>>>
>>>> Can you provide an example of the data, and the command line you used
>>>> to compress it?
>>>>
>>>
>>> Compressed on a Mac with the Compress UI option. Here's a brief sample
>>> from the file. You can see why it compresses so well:
>>> "{\"junk\":[\"124245etweetwet345gwtretwt43 qwrqwq qwre qw
>>> rsdflkas\",\"q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>>
>>> and so on.
>>>
>>
>> If I take that example (and cap off the array and hash right after the
>> end of what you show, and remove the escapes of the double quote marks)
>> then it does not compress, but only because it is not long enough to
>> trigger the compression attempts.
>>
>> If I repeat the array portion 4 more times to make the whole thing long
>> enough for compression to be used, it compresses nicely.  Not 100 fold (but
>> then again, neither does bzip2 or gzip on the data I just described), but
>> text and json compress 10 fold and jsonb 5 fold.
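>>
>> One way to measure that, sketched and untested: store the same document in
>> text, json, and jsonb columns of a scratch table and compare pg_column_size()
>> of each stored value against the raw length. The table name and the number
>> of repetitions below are just placeholders.
>>
>>     import java.sql.Connection;
>>     import java.sql.DriverManager;
>>     import java.sql.PreparedStatement;
>>     import java.sql.ResultSet;
>>     import java.sql.Statement;
>>
>>     public class CompressionRatio {
>>         public static void main(String[] args) throws Exception {
>>             // Build a long, highly repetitive document like the sample above.
>>             StringBuilder doc = new StringBuilder("{\"junk\":[");
>>             for (int i = 0; i < 500; i++) {
>>                 if (i > 0) {
>>                     doc.append(',');
>>                 }
>>                 doc.append("\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\"");
>>             }
>>             doc.append("]}");
>>
>>             try (Connection conn = DriverManager.getConnection(
>>                     "jdbc:postgresql://localhost/test", "postgres", "postgres");
>>                  Statement stmt = conn.createStatement()) {
>>                 stmt.execute("CREATE TEMP TABLE t (t_text text, t_json json, t_jsonb jsonb)");
>>                 try (PreparedStatement ps = conn.prepareStatement(
>>                         "INSERT INTO t VALUES (?, ?::json, ?::jsonb)")) {
>>                     ps.setString(1, doc.toString());
>>                     ps.setString(2, doc.toString());
>>                     ps.setString(3, doc.toString());
>>                     ps.executeUpdate();
>>                 }
>>                 // pg_column_size() reflects the stored (possibly compressed) size;
>>                 // octet_length(t_text) is the uncompressed length for comparison.
>>                 try (ResultSet rs = stmt.executeQuery(
>>                         "SELECT octet_length(t_text), pg_column_size(t_text), "
>>                             + "pg_column_size(t_json), pg_column_size(t_jsonb) FROM t")) {
>>                     if (rs.next()) {
>>                         System.out.printf("raw=%d text=%d json=%d jsonb=%d%n",
>>                                 rs.getLong(1), rs.getLong(2), rs.getLong(3), rs.getLong(4));
>>                     }
>>                 }
>>             }
>>         }
>>     }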
>>
>> Cheers,
>>
>> Jeff
>>
>
>
