Reported as bug #11109.
On Fri, Aug 1, 2014 at 1:46 PM, Larry White <ljw1...@gmail.com> wrote:

> Jeff,
>
> Thank you for your help. This is a Postgres bug, but I don't think I'd
> have figured it out without your help.
>
> What is happening is that if PG can, after compression, fit the entire
> 'document' into one row/page in the TOAST table, it compresses it.
> However, if the document is too big to fit in one row after compression,
> it does no compression at all. This is why it worked for you, but not
> for me.
>
> I create my test file (in part) with this loop:
>
>     for (int j = 0; j < 110; j++) {
>         mediumPayload.getJunk().add("124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas");
>         mediumPayload.getJunk().add("q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs sdfsd");
>     }
>
> If the loop runs 110 times as shown, it compresses.
> If the loop runs 111 times, it does not:
>
> With 110 iterations:
>     Extended: 8192 bytes (one page)
>     External: 66 MB
>
> With 111 iterations:
>     Extended: 69 MB
>     External: 69 MB
>
> Hopefully they can fix this before the GA release.
>
> On Fri, Aug 1, 2014 at 12:38 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
>> On Thu, Jul 31, 2014 at 11:36 PM, Larry White <ljw1...@gmail.com> wrote:
>>
>>> On Fri, Aug 1, 2014 at 2:20 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>>
>>>> On Thursday, July 31, 2014, Larry White <ljw1...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm running an experiment on 9.4 beta 2.
>>>>>
>>>>> I put 275,000 identical JSON files into a table using JSONB (one per
>>>>> row). Each raw text file is 251K in size, so the total uncompressed
>>>>> is 69GB. The column storage is set to EXTENDED. There are other
>>>>> toastable columns in the table, but none have more than 36 bytes of
>>>>> data in them.
>>>>>
>>>>> My TOAST table is 66GB. I would have expected to get that much (or
>>>>> more) compression just from JSONB being a binary format.
>>>>>
>>>>> If I compress one of these JSON files outside of Postgres, it goes
>>>>> from 251K to 1K.
>>>>
>>>> That is an astonishing amount of compression. Are you really
>>>> compressing one of the files in isolation, rather than co-compressing
>>>> many and then pro-rating the result?
>>>
>>> Yes, I should have explained. These are generated JSON files for
>>> testing, and there is massive repetition in them, which is why they
>>> compress so well outside of Postgres. (Basically, there is a repeating
>>> array of the same string.) I did compress just the one.
>>>
>>>> Can you provide an example of the data, and the command line you used
>>>> to compress it?
>>>
>>> Compressed on a Mac with the Compress UI option. Here's a brief sample
>>> from the file. You can see why it compresses so well:
>>>
>>>     "{\"junk\":[\"124245etweetwet345gwtretwt43 qwrqwq qwre qw
>>>     rsdflkas\",\"q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>>     sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw
>>>     rsdflkas\",\"q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>>     sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw
>>>     rsdflkas\",\"q4535
>>>
>>> and so on.
>> If I take that example (and cap off the array and hash right after the
>> end of what you show, and remove the escapes of the double quote
>> marks), then it does not compress, but only because it is not long
>> enough to trigger the compression attempts.
>>
>> If I repeat the array portion 4 more times to make the whole thing
>> long enough for compression to be used, it compresses nicely. Not
>> 100-fold (but then again, neither does bzip2 nor gzip on the data I
>> just described), but text and json compress 10-fold and jsonb 5-fold.
>>
>> Cheers,
>>
>> Jeff
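
Larry's all-or-nothing observation can be reproduced from psql alone. The
sketch below is illustrative, not his actual test harness: the table name
and iteration count are hypothetical, and jsonb columns already default to
EXTENDED storage. pg_column_size() reports the stored (possibly
compressed) size of a value, so comparing it to the uncompressed text
length shows whether compression happened.

    -- Build one highly repetitive jsonb document, like the generated test file.
    CREATE TABLE toast_test (doc jsonb);

    INSERT INTO toast_test (doc)
    SELECT ('{"junk": ['
            || string_agg('"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas"', ',')
            || ']}')::jsonb
    FROM generate_series(1, 220);  -- raise the count to cross the reported boundary

    -- If stored_bytes is far below raw_bytes, the value was compressed;
    -- if the two are close, it was stored verbatim.
    SELECT octet_length(doc::text) AS raw_bytes,
           pg_column_size(doc)     AS stored_bytes
    FROM   toast_test;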
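
Jeff's "not long enough to trigger the compression attempts" point can be
seen the same way: compression is only attempted once a row exceeds the
TOAST threshold (roughly 2 kB with default build settings), so even an
extremely compressible value below that size is stored verbatim. A minimal
sketch, assuming default storage settings and an illustrative table name:

    CREATE TABLE thresh (t text);

    INSERT INTO thresh VALUES (repeat('ab', 900));     -- ~1.8 kB: below the ~2 kB threshold
    INSERT INTO thresh VALUES (repeat('ab', 100000));  -- ~200 kB: well above it

    -- Expect the first row stored at roughly its raw size and the second
    -- compressed to a small fraction of it.
    SELECT octet_length(t)   AS raw_bytes,
           pg_column_size(t) AS stored_bytes
    FROM   thresh;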