Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-06 Thread Aleksander Alekseev
Hello, Eduardo.

> Why do you use dictionary compression and not zlib/lz4/bzip/any other?

Internally, PostgreSQL already uses an LZ77-family algorithm, PGLZ. I didn't
try to replace it, only to supplement it. PGLZ compresses every piece of
data (JSONB documents in this case) independently. What I did was remove
the redundant data that exists between documents and that PGLZ can't
compress, since any single document usually uses each key and similar
strings (string tags in arrays and the like) only once.
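
For illustration only (the table and keys below are made up), PGLZ compresses
each of these values independently, so the repeated keys are stored again in
every row, while a shared dictionary can replace them with short codes across
all rows:

CREATE TABLE orders (doc jsonb);

INSERT INTO orders (doc) VALUES
  ('{"customer_name": "Alice", "shipping_address": "Main St. 1"}'),
  ('{"customer_name": "Bob",   "shipping_address": "Oak Ave. 2"}');

-- Every row repeats the long keys "customer_name" and "shipping_address";
-- per-value compression cannot deduplicate them across rows, while a shared
-- dictionary can.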

> Compress/Decompress speed?

In my observations PGLZ has characteristics similar to GZIP. I didn't
benchmark ZSON encoding/decoding separately from the DBMS because the end
user is interested only in TPS, which depends on I/O, the number of documents
that fit into memory, and other factors.

> As I understand it, PostgreSQL must decompress before use.

Only if you try to read document fields. For deleting a tuple, doing
vacuum, etc., there is no need to decompress the data.

> Some compression algorithms (dictionary transforms where a token is a word)
> allow searching for tokens/words directly on compressed data, by transforming
> the token/word you search for into its dictionary entry and then searching
> for that entry in the compressed data. From this, replace, substring, and
> other word-level string manipulation algorithms can be implemented.

Unfortunately, I doubt that the current ZSON implementation can use these
ideas. However, I agree that it's a very interesting field of
research. I don't think anyone has tried to do something like this in
PostgreSQL yet.

> My passion is compression; do you mind if I try other algorithms? For
> that, certain dictionary id ranges (>1024, >1<<16, or <128, for example)
> could indicate which compression algorithm is used, or zson_header could be
> changed to store that information. Doing that, each document could be
> compressed with the best compressor (for size or decompression speed) at
> idle times or on request.

By all means! Naturally, if you find a better encoding, I would be happy
to merge the corresponding code into ZSON's repository.

> Thanks for sharing and for your time.

Thanks for the feedback and for sharing your thoughts!

-- 
Best regards,
Aleksander Alekseev




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-06 Thread Simon Riggs
On 5 October 2016 at 16:58, Aleksander Alekseev wrote:

> What about an evolving schema for JSON/JSONB/XML? For instance,
> adding/removing keys in new versions of the application. UPDATE
> COMPRESSION DICTIONARY?

You can add to a dictionary, but not remove things. I'm not sure
that's a big issue.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-05 Thread Aleksander Alekseev
> > I could align ZSON with the PostgreSQL code style. I only need to run pgindent
> > and write a few comments. Do you think the community would be interested in
> > adding it to /contrib/? I mean, isn't the problem ZSON solves a bit too
> > specific for that?
> 
> CREATE COMPRESSION DICTIONARY public.simple_dict (
> WORDS = my_json_schema
> );
> 
> Then use the dictionary by referencing the DictionaryId within the
> datatype modifier, e.g. ZSON(357)
> 
> That way we can use this for TEXT, XML, JSON, etc. as ZTEXT, ZXML, ZJSON.
> 
> So it seems like a generally useful thing to me.

Good idea!

What about an evolving schema for JSON/JSONB/XML? For instance,
adding/removing keys in new versions of the application. UPDATE
COMPRESSION DICTIONARY?

-- 
Best regards,
Aleksander Alekseev




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-05 Thread Simon Riggs
On 4 October 2016 at 16:34, Aleksander Alekseev wrote:
> Hello, Simon.
>
> Thanks for your interest in this project!
>
>> Will you be submitting this to core?
>
> I could align ZSON with the PostgreSQL code style. I only need to run pgindent
> and write a few comments. Do you think the community would be interested in
> adding it to /contrib/? I mean, isn't the problem ZSON solves a bit too
> specific for that?

CREATE COMPRESSION DICTIONARY public.simple_dict (
WORDS = my_json_schema
);

Then use the dictionary by referencing the DictionaryId within the
datatype modifier, e.g. ZSON(357)

That way we can use this for TEXT, XML, JSON, etc. as ZTEXT, ZXML, ZJSON.

So it seems like a generally useful thing to me.
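
For illustration, a table using such a dictionary might be declared like this
(hypothetical syntax following the proposal above; nothing below is
implemented, and the table and dictionary id are made up):

CREATE TABLE documents (
    id      serial PRIMARY KEY,
    payload ZSON(357)   -- 357 being the id of public.simple_dict
);

-- The same approach would give ZTEXT(...), ZXML(...), ZJSON(...) for the
-- other types.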

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-05 Thread Aleksander Alekseev
> > I could align ZSON with the PostgreSQL code style. I only need to run pgindent
> > and write a few comments. Do you think the community would be interested in
> > adding it to /contrib/? I mean, isn't the problem ZSON solves a bit too
> > specific for that?
> 
> I find the references to pglz quite troubling, particularly after reading
> that this data type visibly uses its own compression logic.

ZSON just replaces frequently used strings with 16-bit codes. It also adds
PGLZ_HINT_SIZE (32 by default; it can also be 0) zero bytes at the
beginning to make it more likely that the data will be compressed by PGLZ.
After all, who would use ZSON for small documents? That's all.

I hope that explains the references to PGLZ.
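
For what it's worth, a quick way to check whether a given value actually ends
up compressed on disk is to compare column sizes. This is only a sketch; it
assumes a hypothetical table that stores the same document both as plain
jsonb and as zson:

-- pg_column_size reports the on-disk (possibly compressed) size in bytes.
SELECT pg_column_size(doc_jsonb) AS jsonb_bytes,
       pg_column_size(doc_zson)  AS zson_bytes
FROM mytable
LIMIT 10;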

-- 
Best regards,
Aleksander Alekseev




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-05 Thread Aleksander Alekseev
> Nearly everyone wants lower data storage and some kind of compression.
> Can this be made to retrain automatically during ANALYZE (does that make sense)?
> And create a new dictionary only if it differs from the last one.

It's an interesting idea. However, I doubt it could be automated in a
one-size-fits-all manner. Some users would like to do re-learning during
ANALYZE, others during VACUUM, once a month, or, say, using triggers and
some sort of heuristics.

That said, I see no reason not to accept pull requests with
implementations of different re-learning automation strategies. It's
just not a priority for me personally.
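
For example, one possible strategy (a sketch only; it assumes the zson_learn
training function described in the README and the third-party pg_cron
extension, neither of which was discussed in this thread) would be to schedule
periodic re-learning:

CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Re-learn the dictionary from current documents at 03:00 on the 1st of
-- every month; table/column names are made up.
SELECT cron.schedule(
    '0 3 1 * *',
    $$SELECT zson_learn('{{"mytable", "mydoc"}}')$$
);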

-- 
Best regards,
Aleksander Alekseev




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-05 Thread Vitaly Burovoy
On 10/4/16, Dorian Hoxha  wrote:
> On Tue, Oct 4, 2016 at 5:34 PM, Aleksander Alekseev wrote:
>> Hello, Simon.
>>
>> Thanks for your interest in this project!
>>
>> > Will you be submitting this to core?
>>
>> I could align ZSON with the PostgreSQL code style. I only need to run pgindent
>> and write a few comments. Do you think the community would be interested in
>> adding it to /contrib/? I mean, isn't the problem ZSON solves a bit too
>> specific for that?
>
> Nearly everyone wants lower data storage and some kind of compression.
> Can this be made to retrain automatically during ANALYZE (does that make sense)?
> And create a new dictionary only if it differs from the last one.

ANALYZE does not walk through the whole table and does not change its records.
Creating a new dictionary for compression purposes would imply updating
rows of the original table to replace entries with references to the
dictionary.

-- 
Best regards,
Vitaly Burovoy




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-04 Thread Michael Paquier
On Wed, Oct 5, 2016 at 12:34 AM, Aleksander Alekseev wrote:
> I could align ZSON with the PostgreSQL code style. I only need to run pgindent
> and write a few comments. Do you think the community would be interested in
> adding it to /contrib/? I mean, isn't the problem ZSON solves a bit too
> specific for that?

I find the references to pglz quite troubling, particularly after reading
that this data type visibly uses its own compression logic.
-- 
Michael




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-04 Thread Dorian Hoxha
@Aleksander
Nearly everyone wants lower data storage and some kind of compression.
Can this be made to retrain automatically during ANALYZE (does that make sense)?
And create a new dictionary only if it differs from the last one.

On Tue, Oct 4, 2016 at 5:34 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:

> Hello, Simon.
>
> Thanks for your interest in this project!
>
> > Will you be submitting this to core?
>
> I could align ZSON with the PostgreSQL code style. I only need to run pgindent
> and write a few comments. Do you think the community would be interested in
> adding it to /contrib/? I mean, isn't the problem ZSON solves a bit too
> specific for that?
>
> --
> Best regards,
> Aleksander Alekseev
>


Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-04 Thread Aleksander Alekseev
Hello, Simon.

Thanks for your interest in this project!

> Will you be submitting this to core?

I could align ZSON with the PostgreSQL code style. I only need to run pgindent
and write a few comments. Do you think the community would be interested in
adding it to /contrib/? I mean, isn't the problem ZSON solves a bit too
specific for that?

-- 
Best regards,
Aleksander Alekseev




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-04 Thread Oleg Bartunov
On Tue, Oct 4, 2016 at 4:20 PM, Simon Riggs  wrote:

> On 30 September 2016 at 16:58, Aleksander Alekseev wrote:
>
> > I've just uploaded the ZSON extension to GitHub:
> >
> > https://github.com/afiskon/zson
> >
> > ZSON learns from your typical JSONB documents and creates a dictionary
> > of strings that are frequently used across all documents. After that you
> > can use the ZSON type to compress documents using this dictionary. When
> > the document schema evolves and compression becomes inefficient, you can
> > re-learn from new documents. New documents will be compressed with the new
> > dictionary; old documents will be decompressed using the old dictionary.
> >
> > In some cases ZSON can save half of your disk space and give you about
> > 10% more TPS, though everything depends on your data and workload.
> > Memory is saved as well. For more details, see README.md.
> >
> > Please don't hesitate to ask any questions. Any feedback and pull
> > requests are welcome too!
>
> Very good. Oleg had mentioned that dictionary compression was being
> considered.
>
> It would be useful to be able to define compression dictionaries for
> many use cases.
>


Dictionary compression is a different project; we'll publish it after
testing.


>
> Will you be submitting this to core?
>
> --
> Simon Riggs    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-04 Thread Simon Riggs
On 30 September 2016 at 16:58, Aleksander Alekseev wrote:

> I've just uploaded the ZSON extension to GitHub:
>
> https://github.com/afiskon/zson
>
> ZSON learns from your typical JSONB documents and creates a dictionary
> of strings that are frequently used across all documents. After that you
> can use the ZSON type to compress documents using this dictionary. When
> the document schema evolves and compression becomes inefficient, you can
> re-learn from new documents. New documents will be compressed with the new
> dictionary; old documents will be decompressed using the old dictionary.
>
> In some cases ZSON can save half of your disk space and give you about
> 10% more TPS, though everything depends on your data and workload.
> Memory is saved as well. For more details, see README.md.
>
> Please don't hesitate to ask any questions. Any feedback and pull
> requests are welcome too!

Very good. Oleg had mentioned that dictionary compression was being considered.

It would be useful to be able to define compression dictionaries for
many use cases.

Will you be submitting this to core?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-10-03 Thread Alan Gano
I like this, seeing that the keys of JSON docs are replicated in every
record.

It makes my old-school DBA sense start to itch.



On Fri, Sep 30, 2016 at 8:58 AM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:

> Hello.
>
> I've just uploaded the ZSON extension to GitHub:
>
> https://github.com/afiskon/zson
>
> ZSON learns from your typical JSONB documents and creates a dictionary
> of strings that are frequently used across all documents. After that you
> can use the ZSON type to compress documents using this dictionary. When
> the document schema evolves and compression becomes inefficient, you can
> re-learn from new documents. New documents will be compressed with the new
> dictionary; old documents will be decompressed using the old dictionary.
>
> In some cases ZSON can save half of your disk space and give you about
> 10% more TPS, though everything depends on your data and workload.
> Memory is saved as well. For more details, see README.md.
>
> Please don't hesitate to ask any questions. Any feedback and pull
> requests are welcome too!
>
> --
> Best regards,
> Aleksander Alekseev
>


[GENERAL] ZSON, PostgreSQL extension for compressing JSONB

2016-09-30 Thread Aleksander Alekseev
Hello.

I've just uploaded the ZSON extension to GitHub:

https://github.com/afiskon/zson

ZSON learns from your typical JSONB documents and creates a dictionary
of strings that are frequently used across all documents. After that you
can use the ZSON type to compress documents using this dictionary. When
the document schema evolves and compression becomes inefficient, you can
re-learn from new documents. New documents will be compressed with the new
dictionary; old documents will be decompressed using the old dictionary.

In some cases ZSON can save half of your disk space and give you about
10% more TPS, though everything depends on your data and workload.
Memory is saved as well. For more details, see README.md.
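
A minimal usage sketch (table and column names are illustrative; see the
README for the exact function signatures):

CREATE EXTENSION zson;

-- Build a dictionary from the documents currently stored in mytable.mydoc.
SELECT zson_learn('{{"mytable", "mydoc"}}');

-- Use the zson type where you would otherwise use jsonb.
CREATE TABLE compressed_docs (
    id  serial PRIMARY KEY,
    doc zson
);

INSERT INTO compressed_docs (doc)
SELECT mydoc::zson FROM mytable;

-- zson casts back to jsonb, so documents can still be queried as usual.
SELECT (doc::jsonb) -> 'some_key' FROM compressed_docs;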

Please don't hesitate to ask any questions. Any feedback and pull
requests are welcome too!

-- 
Best regards,
Aleksander Alekseev

