Hi Jim,
What happens in current Firebird when a record does not fit in the buffer (a rough sketch of this loop follows the list):

1. Scan the record and calculate the compressed length.
2. If it does not fit, scan the control buffer and calculate how many bytes will fit, plus padding.
3. Compress into the small area (scanning again).
4. Find more free space on a data page and go back to step 1 with the unprocessed part of the record.
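
To make sure we are talking about the same thing, this is roughly how I read that
flow. It is a hypothetical sketch with invented names, not the real sqz routines;
the "compression" here is a plain copy so only the control flow of steps 1-4 stays
visible:

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Fragment { size_t pageSpace; std::vector<uint8_t> data; };

// Stand-ins for the real RLE routines: the "compression" is a plain copy.
static size_t compressedLength(const uint8_t*, size_t len) { return len; }
static size_t bytesThatFit(const uint8_t*, size_t len, size_t space)
{ return len < space ? len : space; }
static void compressInto(const uint8_t* in, size_t n, std::vector<uint8_t>& out)
{ out.assign(in, in + n); }
static size_t findFreeSpace() { return 4096; }   // pretend free space on a page

std::vector<Fragment> storeRecord(const uint8_t* rec, size_t len)
{
    std::vector<Fragment> frags;
    while (len)
    {
        size_t space = findFreeSpace();             // free area on a data page
        size_t total = compressedLength(rec, len);  // step 1: first scan
        size_t take  = len;
        if (total > space)
            take = bytesThatFit(rec, len, space);   // step 2: second scan
        Fragment f;
        f.pageSpace = space;
        compressInto(rec, take, f.data);            // step 3: third scan
        frags.push_back(std::move(f));
        rec += take;                                // step 4: rest of the record
        len -= take;
    }
    return frags;
}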

I'm not sure that this is faster than compressing into a buffer on the stack and
doing a few moves.
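
What I had in mind is roughly this (again a hypothetical sketch, invented names;
any encoder could be plugged in as the compress callback):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// "compress" can be any encoder (RLE, value encoding, ...); it writes the
// packed image of the whole record into "out" and returns the packed size.
std::vector<std::vector<uint8_t>> storeRecordCompressOnce(
    const uint8_t* rec, size_t len,
    size_t (*compress)(const uint8_t* in, size_t inLen, uint8_t* out),
    size_t pageSpace)
{
    std::vector<uint8_t> buffer(len * 2 + 8);       // worst-case growth
    size_t packed = compress(rec, len, buffer.data());

    std::vector<std::vector<uint8_t>> fragments;
    for (size_t pos = 0; pos < packed; pos += pageSpace)
    {
        size_t n = std::min(pageSpace, packed - pos);
        fragments.emplace_back(buffer.begin() + pos, buffer.begin() + pos + n);
    }
    return fragments;
}

The record is scanned once by the encoder; splitting into fragments is then
only a few memcpy-style moves.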

Why RLE now? Because I already have it, and I only started with the FB sources two weeks ago.
It was easy to adapt RLE, but it was hard to understand the padding.

Now I would like to look into record encoding like you describe, but to be able to do that,
I have to understand why it is designed as it is.
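
Just to check that I understand the direction, here is how I imagine a
self-describing, value-driven encoding. Tag values and names are invented,
so this is surely not exactly what you described: each value starts with a
tag byte that says what follows, so the decoder needs no padding and no
fixed record image.

#include <cstdint>
#include <vector>

enum Tag : uint8_t {
    TAG_NULL      = 0x00,
    TAG_SMALL_INT = 0x01,   // value fits in the next single byte
    TAG_INT32     = 0x02,   // 4-byte little-endian integer follows
    TAG_STRING    = 0x03    // 1-byte length, then the bytes
};

void putNull(std::vector<uint8_t>& out) { out.push_back(TAG_NULL); }

void putInt(std::vector<uint8_t>& out, int32_t v)
{
    if (v >= 0 && v <= 255) {
        out.push_back(TAG_SMALL_INT);
        out.push_back(static_cast<uint8_t>(v));
    } else {
        out.push_back(TAG_INT32);
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<uint8_t>(v >> (8 * i)));
    }
}

void putString(std::vector<uint8_t>& out, const char* s, uint8_t len)
{
    out.push_back(TAG_STRING);
    out.push_back(len);
    out.insert(out.end(), s, s + len);
}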

And from another point of view:
the cost of the changes was small and the impact on size and speed high; that's why I did it.

Your proposal will need much more work.
From my point of view, it isn't realistic to get it into FB 2.5.x or FB 3.
When the encoding is implemented, it would be nice to use it also for backup and the wire protocol.

Thank you.

Slavek


On 27.2.2015 16:40, James Starkey wrote:
The answer to your questions is simple: it is much faster to encode from
the original record onto the data page(s), eliminating the need to
allocate, populate, copy, and release a temporary buffer.

And, frankly, the cost of a byte per full database page is not something to
lose sleep over.

The competition for a different compression scheme isn't the 30-year-old
run length encoding but the self-describing, value-driven encoding I
described earlier.

Another area where there is much room for improvement is the encoding of
multi-column indexes.  There is a much more clever scheme that doesn't
waste every fifth byte.

On Friday, February 27, 2015, Slavomir Skopalik <skopa...@elektlabs.cz>
wrote:

Hi Vlad,
As I see it, in some situations (which really happen), packing into a small
area gets padded with zeroes (an uncompressed prefix with zero length),
and a new control character is added at the beginning of the next fragment
(you lose 2 bytes).
With the current compression the difference is not that big, but with a better
one it becomes more significant.
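
A small illustration of where the 2 bytes go, assuming the classic RLE where a
positive control byte means "N literal bytes follow" and 0 is an empty run:

#include <cstdint>

// One compressed stream, no fragmentation: 1 control byte + 6 data bytes.
static const uint8_t whole[] = { 6, 'A','B','C','D','E','F' };

// The same data split when only 5 bytes of space remain on the first page:
// the fragment is topped up with a zero-length run (the padding), and the
// next fragment needs its own control byte again.
static const uint8_t frag1[] = { 3, 'A','B','C', 0 };   // 0 = empty run = padding
static const uint8_t frag2[] = { 3, 'D','E','F' };      // extra control byte

// 7 bytes became 9: two bytes lost per split, as described above.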

Finally, I still do not understand why it is better to compress each fragment
separately instead of making one compressed block that is then split into fragments.

If we have a routine to compress/encode the full record, we can easily replace
the current RLE with any other encoding scheme.

In the current situation, it is not easy to replace the current RLE with a value
encoding scheme.

I finished a new RLE that is about 25% more effective than my previous post,
but I am losing a lot of bytes on padding and new headers (and also 1 byte
per row to keep compatibility with the previous DB).

I will clean up the code and post it here during the next few days.

Also, the record differences encoding can be improved; I will do it if
somebody needs it.

About update: I worry that a fragmented record will not give a performance
gain during update.

Slavek

     Not exactly so. The big record is prepared for compression as a whole, then
the tail of the record is packed and put on separate page(s), and finally what
is left (and can be put on a single page) is really "re-compressed" separately.

And when the record is materialized in RAM, all parts are read and decompressed
separately.
     What problem do you see here? How else do you propose to decompress a
fragmented record?


If the compressor cannot fit into the small space, then the rest of the space
is padded (char 0x0 is used).
     The record image in memory always has a fixed length, according to the
record format.
This wastes CPU and disk space.
     CPU - yes, Memory - yes, Disk - no.

     Also, note that it later allows not wasting CPU when fields are accessed
and the record is updated, AFAIU.
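
For illustration, a minimal sketch of that benefit (invented structures, not
the real jrd ones): with a fixed-length image the field offsets come straight
from the record format, so access and in-place update are just offset
arithmetic, with no re-decoding of the preceding fields.

#include <cstddef>
#include <cstdint>
#include <cstring>

struct FieldDesc { size_t offset; size_t length; };   // from the record format

inline void getField(const uint8_t* recordImage, const FieldDesc& f, void* out)
{
    std::memcpy(out, recordImage + f.offset, f.length);   // O(1), no decoding
}

inline void setField(uint8_t* recordImage, const FieldDesc& f, const void* in)
{
    std::memcpy(recordImage + f.offset, in, f.length);    // update in place
}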

Regards,
Vlad

