On Sun, Jul 2, 2017 at 1:50 PM, Slavomir Skopalik [email protected]
[firebird-support] <[email protected]> wrote:

>
>
> I test general impact of compression and I found that for FB 2.5 and FB3
> is critical to take another DB page from cache.
>
Sure.

OK, you're considering storing the decimal value 13111.  In straight
forward hex, that's 0x33, 0x37.

In little endian format,
       BIGINT with that value is  0x37,0x33,0x0,0x0,0x0,0x0, 0x0, 0x0
       INT with that value is  0x37,0x33,0x0,0x0
       SMALLINT with that value is 0x37,0x33

> BIGINT compressed: 3x ( 2 bytes to compress zeroes + 3 bytes for
> uncompressable data). Total: 15 bytes.
>
 Right.  Each column is stored as 0x2, 0x37,0x33,0xFA 0x0 - five bytes.
The 0x2 means that the next two
bytes should be decompressed as they are.  The 0xFA (-6) means that the
next byte should be repeated
six times.   If the BIGINT were followed by another BIGINT that's null or
has a value of zero, the run-length
compression would require no additional storage.  The -6 would be a -14.

INT compressed: Each column is store as 0x4, 0x37,0x33, 0x0, 0x0 - once
again, five bytes.  The two
trailing zeros are not compressed because the compression is the same size
as the uncompressed data.
Three of those columns in a row would also be 15 bytes.  If the next column
was a null INT or an INT
value 0, the compressed format would be six bytes 0x2, 0x37,0x33, 0xFA, 0x0
for the two columns.

> SMALLINT compressed: 1 byte for uncompressable header + 3x (0x33, 0x37).
> Total: 7 bytes.
>
Right. The stored format for the three columns is 0x6, 0x37, 0x33,0x37,
0x33,0x37, 0x33.  If
the next column is zero or null, you add two more bytes.

 And as long as you're absolutely certain that you'll never store any value
greater than
32767, go for it.   If you're wrong and you need a larger value, you can
just ALTER the
column to a INT, after you figure out what the problem is and get any users
who've read
that column to close their connections.

> The performance gain can vary from few percent to ten times and it is NON
> LINEAR.
>
The effect of compression is data dependent and the effect on performance
is application
dependent.

> Conclusion of my testings:
>
> 1. Size of compressed row has significant impact on performance
>
Absolutely agree.

> 2. SMALLINT will litle help but
>
And has such a limited range, it's a programmer trap. INT and BIGINT are
largely
equivalent.

> 3. VARCHAR and CHARACTER SET has much more impact. Mainly UTF8.
>
VARCHAR adds two bytes for the length of the column.  Its effect on
compression
is less obvious because trailing spaces aren't stored in VARCHAR, and the
places
where you find three identical characters in a string are rare.

For storage, UTF8 does expand characters that aren't ASCII.  And that can
be
a factor, but it's pretty much the cost of multi-lingual applications.

> 4. If performance is not at first place, use INTEGER because upgrade is
> very problematic (you have to drop all PK, FK and indexes)
>
Right, because BIGINT doesn't use the normal numeric key generation.  Which
it could, without
losing any precision, if the low-end digits that are lost in conversion to
double precision were simply
appended to the generated key.  Sigh.

Sincerely,

Ann

Reply via email to