15.02.2022 13:28, Dmitry Yemanov wrote:
All,
Historically, we store most numerics as double precision inside index keys. It makes stored keys independent of the declared scale
and allows both prefix and suffix compression. However, int64 keys (BIGINT and largish NUMERICs/DECIMALs) do not fit the double
format without risk of rounding/truncation, so they're stored differently -- normalized to the maximum value that fits into double
and then re-scaled. The problem is that this format is not well-compressible.
...
Maybe we couldn't do any better before FB4, but now we have INT128 and DECFLOAT which use a more efficient index key format, so I
asked myself whether it makes sense to utilize it for int64 values too.
Below are the stats if BIGINT were stored as idx_decimal in the index:
T_BIGINT2 (134)
Index I_BIGINT2_ID (0)
Root page: 37193, depth: 3, leaf buckets: 739, nodes: 1000000
Average node length: 5.96, total dup: 0, max dup: 0
Average key length: 3.10, compression ratio: 1.59
Average prefix length: 3.87, average data length: 1.05
Index I_BIGINT2_COL (1)
Root page: 37215, depth: 2, leaf buckets: 697, nodes: 1000000
Average node length: 5.61, total dup: 367845, max dup: 8
Average key length: 2.76, compression ratio: 1.79
Average prefix length: 4.23, average data length: 0.70
Storage surely looks good, but internal conversions to Decimal128 are not free. I've done a few artificial tests (index scan in a
loop with many iterations, CPU-bound as the tables fit the page cache), here are the results:
(1) Wide range scan:
T_INT = 3.53s
T_BIGINT = 3.82s
T_INT128 = 4.36s
T_BIGINT2 = 3.72s
More or less what's expected: different numbers of pages are accessed, so T_BIGINT2 wins compared to T_BIGINT. Interesting that
T_INT128 has the same index size but is noticeably slower. Due to a bigger computational overhead?
(2) Narrow range scan:
T_INT = 2.74s
T_BIGINT = 2.79s
T_INT128 = 2.83s
T_BIGINT2 = 2.76s
Almost the same results for both BIGINT compressions.
(3) Unique scan:
T_INT = 1.83s
T_BIGINT = 1.89s
T_INT128 = 2.09s
T_BIGINT2 = 2.05s
This is the worst case for T_BIGINT2, it loses ~10% to T_BIGINT.
In cases (2) and (3) index size does not matter, only index depth matters. So the stats reflect the CPU overhead of btr.cpp::compress()
and data type conversions.
It's also likely that if BIGINT is used as a PK and its index is currently 4 levels deep, the new storage would shrink it down to 3
levels, and this could noticeably improve both lookup and (especially) join performance.
Does anyone think we should investigate this possibility more closely? Any other comments?
I'd vote to remove idx_numeric2 in favour of idx_decimal and look at how to improve
Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal
digits in 10 bits using a computationally complex algorithm with
shifts and multiplications. Storing every digit in 4 bits would use 2 more bits per 3
digits but should save some CPU cycles, I believe.
Regards,
Vlad
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel