Re: [Firebird-devel] INT64 and index keys

2022-02-24 Thread Alex Peshkoff via Firebird-devel

On 2/15/22 18:20, Vlad Khorsun wrote:

I'd vote to remove idx_numeric2 in favour of idx_decimal and look how 
to improve
Decimal128::makeIndexKey() performance. For example, it stores every 3 
decimal
digits into 10 bits using relatively complex (computationally) 
algorithm with
shifts and multiplications. Storing every digit in 4 bits will use 2 
bits per 3

digits more but should save come CPU circles, I believe.



Yes, it does help - but not too much.
select dat from biTest order by dat;
(I reference sample from  #7133) becomes 1.1% faster with less dense 
encoding. Pay attention - engine does a lot of other job, i.e. speedup 
of this particular place is certainly much more.


Certainly we can use that performance increase for new index type. But I 
doubt it's worth breaking existing DECFLOAT indices.





Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-21 Thread Dmitry Yemanov

21.02.2022 20:05, Alex Peshkoff via Firebird-devel wrote:


This is possible way to fix a bug:
https://github.com/FirebirdSQL/firebird/commit/7dd832f32e9669bcb3007dc675b3ee7cca6f6b7d 


New type of indexes is added and it works fine.


I didn't look at the patch deeply yet, so the question: what is storage 
and performance difference between idx_decimal and idx_bcd?


But I wonder - what to do with existing databases? Rebuilding affected 
indexes is enough - but such database becomes unusable for previous 
versions of engine. Normally we should increase ODS minor. But last time 
when it was done in FB3 it gave serious negative feedback. Next, 
upgrading ODS requires gbak/restore cycle, which is problematic with big 
database.


I'd consider committing the solution into v5 only. In v4, index lookups 
should work, although suboptimal for very big numbers. As for index 
ordering, we may document the issue and leave it up to users. I doubt 
those who already utilized INT128 do really use such big numbers in 
production. If they do, they may add a +0 hint to work slower but safe. 
Or we may add a (temporary) configuration switch that disables the index 
ordering for INT128. Given that we expect v5 to be released soon, it 
could be acceptable as a temporary measure.


Perhaps we could invent something clever like converting an ORDER'ed 
select into UNION ALL with the first partition (<= 10^34) ordered by 
index and the second one sorted naturally, but it looks like too much 
effort for a temporary solution.



Dmitry


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-21 Thread Alex Peshkoff via Firebird-devel

On 2/16/22 13:49, Alex Peshkoff via Firebird-devel wrote:

On 2/16/22 12:54, Dmitry Yemanov wrote:

16.02.2022 12:19, Mark Rotteveel wrote:


However, that was not my main point. My main point was that it 
sounds like an index format that was created for supporting 
DECFLOAT(34), and that it is not suitable for the full range of 
INT128 and NUMERIC/DECIMAL backed by INT128 (for the same reasons 
the DOUBLE PRECISION format is not suitable for BIGINT and 
NUMERIC/DECIMAL backed by BIGINT).


It looks so. Unless we miss something (Alex?), 




This is possible way to fix a bug:
https://github.com/FirebirdSQL/firebird/commit/7dd832f32e9669bcb3007dc675b3ee7cca6f6b7d
New type of indexes is added and it works fine.

But I wonder - what to do with existing databases? Rebuilding affected 
indexes is enough - but such database becomes unusable for previous 
versions of engine. Normally we should increase ODS minor. But last time 
when it was done in FB3 it gave serious negative feedback. Next, 
upgrading ODS requires gbak/restore cycle, which is problematic with big 
database.





Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Alex Peshkoff via Firebird-devel

On 2/16/22 12:54, Dmitry Yemanov wrote:

16.02.2022 12:19, Mark Rotteveel wrote:


However, that was not my main point. My main point was that it sounds 
like an index format that was created for supporting DECFLOAT(34), 
and that it is not suitable for the full range of INT128 and 
NUMERIC/DECIMAL backed by INT128 (for the same reasons the DOUBLE 
PRECISION format is not suitable for BIGINT and NUMERIC/DECIMAL 
backed by BIGINT).


It looks so. Unless we miss something (Alex?), 


That was my fault. Initially it was decided to represent big numerics 
with decfloat. But that works very slow, and later it was replaced with 
BIGINT. But about index key formay I've just forgotten.


perhaps we need to add a runtime check that rejects key creation for 
INT128 values longer than 34 decimal digits. And find a better 
solution in some later release.


Normalize coefficient appropriately adjusting exponent like it's done 
for BCDs in DECFLOAT case, but here that can be done in binary form.





Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Alex Peshkoff via Firebird-devel

On 2/16/22 13:34, Dmitry Yemanov wrote:

16.02.2022 13:28, Dmitry Yemanov wrote:


It looks so. Unless we miss something (Alex?), perhaps we need to 
add a runtime check that rejects key creation for INT128 values 
longer than 34 decimal digits.


Thinking twice, an overflow error should already been raised when a 
longish INT128 is converted into DECFLOAT(34) before composing the 
index key.


No, it does not happen, that int128 is rounded.


So no problems at all, except suboptimal index scan (possibly false 
matches) for longish values?


At least ORDER BY via index should be broken for them, I suppose.


Yes.




Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Dmitry Yemanov

16.02.2022 13:28, Dmitry Yemanov wrote:


It looks so. Unless we miss something (Alex?), perhaps we need to 
add a runtime check that rejects key creation for INT128 values 
longer than 34 decimal digits.


Thinking twice, an overflow error should already been raised when a 
longish INT128 is converted into DECFLOAT(34) before composing the 
index key.


No, it does not happen, that int128 is rounded.


So no problems at all, except suboptimal index scan (possibly false 
matches) for longish values?


At least ORDER BY via index should be broken for them, I suppose.


Dmitry



Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Alex Peshkoff via Firebird-devel

On 2/16/22 13:28, Dmitry Yemanov wrote:

16.02.2022 13:24, Alex Peshkoff via Firebird-devel wrote:


It looks so. Unless we miss something (Alex?), perhaps we need to 
add a runtime check that rejects key creation for INT128 values 
longer than 34 decimal digits.


Thinking twice, an overflow error should already been raised when a 
longish INT128 is converted into DECFLOAT(34) before composing the 
index key.


No, it does not happen, that int128 is rounded.


So no problems at all, except suboptimal index scan (possibly false 
matches) for longish values?




Not exactly. When index is used in order by it may produce wrong sorts.




Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Dmitry Yemanov

16.02.2022 13:24, Alex Peshkoff via Firebird-devel wrote:


It looks so. Unless we miss something (Alex?), perhaps we need to add 
a runtime check that rejects key creation for INT128 values longer 
than 34 decimal digits.


Thinking twice, an overflow error should already been raised when a 
longish INT128 is converted into DECFLOAT(34) before composing the 
index key.


No, it does not happen, that int128 is rounded.


So no problems at all, except suboptimal index scan (possibly false 
matches) for longish values?



Dmitry


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Alex Peshkoff via Firebird-devel

On 2/16/22 12:59, Dmitry Yemanov wrote:

16.02.2022 12:54, Dmitry Yemanov wrote:


It looks so. Unless we miss something (Alex?), perhaps we need to add 
a runtime check that rejects key creation for INT128 values longer 
than 34 decimal digits.


Thinking twice, an overflow error should already been raised when a 
longish INT128 is converted into DECFLOAT(34) before composing the 
index key.




No, it does not happen, that int128 is rounded.



Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Alex Peshkoff via Firebird-devel

On 2/16/22 12:45, Alex Peshkoff via Firebird-devel wrote:

To be precise - 9 digits into ULONG. No shifts (BTW what a problem 
with them, CPUs have appropriate fast support since first pentium or 
even more). There is one division of small integer (range from 0 to 
33). It can be easily replaced with table lookup but I'm not sure is 
it worth doing or not - small numbers division if fast enough.




Sorry - that's all about sort, not index key.




Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Vlad Khorsun

16.02.2022 11:45, Alex Peshkoff via Firebird-devel wrote:

On 2/15/22 18:20, Vlad Khorsun wrote:

I'd vote to remove idx_numeric2 in favour of idx_decimal 


Appears to be good idea.


and look how to improve
Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal
digits into 10 bits using relatively complex (computationally) algorithm with
shifts and multiplications. 


To be precise - 9 digits into ULONG. No shifts (BTW what a problem with them, CPUs have appropriate fast support since first pentium 
or even more). There is one division of small integer (range from 0 to 33). It can be easily replaced with table lookup but I'm not 
sure is it worth doing or not - small numbers division if fast enough.


  I refers to this piece of code:

// compress coeff - 3 decimal digits (999) per 10 bits (1023)
unsigned char* p = coeff;
for (ShiftTable* t = table; p < end; p += 3)
{
USHORT val = p[0] * 100 + p[1] * 10 + p[2];
fb_assert(val < 1000);   // 1024, 10 bit
*k |= (val >> t->rshift);
++k;
*k = (val << t->lshift);
if (!t->lshift)
{
++k;
*k = 0;
t = table;
}
else
++t;
}



For Decimal34 use of packed BCD will cause growth of index key (+1 DWORD). I doubt that can make overall result better. May be we 
can move 2 digits in the end of exponent's DWORD, in that case we  do not loose in key size. But is that faster or slower compared 
with current should be checked.


For Decimal16 I see no such problems - packed BCD can be used w/o problems.


Storing every digit in 4 bits will use 2 bits per 3
digits more but should save come CPU circles, I believe.



There is also extracting of BCD from internal decfloat representation. Also not 
too fast process.


  Yes, I have same concerns. We could extract packed BCD from decFloat, btw, 
but it
seems as not very convenient for us (see decDoubleToPacked\decQuadToPacked):

pack receives DECDOUBLE_Pmax packed decimal digits (one digit per four-bit 
nibble)
followed by a sign nibble and prefixed (for decDouble and decQuad only) with an 
extra
pad nibble (which is 0).

Regards,
Vlad


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Dmitry Yemanov

16.02.2022 12:54, Dmitry Yemanov wrote:


It looks so. Unless we miss something (Alex?), perhaps we need to add a 
runtime check that rejects key creation for INT128 values longer than 34 
decimal digits.


Thinking twice, an overflow error should already been raised when a 
longish INT128 is converted into DECFLOAT(34) before composing the index 
key.



Dmitry


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Dmitry Yemanov

16.02.2022 12:19, Mark Rotteveel wrote:


However, that was not my main point. My main point was that it sounds 
like an index format that was created for supporting DECFLOAT(34), and 
that it is not suitable for the full range of INT128 and NUMERIC/DECIMAL 
backed by INT128 (for the same reasons the DOUBLE PRECISION format is 
not suitable for BIGINT and NUMERIC/DECIMAL backed by BIGINT).


It looks so. Unless we miss something (Alex?), perhaps we need to add a 
runtime check that rejects key creation for INT128 values longer than 34 
decimal digits. And find a better solution in some later release.



Dmitry


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Alex Peshkoff via Firebird-devel

On 2/15/22 18:20, Vlad Khorsun wrote:

I'd vote to remove idx_numeric2 in favour of idx_decimal 


Appears to be good idea.


and look how to improve
Decimal128::makeIndexKey() performance. For example, it stores every 3 
decimal
digits into 10 bits using relatively complex (computationally) 
algorithm with
shifts and multiplications. 


To be precise - 9 digits into ULONG. No shifts (BTW what a problem with 
them, CPUs have appropriate fast support since first pentium or even 
more). There is one division of small integer (range from 0 to 33). It 
can be easily replaced with table lookup but I'm not sure is it worth 
doing or not - small numbers division if fast enough.


For Decimal34 use of packed BCD will cause growth of index key (+1 
DWORD). I doubt that can make overall result better. May be we can move 
2 digits in the end of exponent's DWORD, in that case we  do not loose 
in key size. But is that faster or slower compared with current should 
be checked.


For Decimal16 I see no such problems - packed BCD can be used w/o problems.


Storing every digit in 4 bits will use 2 bits per 3
digits more but should save come CPU circles, I believe.



There is also extracting of BCD from internal decfloat representation. 
Also not too fast process.





Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Vlad Khorsun

16.02.2022 11:19, Mark Rotteveel wrote:

On 2022-02-16 09:56, Vlad Khorsun wrote:

16.02.2022 10:35, Mark Rotteveel wrote:

On 2022-02-15 20:08, Vlad Khorsun wrote:

15.02.2022 20:32, Mark Rotteveel wrote:

On 2022-02-15 16:20, Vlad Khorsun wrote:

  I'd vote to remove idx_numeric2 in favour of idx_decimal and look
how to improve
Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal
digits into 10 bits using relatively complex (computationally) algorithm with
shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3
digits more but should save come CPU circles, I believe.


But that is the storage format of Decimal128 (DECFLOAT(34)).


  No. It looks more like packed BCD. DECFLOAT uses a more complex internals,
it is not BCD inside.


Decimal128 uses densely packed BCD (with some oddities to encode sign, exponent 
and first digit).


  It is not BCD and definitely not what uses Decimal128::makeIndexKey().
See http://speleotrove.com/decimal/dbspec.html
and http://speleotrove.com/decimal/DPDecimal.html

Also, note, Decimal encodings is not directly sortable.


 From that first link under 'coefficient continuation': "Each 10-bit group represents three decimal digits, using Densely Packed 
Decimal encoding." 


  Sure. And it is not BCD, nor packed BCD ;)

and at the footnote (and the top of your second link): "Chen-Ho encoding is a lossless compression of three 
Binary Coded Decimal digits into 10 bits using an algorithm which can be applied or reversed using only simple Boolean operations".


  In any case - encoding used by Decimal128::makeIndexKey() is not the
same encoding as used by Decimal internally.

However, that was not my main point. My main point was that it sounds like an index format that was created for supporting 
DECFLOAT(34), and that it is not suitable for the full range of INT128 and NUMERIC/DECIMAL backed by INT128 (for the same reasons 
the DOUBLE PRECISION format is not suitable for BIGINT and NUMERIC/DECIMAL backed by BIGINT).


  And I never object it. Note, I offer to remove idx_numeric2 (used for INT64).

Regards,
Vlad


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Mark Rotteveel

On 2022-02-16 09:56, Vlad Khorsun wrote:

16.02.2022 10:35, Mark Rotteveel wrote:

On 2022-02-15 20:08, Vlad Khorsun wrote:

15.02.2022 20:32, Mark Rotteveel wrote:

On 2022-02-15 16:20, Vlad Khorsun wrote:

  I'd vote to remove idx_numeric2 in favour of idx_decimal and look
how to improve
Decimal128::makeIndexKey() performance. For example, it stores 
every 3 decimal
digits into 10 bits using relatively complex (computationally) 
algorithm with
shifts and multiplications. Storing every digit in 4 bits will use 
2 bits per 3

digits more but should save come CPU circles, I believe.


But that is the storage format of Decimal128 (DECFLOAT(34)).


  No. It looks more like packed BCD. DECFLOAT uses a more complex 
internals,

it is not BCD inside.


Decimal128 uses densely packed BCD (with some oddities to encode sign, 
exponent and first digit).


  It is not BCD and definitely not what uses 
Decimal128::makeIndexKey().

See http://speleotrove.com/decimal/dbspec.html
and http://speleotrove.com/decimal/DPDecimal.html

Also, note, Decimal encodings is not directly sortable.


From that first link under 'coefficient continuation': "Each 10-bit 
group represents three decimal digits, using Densely Packed Decimal 
encoding." and at the footnote (and the top of your second link): 
"Chen-Ho encoding is a lossless compression of three Binary Coded 
Decimal digits into 10 bits using an algorithm which can be applied or 
reversed using only simple Boolean operations".


However, that was not my main point. My main point was that it sounds 
like an index format that was created for supporting DECFLOAT(34), and 
that it is not suitable for the full range of INT128 and NUMERIC/DECIMAL 
backed by INT128 (for the same reasons the DOUBLE PRECISION format is 
not suitable for BIGINT and NUMERIC/DECIMAL backed by BIGINT).


Mark


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Vlad Khorsun

16.02.2022 10:35, Mark Rotteveel wrote:

On 2022-02-15 20:08, Vlad Khorsun wrote:

15.02.2022 20:32, Mark Rotteveel wrote:

On 2022-02-15 16:20, Vlad Khorsun wrote:

  I'd vote to remove idx_numeric2 in favour of idx_decimal and look
how to improve
Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal
digits into 10 bits using relatively complex (computationally) algorithm with
shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3
digits more but should save come CPU circles, I believe.


But that is the storage format of Decimal128 (DECFLOAT(34)).


  No. It looks more like packed BCD. DECFLOAT uses a more complex internals,
it is not BCD inside.


Decimal128 uses densely packed BCD (with some oddities to encode sign, exponent 
and first digit).


  It is not BCD and definitely not what uses Decimal128::makeIndexKey().
See http://speleotrove.com/decimal/dbspec.html
and http://speleotrove.com/decimal/DPDecimal.html

Also, note, Decimal encodings is not directly sortable.

Regards,
Vlad


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-16 Thread Mark Rotteveel

On 2022-02-15 20:08, Vlad Khorsun wrote:

15.02.2022 20:32, Mark Rotteveel wrote:

On 2022-02-15 16:20, Vlad Khorsun wrote:

  I'd vote to remove idx_numeric2 in favour of idx_decimal and look
how to improve
Decimal128::makeIndexKey() performance. For example, it stores every 
3 decimal
digits into 10 bits using relatively complex (computationally) 
algorithm with
shifts and multiplications. Storing every digit in 4 bits will use 2 
bits per 3

digits more but should save come CPU circles, I believe.


But that is the storage format of Decimal128 (DECFLOAT(34)).


  No. It looks more like packed BCD. DECFLOAT uses a more complex 
internals,

it is not BCD inside.


Decimal128 uses densely packed BCD (with some oddities to encode sign, 
exponent and first digit).


Mark


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Vlad Khorsun

15.02.2022 20:32, Mark Rotteveel wrote:

On 2022-02-15 16:20, Vlad Khorsun wrote:

  I'd vote to remove idx_numeric2 in favour of idx_decimal and look
how to improve
Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal
digits into 10 bits using relatively complex (computationally) algorithm with
shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3
digits more but should save come CPU circles, I believe.


But that is the storage format of Decimal128 (DECFLOAT(34)). 


  No. It looks more like packed BCD. DECFLOAT uses a more complex internals,
it is not BCD inside.

Regards,
Vlad


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Mark Rotteveel

On 2022-02-15 16:20, Vlad Khorsun wrote:

  I'd vote to remove idx_numeric2 in favour of idx_decimal and look
how to improve
Decimal128::makeIndexKey() performance. For example, it stores every 3 
decimal
digits into 10 bits using relatively complex (computationally) 
algorithm with
shifts and multiplications. Storing every digit in 4 bits will use 2 
bits per 3

digits more but should save come CPU circles, I believe.


But that is the storage format of Decimal128 (DECFLOAT(34)). It makes no 
sense to replace that with something else for DECFLOAT(34) columns. So 
maybe a different index format is necessary for INT128 (and NUMERIC(38, 
...), because Decimal128 doesn't sound suitable for those types as it 
has less precision (34 digits instead of 38) (similar like the double 
precision index format is not suitable for INT64/BIGINT).


Mark


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Dimitry Sibiryakov

Dmitry Yemanov wrote 15.02.2022 18:14:
Your schema upgrade may change keys from INT to BIGINT (when tables grow more 
than expected) or, more usually -- NUMERIC(N, 3) is changed to NUMERIC(N, 5) for 
quantities, or NUMERIC(12, N) is changed to NUMERIC(18, N), etc. And it should 
be applied to production. And a longish index rebuild will block all other DML 
on that table.


  Thanks to prefix compression we can use 16 bytes key for any precise numeric 
type without drawback so changing of precision wouldn't be an issue, right?

  Scale is another story.

--
  WBR, SD.


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Dmitry Yemanov

15.02.2022 18:37, Kjell Rilbe wrote:


With respect, since I'm not in the dev team, I fail to see it as an 
important feature to avoid index rebuild when changing between integer 
and numeric column types:


1. Changing between such types must be pretty rare, at least on 
production databases. During app development, yes, sure, but at that 
stage the data is probably small or non existent, so rebuild won't 
matter anyway.


Your schema upgrade may change keys from INT to BIGINT (when tables grow 
more than expected) or, more usually -- NUMERIC(N, 3) is changed to 
NUMERIC(N, 5) for quantities, or NUMERIC(12, N) is changed to 
NUMERIC(18, N), etc. And it should be applied to production. And a 
longish index rebuild will block all other DML on that table.


Maybe not so common, usually not recommended, etc -- but surely possible.


2. It already doesn't work and nobody has really complained, correct?


Correct, but it does not mean we cannot do something better. Not doing 
something is always faster than doing something ;-) so why not?


3. Changing data type of an indexed column is an operation where you do 
expect a non-negligible impact, i.e. you expect the DB engine to have to 
do "some work".


Users of other DBMS usually expect to see "some work" while adding a 
column to a table with 10^9 rows (or dropping a column there). Firebird 
does it almost instantly. So expectations are somewhat subjective and 
may be out of sync with realities...


So, I'd vote for a solution that drops the ambition to avoid index 
rebuilds, and instead focuses only on index size and index performance.


We can (and I believe we should) do both -- improve index 
size/performance and avoid index rebuilds if possible.



Dmitry


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Kjell Rilbe

Den 2022-02-15 kl. 16:10, skrev Vlad Khorsun:

15.02.2022 15:20, Dmitry Yemanov wrote:

...


I can think of two ways to proceed:

1) Keep rebuilding the index every time the type is changed. Cast all 
column values into the new format before storing keys into the index. 
Forget about being scale-independent, pick the scale from the latest 
table format (and store BIGINTs or all exact numerics as integers 
without scale, as you suggest).


2) Prohibit decreasing the scale. Avoid rebuilding the indices unless 
really necessary (idx_itype is changed). Consider it a good thing (tm).


Maybe there may be other options, I didn't think deeply about this.


  2+)  Rebuild index only if scale was decreased. Add optional clause 
at SQL syntax level
to explicitly [dis]allow this. If scale was not decreased - leave 
index unchanged.




With respect, since I'm not in the dev team, I fail to see it as an 
important feature to avoid index rebuild when changing between integer 
and numeric column types:


1. Changing between such types must be pretty rare, at least on 
production databases. During app development, yes, sure, but at that 
stage the data is probably small or non existent, so rebuild won't 
matter anyway.


2. It already doesn't work and nobody has really complained, correct?

3. Changing data type of an indexed column is an operation where you do 
expect a non-negligible impact, i.e. you expect the DB engine to have to 
do "some work".


So, I'd vote for a solution that drops the ambition to avoid index 
rebuilds, and instead focuses only on index size and index performance.


Just m.h.o.

I do have a production database with a couple of tables containing a few 
hundred million records using bigint primary keys, and all tables use 
bigint keys all over the place, so I think I would benefit from 
improvements there. :-)


Regards,
Kjell
begin:vcard
fn:Kjell Rilbe
n:Rilbe;Kjell
org:Marknadsinformation i Sverige AB;Utveckling & databaser
adr;quoted-printable:;;Ulvsundav=C3=A4gen 106C;Bromma;Stockholm;16867;Sverige
email;internet:kjell.ri...@marknadsinformation.se
title:Utvecklings- & databasansvarig
tel;cell:0733-442464
x-mozilla-html:TRUE
url:http://www.marknadsinformation.se
version:2.1
end:vcard

Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Vlad Khorsun

15.02.2022 13:28, Dmitry Yemanov wrote:

All,

Historically, we store most numerics as double precision inside index keys. It makes stored keys independent of the declared scale 
and allows both prefix and suffix compression. However, int64 keys (BIGINT and largish NUMERICs/DECIMALs) do not fit the double 
format without risk of rounding/truncation, so they're stored differently -- normalized to the maximum value that fits into double 
and then re-scaled. The problem is that this format is not well-compressable.


...

Maybe we couldn't do any better before FB4, but now we have INT128 and DECFLOAT which use more effective index key format, so I 
asked myself whether it makes sense to utilize it for int64 values too.


Below is the stats if BIGINT would be stored as idx_decimal in the index:

T_BIGINT2 (134)
     Index I_BIGINT2_ID (0)
 Root page: 37193, depth: 3, leaf buckets: 739, nodes: 100
 Average node length: 5.96, total dup: 0, max dup: 0
 Average key length: 3.10, compression ratio: 1.59
 Average prefix length: 3.87, average data length: 1.05
     Index I_BIGINT2_COL (1)
 Root page: 37215, depth: 2, leaf buckets: 697, nodes: 100
 Average node length: 5.61, total dup: 367845, max dup: 8
 Average key length: 2.76, compression ratio: 1.79
 Average prefix length: 4.23, average data length: 0.70

Storage surely looks good, but internal conversions to Decimal128 are not free. I've done a few artificial tests (index scan in the 
loop with many iterations, CPU bound as tables fit the page cache), here are the results:


(1) Wide range scan:
   T_INT = 3.53s
   T_BIGINT = 3.82s
   T_INT128 = 4.36s
   T_BIGINT2 = 3.72s

More or less what's expected: different number of pages are accessed, so T_BIGINT2 wins as compared to T_BIGINT. Interesting that 
T_INT128 has the same index size, but is noticeably slower. Due to a bigger computational overhead?


(2) Narrow range scan:
   T_INT = 2.74s
   T_BIGINT = 2.79s
   T_INT128 = 2.83s
   T_BIGINT2 = 2.76s

Almost the same results for both BIGINT compressions.

(3) Unique scan:
   T_INT = 1.83s
   T_BIGINT = 1.89s
   T_INT128 = 2.09s
   T_BIGINT2 = 2.05s

This is the worst case for T_BIGINT2, it loses ~10% to T_BIGINT.

In cases (2) and (3) index size does not matter, only index depth matters. So the stats reflects CPU overhead of btr.cpp::compress() 
and data type conversions.


It's also likely that if BIGINT is used as PK and currently its index is 4 levels depth, the new storage will shrink it down to 3 
levels and this could noticeably improve both lookup and (especially) join performance.


Does anyone think we should investigate this possibility more closely? Any 
other comments?


  I'd vote to remove idx_numeric2 in favour of idx_decimal and look how to 
improve
Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal
digits into 10 bits using relatively complex (computationally) algorithm with
shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3
digits more but should save come CPU circles, I believe.

Regards,
Vlad


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Vlad Khorsun

15.02.2022 15:20, Dmitry Yemanov wrote:

...


I can think of two ways to proceed:

1) Keep rebuilding the index every time the type is changed. Cast all column values into the new format before storing keys into the 
index. Forget about being scale-independent, pick the scale from the latest table format (and store BIGINTs or all exact numerics as 
integers without scale, as you suggest).


2) Prohibit decreasing the scale. Avoid rebuilding the indices unless really necessary (idx_itype is changed). Consider it a good 
thing (tm).


Maybe there may be other options, I didn't think deeply about this.


  2+)  Rebuild index only if scale was decreased. Add optional clause at SQL 
syntax level
to explicitly [dis]allow this. If scale was not decreased - leave index 
unchanged.

Regards,
Vlad


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Dmitry Yemanov

15.02.2022 14:41, Dimitry Sibiryakov wrote:


Is "stored keys independent of the declared scale" a really useful 
and used feature?


That's a separate but also good question. Double precision keys allow 
independence of both precision (within its range) and scale. AFAIU, the 
idea was to allow changing column data types between 
FLOAT/DOUBLE/SMALLINT/INT and between different NUMERIC/DECIMAL 
precisions/scales without rebuilding the index (i.e. keys in the old and 
new formats may co-exist).


But then surprises start to appear.

First of all, it does not work this way. We rebuild an index every time 
a column's data type is changed, regardless of the old/new precision and 
scale. Even if the field is altered to the same data type. And AFAIK, 
it's not something we changed in Firebird, the original IB6 code already 
worked this way.


Second, co-existence of keys in the old and new formats is also buggy 
now, see below.


recreate table t_scale (col1 numeric(5, 2), col2 numeric(5, 2));
create index t_scale_i1 on t_scale(col1);
create index t_scale_i2 on t_scale(col2);
commit;

insert into t_scale values (1.23, 1.23);
commit;

select * from t_scale;

COL1 COL2
 
1.23 1.23

select * from t_scale where col1 = 1.23;

COL1 COL2
 
1.23 1.23

select * from t_scale where col2 = 1.23;

COL1 COL2
 
1.23 1.23

alter table t_scale alter col1 type numeric(6, 3);
alter table t_scale alter col2 type numeric(5, 1);
commit;

select * from t_scale;

COL1 COL2
 
   1.230  1.2

select * from t_scale where col1 = 1.23;

COL1 COL2
 
   1.230  1.2

select * from t_scale where col1 = 1.230;

COL1 COL2
 
   1.230  1.2

-- so far so good

select * from t_scale where col2 = 1.2;

-- NOTHING! WTF?

select * from t_scale where col2 = 1.23;

-- ALSO NOTHING!

We cannot access the existing value via the index anymore. This happens 
because (even after index rebuild) the stored key belongs to the old 
format, i.e. 1.23. It cannot be matched while searching for 1.2. It is 
matched while searching for 1.23, but then the record is read, its 
stored value 1.23 (old format) is converted into 1.2 (new format) and 
then re-compared with 1.23 thus being false again.


I can think of two ways to proceed:

1) Keep rebuilding the index every time the type is changed. Cast all 
column values into the new format before storing keys into the index. 
Forget about being scale-independent, pick the scale from the latest 
table format (and store BIGINTs or all exact numerics as integers 
without scale, as you suggest).


2) Prohibit decreasing the scale. Avoid rebuilding the indices unless 
really necessary (idx_itype is changed). Consider it a good thing (tm).


Maybe there may be other options, I didn't think deeply about this.


Dmitry


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Adriano dos Santos Fernandes
On 15/02/2022 08:28, Dmitry Yemanov wrote:
>
> Any other comments?
> 

Would not it be possible to use some form of varint encoding (VLQ) there?


Adriano


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel


Re: [Firebird-devel] INT64 and index keys

2022-02-15 Thread Dimitry Sibiryakov

Dmitry Yemanov wrote 15.02.2022 12:28:

Does anyone think we should investigate this possibility more closely? Any other 
comments?


  Is "stored keys independent of the declared scale" a really useful and used 
feature?
  If we prohibit that, the index key can be a plain integer without rounding 
and false positive comparison but with perfect prefix compression.


--
  WBR, SD.


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel