Re: [Firebird-devel] INT64 and index keys
On 2/15/22 18:20, Vlad Khorsun wrote: I'd vote to remove idx_numeric2 in favour of idx_decimal and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. Yes, it does help - but not too much. select dat from biTest order by dat; (I reference sample from #7133) becomes 1.1% faster with less dense encoding. Pay attention - engine does a lot of other job, i.e. speedup of this particular place is certainly much more. Certainly we can use that performance increase for new index type. But I doubt it's worth breaking existing DECFLOAT indices. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
21.02.2022 20:05, Alex Peshkoff via Firebird-devel wrote: This is possible way to fix a bug: https://github.com/FirebirdSQL/firebird/commit/7dd832f32e9669bcb3007dc675b3ee7cca6f6b7d New type of indexes is added and it works fine. I didn't look at the patch deeply yet, so the question: what is storage and performance difference between idx_decimal and idx_bcd? But I wonder - what to do with existing databases? Rebuilding affected indexes is enough - but such database becomes unusable for previous versions of engine. Normally we should increase ODS minor. But last time when it was done in FB3 it gave serious negative feedback. Next, upgrading ODS requires gbak/restore cycle, which is problematic with big database. I'd consider committing the solution into v5 only. In v4, index lookups should work, although suboptimal for very big numbers. As for index ordering, we may document the issue and leave it up to users. I doubt those who already utilized INT128 do really use such big numbers in production. If they do, they may add a +0 hint to work slower but safe. Or we may add a (temporary) configuration switch that disables the index ordering for INT128. Given that we expect v5 to be released soon, it could be acceptable as a temporary measure. Perhaps we could invent something clever like converting an ORDER'ed select into UNION ALL with the first partition (<= 10^34) ordered by index and the second one sorted naturally, but it looks like too much effort for a temporary solution. Dmitry Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2/16/22 13:49, Alex Peshkoff via Firebird-devel wrote: On 2/16/22 12:54, Dmitry Yemanov wrote: 16.02.2022 12:19, Mark Rotteveel wrote: However, that was not my main point. My main point was that it sounds like an index format that was created for supporting DECFLOAT(34), and that it is not suitable for the full range of INT128 and NUMERIC/DECIMAL backed by INT128 (for the same reasons the DOUBLE PRECISION format is not suitable for BIGINT and NUMERIC/DECIMAL backed by BIGINT). It looks so. Unless we miss something (Alex?), This is possible way to fix a bug: https://github.com/FirebirdSQL/firebird/commit/7dd832f32e9669bcb3007dc675b3ee7cca6f6b7d New type of indexes is added and it works fine. But I wonder - what to do with existing databases? Rebuilding affected indexes is enough - but such database becomes unusable for previous versions of engine. Normally we should increase ODS minor. But last time when it was done in FB3 it gave serious negative feedback. Next, upgrading ODS requires gbak/restore cycle, which is problematic with big database. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2/16/22 12:54, Dmitry Yemanov wrote: 16.02.2022 12:19, Mark Rotteveel wrote: However, that was not my main point. My main point was that it sounds like an index format that was created for supporting DECFLOAT(34), and that it is not suitable for the full range of INT128 and NUMERIC/DECIMAL backed by INT128 (for the same reasons the DOUBLE PRECISION format is not suitable for BIGINT and NUMERIC/DECIMAL backed by BIGINT). It looks so. Unless we miss something (Alex?), That was my fault. Initially it was decided to represent big numerics with decfloat. But that works very slow, and later it was replaced with BIGINT. But about index key formay I've just forgotten. perhaps we need to add a runtime check that rejects key creation for INT128 values longer than 34 decimal digits. And find a better solution in some later release. Normalize coefficient appropriately adjusting exponent like it's done for BCDs in DECFLOAT case, but here that can be done in binary form. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2/16/22 13:34, Dmitry Yemanov wrote: 16.02.2022 13:28, Dmitry Yemanov wrote: It looks so. Unless we miss something (Alex?), perhaps we need to add a runtime check that rejects key creation for INT128 values longer than 34 decimal digits. Thinking twice, an overflow error should already been raised when a longish INT128 is converted into DECFLOAT(34) before composing the index key. No, it does not happen, that int128 is rounded. So no problems at all, except suboptimal index scan (possibly false matches) for longish values? At least ORDER BY via index should be broken for them, I suppose. Yes. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
16.02.2022 13:28, Dmitry Yemanov wrote: It looks so. Unless we miss something (Alex?), perhaps we need to add a runtime check that rejects key creation for INT128 values longer than 34 decimal digits. Thinking twice, an overflow error should already been raised when a longish INT128 is converted into DECFLOAT(34) before composing the index key. No, it does not happen, that int128 is rounded. So no problems at all, except suboptimal index scan (possibly false matches) for longish values? At least ORDER BY via index should be broken for them, I suppose. Dmitry Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2/16/22 13:28, Dmitry Yemanov wrote: 16.02.2022 13:24, Alex Peshkoff via Firebird-devel wrote: It looks so. Unless we miss something (Alex?), perhaps we need to add a runtime check that rejects key creation for INT128 values longer than 34 decimal digits. Thinking twice, an overflow error should already been raised when a longish INT128 is converted into DECFLOAT(34) before composing the index key. No, it does not happen, that int128 is rounded. So no problems at all, except suboptimal index scan (possibly false matches) for longish values? Not exactly. When index is used in order by it may produce wrong sorts. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
16.02.2022 13:24, Alex Peshkoff via Firebird-devel wrote: It looks so. Unless we miss something (Alex?), perhaps we need to add a runtime check that rejects key creation for INT128 values longer than 34 decimal digits. Thinking twice, an overflow error should already been raised when a longish INT128 is converted into DECFLOAT(34) before composing the index key. No, it does not happen, that int128 is rounded. So no problems at all, except suboptimal index scan (possibly false matches) for longish values? Dmitry Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2/16/22 12:59, Dmitry Yemanov wrote: 16.02.2022 12:54, Dmitry Yemanov wrote: It looks so. Unless we miss something (Alex?), perhaps we need to add a runtime check that rejects key creation for INT128 values longer than 34 decimal digits. Thinking twice, an overflow error should already been raised when a longish INT128 is converted into DECFLOAT(34) before composing the index key. No, it does not happen, that int128 is rounded. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2/16/22 12:45, Alex Peshkoff via Firebird-devel wrote: To be precise - 9 digits into ULONG. No shifts (BTW what a problem with them, CPUs have appropriate fast support since first pentium or even more). There is one division of small integer (range from 0 to 33). It can be easily replaced with table lookup but I'm not sure is it worth doing or not - small numbers division if fast enough. Sorry - that's all about sort, not index key. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
16.02.2022 11:45, Alex Peshkoff via Firebird-devel wrote: On 2/15/22 18:20, Vlad Khorsun wrote: I'd vote to remove idx_numeric2 in favour of idx_decimal Appears to be good idea. and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. To be precise - 9 digits into ULONG. No shifts (BTW what a problem with them, CPUs have appropriate fast support since first pentium or even more). There is one division of small integer (range from 0 to 33). It can be easily replaced with table lookup but I'm not sure is it worth doing or not - small numbers division if fast enough. I refers to this piece of code: // compress coeff - 3 decimal digits (999) per 10 bits (1023) unsigned char* p = coeff; for (ShiftTable* t = table; p < end; p += 3) { USHORT val = p[0] * 100 + p[1] * 10 + p[2]; fb_assert(val < 1000); // 1024, 10 bit *k |= (val >> t->rshift); ++k; *k = (val << t->lshift); if (!t->lshift) { ++k; *k = 0; t = table; } else ++t; } For Decimal34 use of packed BCD will cause growth of index key (+1 DWORD). I doubt that can make overall result better. May be we can move 2 digits in the end of exponent's DWORD, in that case we do not loose in key size. But is that faster or slower compared with current should be checked. For Decimal16 I see no such problems - packed BCD can be used w/o problems. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. There is also extracting of BCD from internal decfloat representation. Also not too fast process. Yes, I have same concerns. We could extract packed BCD from decFloat, btw, but it seems as not very convenient for us (see decDoubleToPacked\decQuadToPacked): pack receives DECDOUBLE_Pmax packed decimal digits (one digit per four-bit nibble) followed by a sign nibble and prefixed (for decDouble and decQuad only) with an extra pad nibble (which is 0). Regards, Vlad Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
16.02.2022 12:54, Dmitry Yemanov wrote: It looks so. Unless we miss something (Alex?), perhaps we need to add a runtime check that rejects key creation for INT128 values longer than 34 decimal digits. Thinking twice, an overflow error should already been raised when a longish INT128 is converted into DECFLOAT(34) before composing the index key. Dmitry Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
16.02.2022 12:19, Mark Rotteveel wrote: However, that was not my main point. My main point was that it sounds like an index format that was created for supporting DECFLOAT(34), and that it is not suitable for the full range of INT128 and NUMERIC/DECIMAL backed by INT128 (for the same reasons the DOUBLE PRECISION format is not suitable for BIGINT and NUMERIC/DECIMAL backed by BIGINT). It looks so. Unless we miss something (Alex?), perhaps we need to add a runtime check that rejects key creation for INT128 values longer than 34 decimal digits. And find a better solution in some later release. Dmitry Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2/15/22 18:20, Vlad Khorsun wrote: I'd vote to remove idx_numeric2 in favour of idx_decimal Appears to be good idea. and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. To be precise - 9 digits into ULONG. No shifts (BTW what a problem with them, CPUs have appropriate fast support since first pentium or even more). There is one division of small integer (range from 0 to 33). It can be easily replaced with table lookup but I'm not sure is it worth doing or not - small numbers division if fast enough. For Decimal34 use of packed BCD will cause growth of index key (+1 DWORD). I doubt that can make overall result better. May be we can move 2 digits in the end of exponent's DWORD, in that case we do not loose in key size. But is that faster or slower compared with current should be checked. For Decimal16 I see no such problems - packed BCD can be used w/o problems. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. There is also extracting of BCD from internal decfloat representation. Also not too fast process. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
16.02.2022 11:19, Mark Rotteveel wrote: On 2022-02-16 09:56, Vlad Khorsun wrote: 16.02.2022 10:35, Mark Rotteveel wrote: On 2022-02-15 20:08, Vlad Khorsun wrote: 15.02.2022 20:32, Mark Rotteveel wrote: On 2022-02-15 16:20, Vlad Khorsun wrote: I'd vote to remove idx_numeric2 in favour of idx_decimal and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. But that is the storage format of Decimal128 (DECFLOAT(34)). No. It looks more like packed BCD. DECFLOAT uses a more complex internals, it is not BCD inside. Decimal128 uses densely packed BCD (with some oddities to encode sign, exponent and first digit). It is not BCD and definitely not what uses Decimal128::makeIndexKey(). See http://speleotrove.com/decimal/dbspec.html and http://speleotrove.com/decimal/DPDecimal.html Also, note, Decimal encodings is not directly sortable. From that first link under 'coefficient continuation': "Each 10-bit group represents three decimal digits, using Densely Packed Decimal encoding." Sure. And it is not BCD, nor packed BCD ;) and at the footnote (and the top of your second link): "Chen-Ho encoding is a lossless compression of three Binary Coded Decimal digits into 10 bits using an algorithm which can be applied or reversed using only simple Boolean operations". In any case - encoding used by Decimal128::makeIndexKey() is not the same encoding as used by Decimal internally. However, that was not my main point. My main point was that it sounds like an index format that was created for supporting DECFLOAT(34), and that it is not suitable for the full range of INT128 and NUMERIC/DECIMAL backed by INT128 (for the same reasons the DOUBLE PRECISION format is not suitable for BIGINT and NUMERIC/DECIMAL backed by BIGINT). And I never object it. Note, I offer to remove idx_numeric2 (used for INT64). Regards, Vlad Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2022-02-16 09:56, Vlad Khorsun wrote: 16.02.2022 10:35, Mark Rotteveel wrote: On 2022-02-15 20:08, Vlad Khorsun wrote: 15.02.2022 20:32, Mark Rotteveel wrote: On 2022-02-15 16:20, Vlad Khorsun wrote: I'd vote to remove idx_numeric2 in favour of idx_decimal and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. But that is the storage format of Decimal128 (DECFLOAT(34)). No. It looks more like packed BCD. DECFLOAT uses a more complex internals, it is not BCD inside. Decimal128 uses densely packed BCD (with some oddities to encode sign, exponent and first digit). It is not BCD and definitely not what uses Decimal128::makeIndexKey(). See http://speleotrove.com/decimal/dbspec.html and http://speleotrove.com/decimal/DPDecimal.html Also, note, Decimal encodings is not directly sortable. From that first link under 'coefficient continuation': "Each 10-bit group represents three decimal digits, using Densely Packed Decimal encoding." and at the footnote (and the top of your second link): "Chen-Ho encoding is a lossless compression of three Binary Coded Decimal digits into 10 bits using an algorithm which can be applied or reversed using only simple Boolean operations". However, that was not my main point. My main point was that it sounds like an index format that was created for supporting DECFLOAT(34), and that it is not suitable for the full range of INT128 and NUMERIC/DECIMAL backed by INT128 (for the same reasons the DOUBLE PRECISION format is not suitable for BIGINT and NUMERIC/DECIMAL backed by BIGINT). Mark Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
16.02.2022 10:35, Mark Rotteveel wrote: On 2022-02-15 20:08, Vlad Khorsun wrote: 15.02.2022 20:32, Mark Rotteveel wrote: On 2022-02-15 16:20, Vlad Khorsun wrote: I'd vote to remove idx_numeric2 in favour of idx_decimal and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. But that is the storage format of Decimal128 (DECFLOAT(34)). No. It looks more like packed BCD. DECFLOAT uses a more complex internals, it is not BCD inside. Decimal128 uses densely packed BCD (with some oddities to encode sign, exponent and first digit). It is not BCD and definitely not what uses Decimal128::makeIndexKey(). See http://speleotrove.com/decimal/dbspec.html and http://speleotrove.com/decimal/DPDecimal.html Also, note, Decimal encodings is not directly sortable. Regards, Vlad Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2022-02-15 20:08, Vlad Khorsun wrote: 15.02.2022 20:32, Mark Rotteveel wrote: On 2022-02-15 16:20, Vlad Khorsun wrote: I'd vote to remove idx_numeric2 in favour of idx_decimal and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. But that is the storage format of Decimal128 (DECFLOAT(34)). No. It looks more like packed BCD. DECFLOAT uses a more complex internals, it is not BCD inside. Decimal128 uses densely packed BCD (with some oddities to encode sign, exponent and first digit). Mark Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
15.02.2022 20:32, Mark Rotteveel wrote: On 2022-02-15 16:20, Vlad Khorsun wrote: I'd vote to remove idx_numeric2 in favour of idx_decimal and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. But that is the storage format of Decimal128 (DECFLOAT(34)). No. It looks more like packed BCD. DECFLOAT uses a more complex internals, it is not BCD inside. Regards, Vlad Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 2022-02-15 16:20, Vlad Khorsun wrote: I'd vote to remove idx_numeric2 in favour of idx_decimal and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. But that is the storage format of Decimal128 (DECFLOAT(34)). It makes no sense to replace that with something else for DECFLOAT(34) columns. So maybe a different index format is necessary for INT128 (and NUMERIC(38, ...), because Decimal128 doesn't sound suitable for those types as it has less precision (34 digits instead of 38) (similar like the double precision index format is not suitable for INT64/BIGINT). Mark Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
Dmitry Yemanov wrote 15.02.2022 18:14: Your schema upgrade may change keys from INT to BIGINT (when tables grow more than expected) or, more usually -- NUMERIC(N, 3) is changed to NUMERIC(N, 5) for quantities, or NUMERIC(12, N) is changed to NUMERIC(18, N), etc. And it should be applied to production. And a longish index rebuild will block all other DML on that table. Thanks to prefix compression we can use 16 bytes key for any precise numeric type without drawback so changing of precision wouldn't be an issue, right? Scale is another story. -- WBR, SD. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
15.02.2022 18:37, Kjell Rilbe wrote: With respect, since I'm not in the dev team, I fail to see it as an important feature to avoid index rebuild when changing between integer and numeric column types: 1. Changing between such types must be pretty rare, at least on production databases. During app development, yes, sure, but at that stage the data is probably small or non existent, so rebuild won't matter anyway. Your schema upgrade may change keys from INT to BIGINT (when tables grow more than expected) or, more usually -- NUMERIC(N, 3) is changed to NUMERIC(N, 5) for quantities, or NUMERIC(12, N) is changed to NUMERIC(18, N), etc. And it should be applied to production. And a longish index rebuild will block all other DML on that table. Maybe not so common, usually not recommended, etc -- but surely possible. 2. It already doesn't work and nobody has really complained, correct? Correct, but it does not mean we cannot do something better. Not doing something is always faster than doing something ;-) so why not? 3. Changing data type of an indexed column is an operation where you do expect a non-negligible impact, i.e. you expect the DB engine to have to do "some work". Users of other DBMS usually expect to see "some work" while adding a column to a table with 10^9 rows (or dropping a column there). Firebird does it almost instantly. So expectations are somewhat subjective and may be out of sync with realities... So, I'd vote for a solution that drops the ambition to avoid index rebuilds, and instead focuses only on index size and index performance. We can (and I believe we should) do both -- improve index size/performance and avoid index rebuilds if possible. Dmitry Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
Den 2022-02-15 kl. 16:10, skrev Vlad Khorsun: 15.02.2022 15:20, Dmitry Yemanov wrote: ... I can think of two ways to proceed: 1) Keep rebuilding the index every time the type is changed. Cast all column values into the new format before storing keys into the index. Forget about being scale-independent, pick the scale from the latest table format (and store BIGINTs or all exact numerics as integers without scale, as you suggest). 2) Prohibit decreasing the scale. Avoid rebuilding the indices unless really necessary (idx_itype is changed). Consider it a good thing (tm). Maybe there may be other options, I didn't think deeply about this. 2+) Rebuild index only if scale was decreased. Add optional clause at SQL syntax level to explicitly [dis]allow this. If scale was not decreased - leave index unchanged. With respect, since I'm not in the dev team, I fail to see it as an important feature to avoid index rebuild when changing between integer and numeric column types: 1. Changing between such types must be pretty rare, at least on production databases. During app development, yes, sure, but at that stage the data is probably small or non existent, so rebuild won't matter anyway. 2. It already doesn't work and nobody has really complained, correct? 3. Changing data type of an indexed column is an operation where you do expect a non-negligible impact, i.e. you expect the DB engine to have to do "some work". So, I'd vote for a solution that drops the ambition to avoid index rebuilds, and instead focuses only on index size and index performance. Just m.h.o. I do have a production database with a couple of tables containing a few hundred million records using bigint primary keys, and all tables use bigint keys all over the place, so I think I would benefit from improvements there. :-) Regards, Kjell begin:vcard fn:Kjell Rilbe n:Rilbe;Kjell org:Marknadsinformation i Sverige AB;Utveckling & databaser adr;quoted-printable:;;Ulvsundav=C3=A4gen 106C;Bromma;Stockholm;16867;Sverige email;internet:kjell.ri...@marknadsinformation.se title:Utvecklings- & databasansvarig tel;cell:0733-442464 x-mozilla-html:TRUE url:http://www.marknadsinformation.se version:2.1 end:vcard Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
15.02.2022 13:28, Dmitry Yemanov wrote: All, Historically, we store most numerics as double precision inside index keys. It makes stored keys independent of the declared scale and allows both prefix and suffix compression. However, int64 keys (BIGINT and largish NUMERICs/DECIMALs) do not fit the double format without risk of rounding/truncation, so they're stored differently -- normalized to the maximum value that fits into double and then re-scaled. The problem is that this format is not well-compressable. ... Maybe we couldn't do any better before FB4, but now we have INT128 and DECFLOAT which use more effective index key format, so I asked myself whether it makes sense to utilize it for int64 values too. Below is the stats if BIGINT would be stored as idx_decimal in the index: T_BIGINT2 (134) Index I_BIGINT2_ID (0) Root page: 37193, depth: 3, leaf buckets: 739, nodes: 100 Average node length: 5.96, total dup: 0, max dup: 0 Average key length: 3.10, compression ratio: 1.59 Average prefix length: 3.87, average data length: 1.05 Index I_BIGINT2_COL (1) Root page: 37215, depth: 2, leaf buckets: 697, nodes: 100 Average node length: 5.61, total dup: 367845, max dup: 8 Average key length: 2.76, compression ratio: 1.79 Average prefix length: 4.23, average data length: 0.70 Storage surely looks good, but internal conversions to Decimal128 are not free. I've done a few artificial tests (index scan in the loop with many iterations, CPU bound as tables fit the page cache), here are the results: (1) Wide range scan: T_INT = 3.53s T_BIGINT = 3.82s T_INT128 = 4.36s T_BIGINT2 = 3.72s More or less what's expected: different number of pages are accessed, so T_BIGINT2 wins as compared to T_BIGINT. Interesting that T_INT128 has the same index size, but is noticeably slower. Due to a bigger computational overhead? (2) Narrow range scan: T_INT = 2.74s T_BIGINT = 2.79s T_INT128 = 2.83s T_BIGINT2 = 2.76s Almost the same results for both BIGINT compressions. (3) Unique scan: T_INT = 1.83s T_BIGINT = 1.89s T_INT128 = 2.09s T_BIGINT2 = 2.05s This is the worst case for T_BIGINT2, it loses ~10% to T_BIGINT. In cases (2) and (3) index size does not matter, only index depth matters. So the stats reflects CPU overhead of btr.cpp::compress() and data type conversions. It's also likely that if BIGINT is used as PK and currently its index is 4 levels depth, the new storage will shrink it down to 3 levels and this could noticeably improve both lookup and (especially) join performance. Does anyone think we should investigate this possibility more closely? Any other comments? I'd vote to remove idx_numeric2 in favour of idx_decimal and look how to improve Decimal128::makeIndexKey() performance. For example, it stores every 3 decimal digits into 10 bits using relatively complex (computationally) algorithm with shifts and multiplications. Storing every digit in 4 bits will use 2 bits per 3 digits more but should save come CPU circles, I believe. Regards, Vlad Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
15.02.2022 15:20, Dmitry Yemanov wrote: ... I can think of two ways to proceed: 1) Keep rebuilding the index every time the type is changed. Cast all column values into the new format before storing keys into the index. Forget about being scale-independent, pick the scale from the latest table format (and store BIGINTs or all exact numerics as integers without scale, as you suggest). 2) Prohibit decreasing the scale. Avoid rebuilding the indices unless really necessary (idx_itype is changed). Consider it a good thing (tm). Maybe there may be other options, I didn't think deeply about this. 2+) Rebuild index only if scale was decreased. Add optional clause at SQL syntax level to explicitly [dis]allow this. If scale was not decreased - leave index unchanged. Regards, Vlad Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
15.02.2022 14:41, Dimitry Sibiryakov wrote: Is "stored keys independent of the declared scale" a really useful and used feature? That's a separate but also good question. Double precision keys allow independence of both precision (within its range) and scale. AFAIU, the idea was to allow changing column data types between FLOAT/DOUBLE/SMALLINT/INT and between different NUMERIC/DECIMAL precisions/scales without rebuilding the index (i.e. keys in the old and new formats may co-exist). But then surprises start to appear. First of all, it does not work this way. We rebuild an index every time a column's data type is changed, regardless of the old/new precision and scale. Even if the field is altered to the same data type. And AFAIK, it's not something we changed in Firebird, the original IB6 code already worked this way. Second, co-existence of keys in the old and new formats is also buggy now, see below. recreate table t_scale (col1 numeric(5, 2), col2 numeric(5, 2)); create index t_scale_i1 on t_scale(col1); create index t_scale_i2 on t_scale(col2); commit; insert into t_scale values (1.23, 1.23); commit; select * from t_scale; COL1 COL2 1.23 1.23 select * from t_scale where col1 = 1.23; COL1 COL2 1.23 1.23 select * from t_scale where col2 = 1.23; COL1 COL2 1.23 1.23 alter table t_scale alter col1 type numeric(6, 3); alter table t_scale alter col2 type numeric(5, 1); commit; select * from t_scale; COL1 COL2 1.230 1.2 select * from t_scale where col1 = 1.23; COL1 COL2 1.230 1.2 select * from t_scale where col1 = 1.230; COL1 COL2 1.230 1.2 -- so far so good select * from t_scale where col2 = 1.2; -- NOTHING! WTF? select * from t_scale where col2 = 1.23; -- ALSO NOTHING! We cannot access the existing value via the index anymore. This happens because (even after index rebuild) the stored key belongs to the old format, i.e. 1.23. It cannot be matched while searching for 1.2. It is matched while searching for 1.23, but then the record is read, its stored value 1.23 (old format) is converted into 1.2 (new format) and then re-compared with 1.23 thus being false again. I can think of two ways to proceed: 1) Keep rebuilding the index every time the type is changed. Cast all column values into the new format before storing keys into the index. Forget about being scale-independent, pick the scale from the latest table format (and store BIGINTs or all exact numerics as integers without scale, as you suggest). 2) Prohibit decreasing the scale. Avoid rebuilding the indices unless really necessary (idx_itype is changed). Consider it a good thing (tm). Maybe there may be other options, I didn't think deeply about this. Dmitry Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
On 15/02/2022 08:28, Dmitry Yemanov wrote: > > Any other comments? > Would not it be possible to use some form of varint encoding (VLQ) there? Adriano Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] INT64 and index keys
Dmitry Yemanov wrote 15.02.2022 12:28: Does anyone think we should investigate this possibility more closely? Any other comments? Is "stored keys independent of the declared scale" a really useful and used feature? If we prohibit that, the index key can be a plain integer without rounding and false positive comparison but with perfect prefix compression. -- WBR, SD. Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel