On 5/2/2014 3:25 PM, Vlad Khorsun wrote:
>> OK, the alternative to record lookups is to store the transaction id in
>> index. This would require an index insertion for all indexes defined on
>> a table even if the key value didn't change. It would also require a
>> corresponding index deletion for each index defined on the table when a
>> record version was garbage collected. The update performance would go
>> straight down the toilet. And this doesn't include the performance drop
>> due to fluffed up indexes.
>>
>> If you have a third alternative, by all means share it.
> Mark every index key with two tx numbers:
> - first, mandatory - is the number of tx that inserts this key (insert,
>   update when key was changed),
> - second, optional - is the number of tx that deletes this index key (delete,
>   update when key was changed).
>
> - inserts will cost the same as now,
> - updates will
>   - if key was not changed - same cost as now (zero)
>   - if key was changed - twice cost as now (mark old key with current tx
>     number, insert new key)
> - delete will have additional cost to mark old key with current tx number
> - undo of update and delete must additionally clear the mark for the old key
> - index keys will be wider than now; there are some tricks to reduce new
>   index key length
> - garbage collection will be the main winner - there is no need to process
>   indices at the same time as table records. It allows every index to be
>   processed independently from the table and almost completely eliminates
>   random IO when records are removed. Currently, when a table has more than
>   5-6 indices, garbage collection is terribly slow because of random IO.
> - selects will have no need to read the record version to evaluate record
>   visibility
> - also it allows index coverage (which also requires an index key encoding
>   that allows the original value to be recovered from the index key)
>
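To pin down what the proposal implies before digging into it, here is a minimal sketch in C++ of an index entry carrying the two transaction numbers and of the visibility test a select could run without reading the record. All of the names (TraNumber, IndexEntry, isKeyVisible, committedFor) are my own illustrative assumptions, not Firebird's actual on-page layout or API, and a real encoding would of course use prefix compression and variable-length numbers as Vlad notes.

    #include <cstdint>
    #include <optional>
    #include <vector>

    using TraNumber = std::uint64_t;        // transaction number, assumed 64-bit here

    // Illustrative in-memory view of one index key under the proposal: the
    // inserting transaction is always present, the deleting one only after a
    // delete or a key-changing update.
    struct IndexEntry
    {
        std::vector<std::uint8_t> key;      // encoded key bytes
        TraNumber inserter;                 // tx that inserted this key (mandatory)
        std::optional<TraNumber> deleter;   // tx that deleted/superseded it (optional)
    };

    // Visibility for a given snapshot, decided from the index alone: the key is
    // visible if its inserter is committed for this snapshot and no committed
    // deleter has removed it. committedFor() stands in for whatever MVCC
    // visibility check the engine already performs.
    bool isKeyVisible(const IndexEntry& entry, TraNumber snapshot,
                      bool (*committedFor)(TraNumber tx, TraNumber snapshot))
    {
        if (!committedFor(entry.inserter, snapshot))
            return false;                   // inserting tx not visible yet
        if (entry.deleter && committedFor(*entry.deleter, snapshot))
            return false;                   // key already deleted for this snapshot
        return true;                        // usable without reading the record
    }

With the deleter number in the entry, garbage collection can sweep each index on its own, discarding entries whose deleter is older than the oldest interesting transaction -- which is where the sequential-I/O win Vlad describes comes from.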
OK, I've given this some thought. With one reservation and a set of concerns, I'm reasonably convinced that the information necessary to implement this at each step of an index entry's life cycle -- insertion, update, and eventual garbage collection -- is either available when needed or could be provided at negligible cost. The performance considerations are significant and very difficult to balance. I will stop at saying it could be implemented and leave whether it should be implemented to folks more familiar with the current code than I am.

My sole reservation is that information can be lost when a large 64-bit integer is converted into a double to generate the index key, so the current key generation scheme doesn't quite hack it for Vlad's scheme. That said, it could be readily fixed by appending the low order bits of the original 64-bit integer to the end of the mulched key. This could probably be done as an upwards compatible upgrade, but since Vlad's scheme requires other significant changes anyway, that probably doesn't make sense.

My first concern is index density. Adding one and sometimes two transaction ids to each index entry is going to significantly fluff up the index, increasing the I/O necessary for index scans and maintenance. Transaction ids are not small and don't compress well. Given the density of the current index structure, adding transaction ids might double the index size.

My second concern is CPU performance. Index bucket traversal has always been the single greatest hot spot in the architecture. Arno's jump table, for example, was a net success by reducing CPU at the expense of index density. The CPU cost of skipping over a (variable length?) transaction id, determining whether there is a second transaction id, and if so, skipping over that one too could easily double or triple the cost of index lookups. Perhaps a beastly clever compression scheme that handled the pair as a single unit could mitigate this (one possible encoding is sketched at the end of this note), but either way, the additional cycles will be significant -- and probably painful.

My third concern is that garbage collection will be very tricky but, as long as the going/staying lists are maintained, doable. The simple cases are probably straightforward, but untangling A-B-A updates is going to be hard, particularly if the two A entries are split between two index leaf pages. Still, this is only code, and I wouldn't expect a noticeable performance hit.

My last concern doesn't really apply to single-node Firebird, but it could come back and bite if somebody tried to make a distributed version. The problem is that Vlad's scheme assumes that record versions and transaction ids are co-linear. This works when transaction ids are allocated from the header page, but that is expensive with fine-grained threading and a disaster in the distributed case. In NuoDB, for example, I have formal IdManagers that delegate blocks of ids to individual nodes. Vlad's scheme could not be implemented on systems like NuoDB. That's hardly fatal for Firebird, but it does make a future growth path troublesome.

There is much to be said for an index scan mechanism that does not require record lookups. This has been an issue since the DEC database war between Rdb/ELN and Rdb/VMS. There are gobs of benchmarks out there that take advantage of index-only retrievals, and some operations like computing counts are vastly faster. But there is no free lunch, and lots of other operations will get slower. It's an excellent idea that, I think, could be made to work. I just don't know whether it is wise or not.
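On the second concern, here is one hypothetical shape the "compress the pair as a single unit" idea could take: a variable-length encoding whose first number carries the inserter plus a flag bit, and whose optional second number stores the deleter as a delta from the inserter. This is purely a sketch under my own assumptions, not a proposal for Firebird's actual node encoding, and it says nothing about prefix compression of the key bytes themselves.

    #include <cstdint>
    #include <optional>
    #include <vector>

    // 7-bit variable-length integer, low bytes first; small numbers take one byte.
    static void putVarint(std::vector<std::uint8_t>& out, std::uint64_t v)
    {
        while (v >= 0x80)
        {
            out.push_back(static_cast<std::uint8_t>(v) | 0x80);
            v >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(v));
    }

    static std::uint64_t getVarint(const std::uint8_t*& p)
    {
        std::uint64_t v = 0;
        int shift = 0;
        while (*p & 0x80)
        {
            v |= std::uint64_t(*p++ & 0x7F) << shift;
            shift += 7;
        }
        return v | (std::uint64_t(*p++) << shift);
    }

    // Pack (inserter, optional deleter) as one unit: the low bit of the first
    // varint flags whether a deleter follows; assumes tx numbers fit in 63 bits.
    void encodeTxPair(std::vector<std::uint8_t>& out, std::uint64_t inserter,
                      const std::optional<std::uint64_t>& deleter)
    {
        putVarint(out, (inserter << 1) | (deleter ? 1u : 0u));
        if (deleter)
            putVarint(out, *deleter - inserter);   // deleter is later, delta is small
    }

    // Unpack in a single forward pass, so skipping an entry costs no more
    // than decoding it.
    std::uint64_t decodeTxPair(const std::uint8_t*& p,
                               std::optional<std::uint64_t>& deleter)
    {
        const std::uint64_t first = getVarint(p);
        const std::uint64_t inserter = first >> 1;
        if (first & 1)
            deleter = inserter + getVarint(p);
        else
            deleter.reset();
        return inserter;
    }

Whether something like this actually buys back the lost cycles would have to be measured on real index pages; it trades a few shifts and branches per entry against narrower entries and better cache behavior.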