On 5/2/2014 3:25 PM, Vlad Khorsun wrote:
>> OK, the alternative to record lookups is to store the transaction id in
>> index.  This would require an index insertion for all indexes defined on
>> a table even if the key value didn't change.  It would also require a
>> corresponding index deletion for each index defined on the table when a
>> record version was garbage collected.  The update performance would go
>> straight down the toilet.  And this doesn't include the performance drop
>> due to fluffed up indexes.
>>
>> If you have a third alternative, by all means share it.
>      Mark every index key with two tx numbers:
> - first, mandatory: the number of the tx that inserted this key (insert, or
> update when the key changed),
> - second, optional: the number of the tx that deleted this index key (delete,
> or update when the key changed).
>
> - inserts will cost the same as now,
> - updates:
>      - if the key was not changed: same cost as now (zero),
>      - if the key was changed: twice the current cost (mark the old key with
> the current tx number, insert the new key),
> - deletes will have the additional cost of marking the old key with the
> current tx number,
> - undo of an update or delete must additionally clear the mark on the old key,
> - index keys will be wider than now; there are some tricks to reduce the new
> key length,
> - garbage collection will be the main winner: there is no need to process
> indices at the same time as table records. Every index can be processed
> independently of the table, which almost completely eliminates random IO
> when records are removed. Currently, when a table has more than 5-6 indices,
> garbage collection is terribly slow because of random IO,
> - selects will have no need to read the record version to evaluate record
> visibility,
> - it also allows index coverage (this additionally requires an index key
> encoding that can recover the original value from the key).
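To make the moving parts concrete, here is a minimal sketch of the entry layout and visibility test Vlad describes. The names and types are illustrative only -- nothing here is Firebird's actual on-page format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexEntry:
    key: bytes
    insert_tx: int            # mandatory: tx that inserted this key
    delete_tx: Optional[int]  # optional: tx that deleted (or re-keyed) it

def visible(entry: IndexEntry, committed_visible: set[int]) -> bool:
    """An entry is visible to a snapshot if its inserting transaction is
    among the committed transactions the snapshot can see, and its
    deleting transaction (if any) is not."""
    if entry.insert_tx not in committed_visible:
        return False
    return entry.delete_tx is None or entry.delete_tx not in committed_visible
```

This is the point of the "selects will have no need to read record versions" bullet: the visibility decision uses only data already in the index entry.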

OK, I've given this some thought.  With one reservation and a set of 
concerns, I'm reasonably convinced that the information necessary to 
implement this is available at each step of an index entry's life cycle -- 
insertion, update, and eventual garbage collection -- or could be 
provided at negligible cost.  The performance considerations are 
significant and very difficult to balance.  I will stop at saying it 
could be implemented and leave whether it should be implemented to folks 
more familiar with the current code than me.

My sole reservation is that information can be lost when a large 64-bit 
integer is converted into a double to generate the index key.  The 
current key generation scheme doesn't quite hack it for Vlad's scheme.  
That said, it could be readily fixed by appending the low order bits of 
the original 64-bit integer to the end of the mulched key.  This could 
probably be done as an upwards-compatible upgrade, but since Vlad's 
scheme requires other significant changes anyway, it probably doesn't 
make sense.
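The precision loss is easy to demonstrate: a double carries a 53-bit mantissa, so distinct large 64-bit values can collapse to the same key. A sketch of the problem and the suggested fix follows; the 16-bit suffix width and the key layout are my own illustration, not the actual key format:

```python
import struct

# Two distinct 64-bit integers that round to the same double:
a = 2**60 + 1
b = 2**60
assert float(a) == float(b)  # information lost in the conversion

def make_key(v: int) -> bytes:
    """Illustrative fix: append the low-order bits of the original
    integer after the double-based portion of the key."""
    return struct.pack(">d", float(v)) + struct.pack(">H", v & 0xFFFF)

assert make_key(a) != make_key(b)  # the keys are distinct again
```

Eleven extra bits (64 minus 53) would suffice in principle; a whole-byte suffix is just simpler to lay out.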

My first concern is over index density.  Adding one and sometimes two 
transaction ids to each index entry is going to significantly fluff up 
the index, increasing the I/O necessary for index scans and 
maintenance.  Transaction ids are not small and don't compress well.  
Given the density of the current index structure, adding transaction ids 
might double the index size.
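To put rough numbers on the density concern, assume a LEB128-style variable-length encoding for the ids -- illustrative only, not Firebird's actual prefix compression:

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer in LEB128-style 7-bit groups."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

# A mature database's transaction counter is large, so even a
# variable-length id adds several bytes to every index entry:
assert len(varint(1_000_000_000)) == 5   # ~1e9 tx id -> 5 bytes
assert len(varint(300)) == 2             # young databases fare better
```

Five-plus bytes per id, once or twice per entry, is substantial against the small compressed keys a dense index page holds today.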

My second concern is CPU performance.  Index bucket traversal has always 
been the single greatest hot spot in the architecture.  Arno's jump 
table, for example, was a net success by reducing CPU at the expense of 
index density.  The CPU cost of skipping over a (variable length?) 
transaction id, determining whether there is a second transaction id, 
and if so, skipping over the second could easily double or triple the 
cost of index lookups.  Perhaps a beastly clever compression scheme that 
encoded the pair as a single unit could mitigate this.  But either way, 
the additional cycles will be significant -- and probably painful.

My third concern is that garbage collection will be very tricky but, as 
long as the going/staying lists are maintained, doable.  The simple 
cases are probably straightforward, but untangling A-B-A updates is 
going to be hard, particularly if the two A entries are split between 
two index leaf pages.  Still, this is only code, and I wouldn't expect a 
noticeable performance hit.

My last concern doesn't really apply to single-node Firebird, but could 
come back and bite if somebody tried to make a distributed version.  The 
problem is that Vlad's scheme assumes that record versions and 
transaction ids are co-linear.   This works when allocating transaction 
ids from the header page, but that is expensive with fine-grain threading 
and a disaster in the distributed case.   In NuoDB, for example, I have 
formal IdManagers that delegate blocks of ids to individual nodes.  
Vlad's scheme could not be implemented on systems like NuoDB.  That's 
hardly fatal for Firebird, but it does make a future growth path 
troublesome.
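The co-linearity problem shows up in even a toy version of block-wise id delegation (loosely modeled on the IdManager idea; all details invented): ids stay unique, but allocation order across nodes no longer matches id order.

```python
import itertools

class IdManager:
    """Toy sketch: hand each node a contiguous block of ids to
    allocate from locally, avoiding a shared counter per id."""
    def __init__(self, block_size: int = 1000):
        self._next_block = itertools.count(0)
        self._block_size = block_size

    def delegate_block(self):
        start = next(self._next_block) * self._block_size
        return iter(range(start, start + self._block_size))

mgr = IdManager()
node_a = mgr.delegate_block()
node_b = mgr.delegate_block()

first_on_a = next(node_a)   # id 0
first_on_b = next(node_b)   # id 1000
later_on_a = next(node_a)   # id 1, allocated *after* id 1000 exists
assert later_on_a < first_on_b
```

A transaction started later can carry a smaller id than one started earlier on another node, so any visibility or garbage-collection rule that compares raw ids for ordering breaks down.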

There is much to be said for an index scan mechanism that does not 
require record lookup.  This has been an issue since the DEC database 
war between Rdb/ELN and Rdb/VMS.  There are gobs of benchmarks out there 
that take advantage of index-only retrievals, and some operations like 
computing counts are vastly faster.  But there is no free lunch, and 
lots of other operations will get slower.

It's an excellent idea that, I think, could be made to work.  I just 
don't know whether it is wise or not.


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
