On 10-6-2012 12:32, Dmitry Yemanov wrote:
> 10.06.2012 14:09, Mark Rotteveel wrote:
>>
>> * for each record (sequential read of all pages?)
>> * insert column value into index
>
> Inserting unsorted / random values into the b-tree is known to be much
> slower than sorting the values in advance and loading them into the
> b-tree "in order".

If the trade off is requiring a large part of memory or disk space extra 
(and the additional time required to perform the sort), then I do wonder 
if the advantage is really that big.

The build of a balanced b-tree(*) is O(n log n), and the 'good' case of 
sorting is O(n log n) to O(n^2) (ideal is O(n)). Sorting before 
insertion might only be worth the effort if the sort can fully occur in 
memory. The additional overhead of writing to disk during the sort and 
reading from disk for insertion might negate most of the advantages 
against the overhead of inserting into the b-tree unsorted (ie balancing 
etc).

It might be worth to investigate this tradeoff, or just to provide an 
option to rebuild indices without using temporary sort spaces so others 
can measure it for themselves (or to have a workaround if disk space is 
not large enough to accommodate the rebuild).

*) I am assuming FB uses a balanced trees, because sorting first would 
otherwise degenerate it into a linked list
-- 
Mark Rotteveel


Reply via email to