----- Original Message -----

> From: "Zhangzhigang" <zzgang_2...@yahoo.com.cn>

> Ok, creating the index *after* the inserts, the index gets created in
> a single operation.
> But the index has to be updated row by row after the data rows have
> all been inserted. Does it work in this way?
No. When you create an index on an existing table (for example after a mass insert), 
the engine does a single full table scan and builds the index in one pass. That is 
a lot faster than updating a disk block for every record, for the simple reason that 
a single disk block can hold dozens of index entries. 
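
To make that pattern concrete, here is a minimal sketch of "load first, index 
later". It assumes PostgreSQL with the psycopg2 driver and a made-up table called 
measurements - those specifics are purely for illustration, any engine with 
CREATE INDEX behaves roughly the same way: 

import psycopg2

# Made-up example data; for a real load this size you would normally use COPY.
rows = [(i, i * 0.5) for i in range(1_000_000)]

conn = psycopg2.connect("dbname=test")   # adjust the DSN for your own setup
cur = conn.cursor()

cur.execute("CREATE TABLE measurements (id integer, value double precision)")

# 1. Bulk-load into a table that has no index yet, so no index blocks
#    are read or written during the inserts.
cur.executemany("INSERT INTO measurements (id, value) VALUES (%s, %s)", rows)

# 2. Build the index afterwards: one full table scan, one pass over the
#    data, and each index block is written only once.
cur.execute("CREATE INDEX measurements_id_idx ON measurements (id)")

conn.commit()
cur.close()
conn.close()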

Imagine that you insert one million rows and that 100 index entries fit in a 
disk block (made-up numbers, just to make the point; the real numbers depend on 
storage, file system, index type, et cetera). Obviously there's no way to write 
less than a whole block to disk - that's simply how block storage works. 

You can update your index for each record in turn. That means you will need to 
do 1 million index - and thus block - writes, plus additional reads for those 
blocks that aren't already in memory (the index cache). 
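
In code, with the same made-up numbers (a little Python sketch, just to keep the 
arithmetic honest): 

rows = 1_000_000          # rows inserted
entries_per_block = 100   # index entries per disk block (made up)

# Row-by-row maintenance: in the worst case every insert dirties an index
# block, so you pay roughly one block write per row - plus extra reads for
# any block that isn't already sitting in the index cache.
row_by_row_writes = rows
print(row_by_row_writes)  # 1000000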

Now, if you create a new index on an existing table, you are first of all 
bypassing any index read operations - there *is* no index to read yet. Then the 
system does a full table scan - usually considered slow, but you need all the 
data anyway, so there is no better way. The index is built - in memory as much 
as possible - and the system will prefer to write only complete blocks: 10,000 
of them in this example. That is the exact same number of index blocks, but each 
block is written only once, so that's 10,000 writes instead of 1,000,000. 
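
And the other side of the comparison, with the same assumptions: 

rows = 1_000_000          # same made-up numbers as before
entries_per_block = 100

# A fresh bulk build writes each index block exactly once.
bulk_build_writes = rows // entries_per_block
row_by_row_writes = rows

print(bulk_build_writes)                       # 10000
print(row_by_row_writes // bulk_build_writes)  # 100, i.e. a factor of 100 fewer writes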

Now there's a lot more at play, things like B-tree balancing and whatnot, but 
that's the basic picture. 

-- 

Beer with grenadine 
Is like mustard with wine 
She who drinks it is a prude 
He who drinks it soon turns into a donkey 
