Re: 回复：回复： Why is creating indexes faster after inserting massive data rows?

Karen Abgarian Mon, 07 May 2012 20:37:28 -0700

Honestly, I did not understand that.   I did not say anything about being 
complicated.  What does mysql not use, caching??


Judging by experience, creating a unique index on say, a 200G table could be a 
bitter one.   


On 07.05.2012, at 19:26, Zhangzhigang wrote:

> Karen...
> 
> The mysql does not use this approach what you said which is complicated.
> 
> I  agree with ohan De Meersman.
> 
> 
> ________________________________
> 发件人： Karen Abgarian <a...@apple.com>
> 收件人： mysql@lists.mysql.com 
> 发送日期： 2012年5月8日, 星期二, 上午 1:30
> 主题: Re: 回复： Why is creating indexes faster after inserting massive data rows?
> 
> Hi, 
> 
> A couple cents to this. 
> 
> There isn't really a million of block writes.   The record gets added to the 
> block, but that gets modified in OS cache if we assume MyISAM tables and in 
> the Innodb buffer if we assume InnoDB tables.   In both cases, the actual 
> writing does not take place and does not slow down the process.    What does 
> however happen for each operation, is processing the statement, locating the 
> entries to update in the index, index block splits and , for good reason, 
> committing.  
> 
> When it comes to creating an index, what needs to happen, is to read the 
> whole table and to sort all rows by the index key.   The latter process will 
> be the most determining factor in answering the original question, because 
> for the large tables the sort will have to do a lot of disk I/O.    The point 
> I am trying to make is there will be situations when creating indexes and 
> then inserting the rows will be faster than creating an index afterwards.   
> If we try to determine such situations, we could notice that the likelihood 
> of the sort going to disk increases with the amount of distinct values to be 
> sorted.   For this reason, my choice would be to create things like 
> primary/unique keys beforehand unless I am certain that everything will fit 
> in the available memory. 
> 
> Peace
> Karen
> 
> 
> 
> On May 7, 2012, at 8:05 AM, Johan De Meersman wrote:
> 
>> ----- Original Message -----
>> 
>>> From: "Zhangzhigang" <zzgang_2...@yahoo.com.cn>
>> 
>>> Ok, Creating the index *after* the inserts, the index gets created in
>>> a single operation.
>>> But the indexes has to be updating row by row after the data rows has
>>> all been inserted. Does it work in this way?
>> No, when you create an index on an existing table (like after a mass 
>> insert), what happens is that the engine does a single full tablescan and 
>> builds the index in a single pass, which is a lot more performant than 
>> updating a single disk block for every record, for the simple reason that a 
>> single disk block can contain dozens of index entries. 
>> 
>> Imagine that you insert one million rows, and you have 100 index entries in 
>> a disk block (random numbers, to make a point. Real numbers will depend on 
>> storage, file system, index, et cetera). Obviously there's no way to write 
>> less than a single block to disk - that's how it works. 
>> 
>> You can update your index for each record in turn. That means you will need 
>> to do 1 million index - and thus block - writes; plus additional reads for 
>> those blocks you don't have in memory - that's the index cache. 
>> 
>> Now, if you create a new index on an existing table, you are first of all 
>> bypassing any index read operations - there *is* no index to read, yet. Then 
>> the system is going to do a full tablescan - considered slow, but you need 
>> all the data, so there's no better way anyway. The index will be built - 
>> in-memory as much as possible - and the system will automatically prefer to 
>> write only complete blocks - 10.000 of them. That's the exact same number of 
>> index blocks, but you only write each block once, so that's only 10.000 
>> writes instead of 1.000.000. 
>> 
>> Now there's a lot more at play, things like B-tree balancing and whatnot, 
>> but that's the basic picture. 
>> 
>> -- 
>> 
>> Bier met grenadyn 
>> Is als mosterd by den wyn 
>> Sy die't drinkt, is eene kwezel 
>> Hy die't drinkt, is ras een ezel 
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Re: 回复： 回复： Why is creating indexes faster after inserting massive data rows?

Reply via email to

Re: 回复：回复： Why is creating indexes faster after inserting massive data rows?