Honestly, I did not understand that. I did not say anything about being complicated. What does mysql not use, caching??
Judging by experience, creating a unique index on say, a 200G table could be a bitter one. On 07.05.2012, at 19:26, Zhangzhigang wrote: > Karen... > > The mysql does not use this approach what you said which is complicated. > > I agree with ohan De Meersman. > > > ________________________________ > 发件人: Karen Abgarian <a...@apple.com> > 收件人: mysql@lists.mysql.com > 发送日期: 2012年5月8日, 星期二, 上午 1:30 > 主题: Re: 回复: Why is creating indexes faster after inserting massive data rows? > > Hi, > > A couple cents to this. > > There isn't really a million of block writes. The record gets added to the > block, but that gets modified in OS cache if we assume MyISAM tables and in > the Innodb buffer if we assume InnoDB tables. In both cases, the actual > writing does not take place and does not slow down the process. What does > however happen for each operation, is processing the statement, locating the > entries to update in the index, index block splits and , for good reason, > committing. > > When it comes to creating an index, what needs to happen, is to read the > whole table and to sort all rows by the index key. The latter process will > be the most determining factor in answering the original question, because > for the large tables the sort will have to do a lot of disk I/O. The point > I am trying to make is there will be situations when creating indexes and > then inserting the rows will be faster than creating an index afterwards. > If we try to determine such situations, we could notice that the likelihood > of the sort going to disk increases with the amount of distinct values to be > sorted. For this reason, my choice would be to create things like > primary/unique keys beforehand unless I am certain that everything will fit > in the available memory. > > Peace > Karen > > > > On May 7, 2012, at 8:05 AM, Johan De Meersman wrote: > >> ----- Original Message ----- >> >>> From: "Zhangzhigang" <zzgang_2...@yahoo.com.cn> >> >>> Ok, Creating the index *after* the inserts, the index gets created in >>> a single operation. >>> But the indexes has to be updating row by row after the data rows has >>> all been inserted. Does it work in this way? >> No, when you create an index on an existing table (like after a mass >> insert), what happens is that the engine does a single full tablescan and >> builds the index in a single pass, which is a lot more performant than >> updating a single disk block for every record, for the simple reason that a >> single disk block can contain dozens of index entries. >> >> Imagine that you insert one million rows, and you have 100 index entries in >> a disk block (random numbers, to make a point. Real numbers will depend on >> storage, file system, index, et cetera). Obviously there's no way to write >> less than a single block to disk - that's how it works. >> >> You can update your index for each record in turn. That means you will need >> to do 1 million index - and thus block - writes; plus additional reads for >> those blocks you don't have in memory - that's the index cache. >> >> Now, if you create a new index on an existing table, you are first of all >> bypassing any index read operations - there *is* no index to read, yet. Then >> the system is going to do a full tablescan - considered slow, but you need >> all the data, so there's no better way anyway. The index will be built - >> in-memory as much as possible - and the system will automatically prefer to >> write only complete blocks - 10.000 of them. That's the exact same number of >> index blocks, but you only write each block once, so that's only 10.000 >> writes instead of 1.000.000. >> >> Now there's a lot more at play, things like B-tree balancing and whatnot, >> but that's the basic picture. >> >> -- >> >> Bier met grenadyn >> Is als mosterd by den wyn >> Sy die't drinkt, is eene kwezel >> Hy die't drinkt, is ras een ezel > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql