There are three major factors determining your performance: 1) the speed of the CPU, 2) the size of your RAM, and 3) the data transfer speed of your disks.

Since most computers do not let you change your CPU on the fly, I will assume that some process A requires the same number of clock cycles to execute whether your data is in memory or on disk. Being a constant, we will leave it out for a moment. RAM is much faster than disk for storage and retrieval, so if you have more RAM you will generally have better performance. That leaves DISK I/O as the factor you are trying to change in order to improve your performance, right? Let's think through several scenarios and figure out why what you did made only a small difference in the tests you ran.

TEST 1) Random read of 100000 rows

Whenever the database engine has to find a value, it has only two choices: it can find the value in an index, or it can perform a table scan. Since I assume you used your primary key as the search value for both tables, both lookups were resolved by an index search. MySQL uses tree-structured (B-tree) indexes, which means that a search in an index of size X takes roughly N = log2(X) comparisons to find your value. For an index containing 1.5 million items that is log2(1.5e6), or about 21 comparisons; for an index containing 16 million items it is log2(1.6e7), or about 24 comparisons. How much longer do you think the engine needs for those 3 extra comparisons? What takes the longest is transferring the index data from disk into RAM and then moving the read heads to the correct location to retrieve your record; the actual lookup is blazingly fast. 16 million integers (assuming 32 bits per integer + 8 bytes for record pointers) is only 192 MB, so the entire index fits nicely into most servers' memory.
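You can see this from the optimizer's point of view with EXPLAIN. A quick sketch using the table names and integer primary key from your own test (the column name is just a placeholder, and the exact EXPLAIN output depends on your MySQL version, so take the comments as what I would expect to see rather than a guarantee):

    -- Point lookups on the integer primary key. Both should be resolved
    -- entirely through the index, never by a table scan.
    -- (some_col stands in for whatever column you were actually reading.)
    EXPLAIN SELECT some_col FROM BIG_TABLE   WHERE ID = 1050000;
    EXPLAIN SELECT some_col FROM SMALL_TABLE WHERE ID = 1050000;
    -- Expected in both cases: access type "const" and rows = 1. The only
    -- difference between the two tables is a few extra key comparisons in
    -- the deeper index, which you will not be able to measure from a client.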
TEST 2) Insert 100000 rows of new data

This performance varies with the storage engine you are using (MyISAM or InnoDB), as does the storage method. Suffice it to say that by the time the disk has finished physically writing 100000 rows of data to the media, the CPU has already updated your index and is waiting for a chance to commit its cache to disk. Both engines can grow the end of their indexes and data storage on disk efficiently, so dumping new data into a singly-indexed table is fast. Since all of the data was appended to the end of the dataspace, there was no need to split data pages to insert rows mid-space. What can kill INSERT performance is having several indexes on a table: each new record has to be added to every index, causing a nearly linear slowdown in INSERT performance.

What splitting your data into smaller tables WILL help with is (a rough MERGE-table sketch follows this list):

1) You are more fault tolerant. The failure of one 10% "block" of your data will not affect the availability of the other 90%.
2) You can move parts of your data to other disks. Using multiple I/O channels should be faster than a single channel.
3) It will be easier to back up and archive your database. You can move parts of your data to "near-line" storage if it is old and not frequently needed.
4) You can index your most active data blocks differently from your older records, improving your response time. You also save total disk space by adding indexes to just the most active portions of your data.
5) If your application is smart enough, you can split your data over several servers. This would be a SERIOUS performance increase.
6) Write-locking a table only locks up a portion of your data (MyISAM). This improves concurrency and consistency.
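For reference, here is a rough sketch of how a split like yours can be stitched back together with a MyISAM MERGE table so your queries still see a single table. The table and column names are made up, this only works over MyISAM tables, and a MERGE table cannot enforce uniqueness across its members, so treat it as an illustration rather than a recipe:

    -- Table and column names here are illustrative only.
    CREATE TABLE big_part_01 (
      ID  INT NOT NULL PRIMARY KEY,   -- IDs 1 .. 1500000
      val INT
    ) ENGINE=MyISAM;

    CREATE TABLE big_part_02 (
      ID  INT NOT NULL PRIMARY KEY,   -- IDs 1500001 .. 3000000
      val INT
    ) ENGINE=MyISAM;

    -- ... and so on for the remaining ranges ...

    CREATE TABLE big_all (
      ID  INT NOT NULL,
      val INT,
      KEY (ID)
    ) ENGINE=MERGE UNION=(big_part_01, big_part_02)
      INSERT_METHOD=LAST;             -- new rows go into the last listed table

Reads can then go through big_all, but something outside the database (your application or a periodic job) still has to create each new big_part_NN and add it to the UNION, which is exactly the extra plumbing listed under the negatives below.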
Negative things about splitting your data:

1) You will have to code more MERGE tables, UNION queries, and table JOINs, depending on your circumstances. This is a negative because of the increased development time and maintenance complexity.
2) Certain aggregate queries will be much less efficient or unavailable.
3) It requires an external partitioning process to decide which table should receive new data.

Depending on your application, you may be able to gain significant increases in performance by splitting your tables. I hope I was able to shed some light on why I think your tests didn't show much difference between the two storage schemes.

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


"Leonardo Francalanci" <[EMAIL PROTECTED]> wrote on 06/21/2004 01:20 PM:
To: "Mysql" <[EMAIL PROTECTED]>
Subject: R: why CPU is high while disks are idle in a table scan???

Sorry, now it runs as expected (but I don't know what happened; maybe running other queries emptied the cache). Thank you anyway for your replies.

And: could somebody answer my previous question? I wrote:

I have to deal with pretty big data, such as a table (say BIG_TABLE) with 16,000,000 rows. The table has just one primary index (a simple int). I thought that splitting the data among multiple tables would boost performance, so I put my data in 16 different tables (everything is on one disk). The first table had IDs from 1 to 1,500,000, the second the range 1,500,001-3,000,000, and so on. When I ran performance tests, I found that I didn't get any performance gain. So I tried (just for test purposes) to create a table of 100,000 rows (say SMALL_TABLE) with some data (IDs from 1,000,000 to 1,100,000) from my big table BIG_TABLE. The tests were like this (pseudocode):

    for 100000 times
        read a column from BIG_TABLE where ID = <random number between 1,000,000 and 1,100,000>
    for 100000 times
        read a column from SMALL_TABLE where ID = <random number between 1,000,000 and 1,100,000>

The results are the same! There is no noticeable difference between a table with 16M rows and a table with 100K rows! Is that because my IDs are simple ints from 1 to n (without gaps)? I've tried to do 100,000 inserts into both tables, and BIG_TABLE takes 45 seconds while SMALL_TABLE takes 43 seconds (not such a big difference for a table 160 times smaller!). The only big performance gain I got is from a SUM over a big range, because MySQL in that case does a full scan on SMALL_TABLE, which is faster than access by index on BIG_TABLE. Am I doing something wrong? Isn't splitting supposed to improve performance? Or is that true only if you can split the data over multiple disks?

Thank you

Leonardo

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]