I don't think he is worried about table scanning; he is worried about ROW scanning. Each of his rows is so large (2500 * sizeof(float) + 3 * sizeof(tinyint) + some other stuff, roughly 10 KB per row) that just moving that much data through his machine is consuming too much time.
If you have a query that does something like this:

    SELECT onefield FROM hugetable
    WHERE indexed_column = 12 AND non_indexed = 6

odds are that you will be doing an indexed search. Say that for his indexed_column he gets roughly 29.5M rows that match its condition out of the 16B rows in his table (way less than 30%). The next thing the engine has to do is LOAD each of those 29.5M rows (every single column) into memory so that the value of the non_indexed column can be compared to 6. To perform that second comparison, the memory bus, the hard drives, and everything else involved in reading records will have to transfer 275.4 GB of data (at least once), just so he can get the value of the 1 column he specified in his SELECT statement out of each row that matches his two WHERE conditions.

My idea is to create a way to ask the engine to check the value of the second field directly from the DISK copy of each table (without moving each row into local memory). If it matches, a second seek is performed to pull in the field(s) specified in his SELECT. Alternatively, we could create some way to ask the engine to pull through memory only those columns that participate in the SELECT clause or in one of the ON or WHERE conditions. This way we minimize how much data must be moved through memory to resolve one of these rather bulky queries. We could potentially create a new threshold value; say, if a query uses fewer than 10% of the fields on a table, this optimization would kick in.

If anyone else has ideas (and I am sure there are many) on how to minimize disk traffic and memory throughput to handle his rather large dataset, this would be a great time to speak up. I know this is a rather extreme case, but solving this issue may make MySQL just that much faster for the rest of us.
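To put numbers on the potential savings, here is a quick back-of-the-envelope sketch in Python. It is purely illustrative: the schema (2500 FLOATs, 3 TINYINTs) and the 29,548,800-row count come from the thread below, while the 4 bytes of "other stuff" per row is my assumption, chosen so the row size lines up with the quoted 275.4 GB figure.

```python
# I/O arithmetic for the wide-row query pattern discussed above.
# Schema from the thread: 2500 FLOAT columns (4 bytes each) and
# 3 TINYINT columns (1 byte each); OTHER_BYTES is an assumed amount
# of miscellaneous per-row data to match the quoted 275.4 GB total.

FLOAT_BYTES = 4
TINYINT_BYTES = 1
OTHER_BYTES = 4  # assumption, not from the thread

row_bytes = 2500 * FLOAT_BYTES + 3 * TINYINT_BYTES + OTHER_BYTES
matching_rows = 29_548_800  # rows matching the indexed condition

GIB = 2 ** 30
MIB = 2 ** 20

full_row_io = matching_rows * row_bytes       # current behavior: load every column
column_only_io = matching_rows * FLOAT_BYTES  # proposed: only the SELECTed float

print(f"row size:      {row_bytes} bytes")
print(f"full-row read: {full_row_io / GIB:.1f} GiB")
print(f"column-only:   {column_only_io / MIB:.1f} MiB")
print(f"reduction:     {full_row_io / column_only_io:.0f}x")
```

The ~2500x gap between the two transfer volumes is the whole argument: almost all of the 275.4 GB moved through the machine is columns the query never asked about.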
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"Dathan Vance Pattishall" <[EMAIL PROTECTED]> wrote on 10/21/2004 04:00:10 PM:

>
> DVP
> ----
> Dathan Vance Pattishall   http://www.friendster.com
>
> > -----Original Message-----
> >
> > So, is there a faster way to insert/index the data? Would a different
> > table or index type improve performace?
>
> Use Load data from infile .. IGNORE ... u might get a better insert speed
> increase. A different table and a different index / compound index would
> improve performance from an insert to a select perspective.
>
> > above query, according to explain), the current behavior makes it reads
> > 29548800 complete rows, which is 275.4 GB of data to read, even though the
> > desired return is about 1/2500th of that (112.7 MB).
>
> If a range covers more then 30% of the table a table scan is performed,
> instead of an index scan.
>
> > Any/all suggestions, comments, even flames are welcoome :) Thanks in
> > advance!
> >
> > ken
>
> Look at some my.cnf options. You can tell mysql to use keys more often the
> table scans with a var called max_seeks_keys=100 // something like that
>
> > ===========================================================================
> > "Diplomacy is the weapon of the Civilized Warrior"
> >   - Hun, A.T.
> >
> > Ken Gieselman
> > [EMAIL PROTECTED]
> > System Administrator
> > http://www.endlessknot.com/~kgieselm
> > Endlessknot Communications
> > http://www.endlessknot.com
> > ===========================================================================
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
>