I don't think he is worried about table scanning; he is worried about 
ROW scanning.  Each of his rows is so large (2500*(size of float) + 
3*(size of tinyint) + some other stuff) that just moving that much data 
through his machine consumes too much time.

If you have a query that does something like this

SELECT onefield
FROM hugetable
WHERE indexed_column = 12
        AND non_indexed = 6

Odds are that you will be doing an indexed search. Say that for his 
indexed_column he gets the roughly 29.5M rows his EXPLAIN reported, out of 
the 16B rows in his table (far less than 30%). The next thing the engine 
has to do is LOAD each of those 29.5M rows (every single column) into 
memory so that the value of the non_indexed column can be compared to 6. 
To perform that second comparison, the memory bus, the hard drives, and 
everything else involved in reading records will have to transfer 
275.4 GB of data (at least once), just so that he can get the value of the 
one column named in his SELECT out of each row that matches both WHERE 
conditions.
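
To see where that 275.4 GB figure comes from, here is the back-of-the-envelope 
math, assuming 4-byte FLOATs and 1-byte TINYINTs (plus whatever per-row 
overhead his "other stuff" adds):

SELECT 2500 * 4 + 3 * 1 AS approx_bytes_per_row,
       29548800 * (2500 * 4 + 3 * 1) / POW(1024, 3) AS approx_gb_moved;

That works out to about 10,003 bytes per row and roughly 275 GB moved, which 
lines up with the number from his original message.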

My idea is to create a way to ask the engine to check the value of the second 
field directly against the DISK copy of the table (without moving the whole 
row into local memory). If it matches, a second seek is performed to pull in 
the field(s) specified in his SELECT.  Alternatively, we could create some 
way to ask the engine to pull through memory only those columns that 
participate in the SELECT clause or in one of the ON or WHERE conditions.  
Either way, we minimize how much data must be moved through memory to 
resolve one of these rather bulky queries.
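
For what it's worth, he can approximate that second approach today with a 
covering (compound) index, so the server answers the query from the index 
alone and never touches the full 10 KB rows. A rough sketch, using the 
made-up names from my example above:

ALTER TABLE hugetable
  ADD INDEX idx_cover (indexed_column, non_indexed, onefield);

SELECT onefield
FROM hugetable
WHERE indexed_column = 12
        AND non_indexed = 6

EXPLAIN should then report "Using index", meaning only index pages move 
through memory instead of every full-width row. Of course, with 2500+ 
columns he cannot cover every query this way, which is why a more general 
optimization would still be worth having.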

We could potentially create a new threshold value: say, if a query uses less 
than 10% of the fields on a table, this optimization would kick in.
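
A query like my example above touches only 3 of his roughly 2503 columns, so 
it would clear that bar easily:

SELECT 3 / (2500 + 3) AS fraction_of_columns_used;

That comes out to about 0.12%, far below a 10% threshold.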

If anyone else has ideas (and I am sure there are many) on how to minimize 
disk traffic and memory throughput when handling a dataset this large, this 
would be a great time to speak up. I know this is a rather extreme case, but 
solving it may make MySQL just that much faster for the rest of us.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



"Dathan Vance Pattishall" <[EMAIL PROTECTED]> wrote on 10/21/2004 
04:00:10 PM:

> 
> 
> DVP
> ----
> Dathan Vance Pattishall     http://www.friendster.com
> 
> 
> > -----Original Message-----
> > 
> > So, is there a faster way to insert/index the data?  Would a different
> > table or index type improve performance?
> 
> Use LOAD DATA INFILE ... IGNORE; you might get a better insert speed
> increase. A different table and a different index / compound index would
> improve performance from both an insert and a select perspective.
> 
> 
> 
> > 
> > above query, according to explain), the current behavior makes it read
> > 29548800 complete rows, which is 275.4 GB of data to read, even though the
> > desired return is about 1/2500th of that (112.7 MB).
> 
> If a range covers more than 30% of the table, a table scan is performed,
> instead of an index scan.
> 
> 
> > 
> > 
> > Any/all suggestions, comments, even flames are welcoome :)  Thanks in
> > advance!
> > 
> > ken
> 
> Look at some my.cnf options. You can tell mysql to use keys more often than
> table scans with a var called max_seeks_keys=100 // something like that
> 
> 
> 
> > 
> > ===========================================================================
> >               "Diplomacy is the weapon of the Civilized Warrior"
> >                                     - Hun, A.T.
> > 
> > Ken Gieselman
> > [EMAIL PROTECTED]
> > System Administrator
> > http://www.endlessknot.com/~kgieselm
> > Endlessknot Communications
> > http://www.endlessknot.com
> > 
> > ===========================================================================