At 5:32a -0400 on Fri, 16 May 2008, Ananda Kumar wrote:
>>> 2 Index would do range scan, which would be comparitivly slower.
>>
>> Not exactly.  This, again, depends on implementation and the
>> data against which queries are run.  An index is an index.
>> Remember, a primary key is implemented by the MySQL engine as
>> a unique not null b-tree index.  In fact, trying to get
>> low-cardinality data from a table via indexes could even be
>> faster than the primary key.  Consider a table with 1,000,001
>> rows of favorite colors of people.  Let's say 1 million
>> people like the color blue, but only 1 person who likes the
>> color puce. An index for that single person would be a huge
>> win.  Depending on the DB (read: implementation) one could make
>> this particular style of skewed data even faster.  Postgres,
>> for example, has partial-indexes.
> 
> As stated by you as above. Here puce is a clear case of just one
> record being read, which is very much similar to reading a record
> with a PRIMARY KEY, which is supposed to be fast for HIGH
> CARDINALITY, if there were 1/4 million people who liked puce, then
> the index would a RANGE SCAN and would be slow.

Eh, again, it's hard to say.  In my contrived example, puce is a clear
case of an index win via the fact that it's unique, but it's *not* like
the primary key because every other key isn't unique.  A primary key
enforces high-cardinality, so selecting individual rows is ostensibly
fast, but now lets select a range of rows.  What if we made ten people
like the color red and SELECTed them?  How about 2,000 green?  100,000
yellow?  An index perusal would still weed out >= 900,000 rows, so is it
a win?  Try it an find out.  I don't know the answer.

Getting data from the index isn't free.  It /can/ be a win when it helps
remove a statistically strong portion of the data, dependent on disk
speed, cpu speed, and a data cardinality.  Sometimes, depending on the
what rows and data it takes to satisfy a query, it's actually better to
ignore the indexes and do a range scan of the table.

The point is that it isn't as simple as snap-your-fingers,-you're-done,
which I gather we both understand.  I suggest again to the OP to do some
tests on /personal data sets/ and see what answers needs.

Besides, who likes puce anyway?  ;-)

Kevin

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to