Re: General Questions About Indexes

Baron Schwartz Mon, 28 May 2007 08:37:09 -0700

Hi John,

John Kebbel wrote:

INDEXES - A Science AND an Art


I've been continuing to look for answers to my own questions. I've found
a few ...

I meant to write back and try to help, but got busy with other things.You have found some good answers for yourself.

Q1. What good does it do to store the primary key or a unique key if
you're normally SELECTing columns that don't use that primary or unique
key?
        "As you can see, it only makes sense to index those fields you
        use in the WHERE clause."
        
http://www.databasejournal.com/features/mysql/article.php/10897_1382791_2

Generally true, but there can be some advantages to indexing othercolumns too. A common case is a covering index. For example, "SELECTcol1 FROM tbl WHERE col2=11". If you have an index on col2, it willhelp the RDBMS find entries quickly. This process works as you wouldexpect: it looks in the index B-tree for entries with value "11". Nowit has to do what's called a "bookmark lookup," to find the actual rowin the table, and retrieve col1 from it.

However, if you have an index on (col2, col1) you can retrieve the valuefrom the index, without needing to go look for the row in the table.This index "covers" the query. Order of columns is important here.

In practice, this can be an enormous advantage. I wrote an articleabout this, with possibly helpful diagrams, here:http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

Q2. Does a SELECT statement look at an index before it looks at a

table?"Before we repair the table structure above, let me tell you

        about a most important little secret for anyone serious about
        optimizing their queries: EXPLAIN. EXPLAIN shows (explains!) how
        your queries are being used. By putting it before a SELECT, you
        can see whether indexes are being used properly, and what kind
        of join is being performed..."
        
http://www.databasejournal.com/features/mysql/article.php/10897_1382791_1

Yes, and it goes further than that; the server maintains statisticsabout the indexes, and uses them to estimate the cost of various queryexecution plans. You could spend months learning how this works, allthe fine points of various optimization strategies, etc. (Not that I'man expert on it, I just know there's a lot to it).

Q3. Are JOINs where the real timesaving occurs and SELECTs just a
peripheral issue muddying the water?
        In MySQL, Paul DuBois writes: "Index columns that you search
        for, not columns you select ... [t]he best candidate columns for
        indexing are the columns that appear in your WHERE clause or
        columns named in join clauses."

Many kinds of operations may benefit from indexes. Besides SELECT andJOIN, they can be used for GROUP BY, ORDER BY, DISTINCT, finding rows toUPDATE or DELETE, and are necessary for lots of other things likeforeign keys.

The trade-off is cost and space to maintain them. When you change data,the indexes have to be updated too.

Q4. What about non-unique indexes? Is the structure of a non-unique
index file similar to the index in the back of a book, the phrase you're
searching for plus a list of row numbers (page numbers for a book) where
that phrase is found?
        I haven't found the answer to this question, but I did find:
        "Indexes work best for columns with unique values,  and most
        poorly with columns that have many duplicate values" Paul

DuBois, MySQL


I think this might be two questions.

I'm not sure -- there may be more subtleties than this -- but I think aUNIQUE index is like any other index, except the server knows to checkfor duplicate values and throw an error. Most indexes in MySQL areB-trees, though there are specialized indexes for some things (hash,spatial, RTREE, fulltext).

Another thing you're touching on here is the selectivity of the index.This is the degree to which a row in the table is uniquely identified byan index entry. If you index a column with all one value, theselectivity is terrible; the opposite end of the spectrum is a UNIQUEindex, which has perfect selectivity (each row is uniquely identified byan index entry).

Q5. Is an item in an index tied to a memory address (like a pointer in C
++) where the indexed data appears inside the larger memory area staked
out by the table?

This is implementation-dependent. Hopefully the diagrams in the articleI linked above will help explain. There is always some kind of"pointer" from the index to the row, but how it works is different fromone system to the next. (There are some "exotic" kinds of storageengines that may not have conventional indexes, and will work completelydifferently, but the "pointer" idea applies well to all theMySQL-supplied engines, as far as I know).

Q6. As for memory, when you choose a database inside the mysql client,
are all the tables within that database read into memory from the hard
drive, or just the indexes?

I'm not sure I know what you mean by "choose a database," but if youmean the USE <database> statement, neither happens. By default theclient does examine the names of tables and columns to help withauto-completion, but this is just data *about* the tables, not the dataactually contained within the tables.

The server reads and caches various parts of indexes and tablesdepending on the storage engine, and sometimes relies on the OS to cacheparts of it too. It depends a lot on how the server is configured, butit's very possible that at least parts of the indexes are cached inmemory, and this cache is shared across connections, so is not reallyaffected by the USE statement. It is less likely (in general) that thetables are cached. Many times the data is much bigger than the server'smemory.

I guess you probably have a lot of reading already, but Pro MySQL mightslake your thirst for many of the finer details of how things workinternally.


I hope this helps.

Baron

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: General Questions About Indexes

Reply via email to