On 2015-07-28 09:25 PM, rotaiv wrote: > Not quite. The older version brought it down to 8 seconds (as compared to > 5 seconds) but still a whole lot better than 40+ minutes. > > It is very interesting to see an index can make such a remarkable > difference in that particular scenario. This is my first time creating > indexes but I will definitely be doing that more in the future. > > From what I read, it was recommended to create indexes for any field used > in the SELECT statement. Is that correct?
If you realize how an Index makes things faster, the question above will become easy to answer. You should learn more from web sources, but to get you rolling, here is the very condensed "Indexing 101" course: TLDR: Some very basic Index info - please skip if you know Indexing well - or - if you are well versed, please note any ideas or caveats I may have missed. I hope you have used one of the old type telephone books - the ones the phone company would send out with a long list with everyone's number in an area. It's rather easy to use if you know the person's full name and family name/surname because it is listed alphabetically by surname and shows the first name, address etc. Imagine for a moment though, your boss for some nefarious reason tasks you with getting him a list of all the people in the phone book with the name "Jeremy". There is no real quick way to do this, you have to start at the start and read every line in the book seeing if perhaps that person is a Jeremy. It will take ages. (More than 40 minutes!) Now imagine some friendly person in the office sneaks you a copy of a little black book they have made before, in this book is a list of all the first names in alphabetical order along with on which page and line in the phone-book they appear. You look into it, go straight to J, look for Jeremy, you see 12 Jeremy's and can quickly find them in the large phone-book and note their full details. You finish the task in under 5 minutes and impress the boss. This little black book is basically an Index and it makes every search faster IF you were looking for first names (The thing you were "looking for" I will refer to as a "search-term"). So now you have the Surname-ordered phone-book (we say the Surname is a Key of the phone-book, and the little black book is an Index on Name). Note that Surname cannot be a Primary Key because it won't be unique. The only value that will probably be completely Unique for every entry in the book is the Phone-Number itself, so this might be used in the Primary-Key, but isn't important to us when planning search-able Index creation, because it will rarely be a search-term). A typical Schema might look something like this: CREATE TABLE PhoneBook( Surname, Name, City, Suburb, Street, ApartmentNo, PhoneNo, PRIMARY KEY (Surname, PhoneNo) ) and for the little black book: INDEX ON PhoneBook(Name) Your boss is however not satisfied that you have suffered enough and so tasks you with getting a list of people whom all live in "Church" street - and warns you that he will from now on need street-name related lists often. This puts you back to square one, but you immediately realize that another little black book which listed all the street-names in alphabetical order and then linked to where in the phone-book they appear, would make your life much easier. You need an Index on Street, compiles the little black book list (which takes a long time) but afterward your life is much easier. INDEX ON PhoneBook(Street) One more complication happens - your boss reads your first result set, and says "no no, I don't want people from Church street in Downtown, I only want the people from Church street in Chinatown." The next step is to note in your little black book not just the street name, but also the Suburb - a combined Index. Now you are faced with a choice, you can list the names as Suburb,Street - or - Street,Suburb. Which to use? Well, turns out this depends on how you will do the most searching. If you will always have a Street-name to search for and sometimes need to limit it to a Suburb or two, then this is better: INDEX ON PhoneBook(Street, Suburb) The Index above is also better if you will always have both a Street-name and a Suburb to search for because of the fact that there are many more Streets than Suburbs and the same Suburb will repeat a lot more than the Street - this is called Cardinality. We say the Suburb has low Cardinality because it repeats a lot and so there are very few Unique Suburb names. Street names have a higher Cardinality (more Unique names) and full addresses have very high Cardinality (Almost every person lives in a unique home/apartment). Notice however that if ALL you get to search on is a Suburb, the Index above is pretty useless and will result in a full Index-scan which the query optimizer will probably forgo and cause a full Table-scan. (reading through the entire PhoneBook again is easier than reading an entire Index AND having to look up the PhoneBook entry for every item in the Index.) If then sometimes you will have just the Suburb to search for, another little black book is needed: INDEX ON PhoneBook(Suburb) Another thing to note is that - You may think "An Index on Suburb is such a waste due to low Cardinality, how about I add the StreetName to this Index?" to make: INDEX ON PhoneBook(Suburb, Street) It's a great idea, but because we KNOW the cardinality of Street names will always be much higher than Suburbs, we would probably always grab the previous little black book when we have both Street and Suburb as search terms. However, in a next project, where the data isn't clear on Cardinality and we don't know before-hand, the best is to make both the Indices (or "Indexes" if you will) and let whomever will be doing the searching decide which little black book they need to use. Note however that there is a performance hit for having many Indices when Updating or Inserting - because simply the DB engine has to go through all the little black books and note the new information and sometimes reorganize and append structures to do so. The general rule-of-thumb is if you have a DB that will have many many Updates/Inserts and fewer searches (like a logger): use the least possible Indices. If you have a DB that will very rarely be updated but support a LOT of searches (like a Phone-book): use many. For everything in-between, the best results will probably need a bit of experimentation. In most SQL engines there is a function like ANALYZE which basically goes through the table and, for every Index that you have made, notes the Cardinality of the components so that the Query optimizer can better decide which is the best to use for any given set of search-terms. One final note: Imagine your first little black book had not only the Names of all the people in the Phone-book, but also included the rest of the detail: INDEX ON PhoneBook(Name, Surname, Street, Suburb, City, ApartmentNo, PhoneNo) then you would never need to even consult the Phone-book again, you can get all you need right from the Index! This is referred to as a "Covering Index" and quite a useful tool to make searching very fast. Note the order of the Indexed fields is simply what I feel are most-likely-to-be-searched-for (on the left) down to least-likely-to-be-searched-for. Another useful Covering-Index might be: INDEX ON PhoneBook(Surname, Name, Suburb, Street, City, ApartmentNo, PhoneNo) If you know what will be the likely search terms, then a couple or more covering Indices will make searches very fast - but beware: they are really expensive to update, eats a lot of disk space, and adding really large data fields to them should be avoided. > > SELECT home.fpath > FROM home > LEFT JOIN work ON work.fpath = home.fpath > WHERE work.fpath IS NULL > AND home.ftype = 'f'; > > What indexes should I create? Seems like I only need an index on > home.fpath and work.fpath but I wanted to make sure. Yes, these are all you need: INDEX ON home(fpath); INDEX ON work(fpath); Or better yet, if your schema is the same for both tables, something like this: CREATE TABLE %1 ( fpath TEXT, ffile TEXT, ftype TEXT PRIMARY KEY (fpath, ffile) );