On Jun 06, Giuseppe Cowo Corbelli <[EMAIL PROTECTED]> wrote: > > Critical tables are CastInfo, Name and Title. I'm not sure that > > the current indexes are the best ones. > > Let's start with these 3. What are the most common select operations > on these tables?
There are some different cases: [searching for a movie title] If the searched string looks like a "normal" title (i.e.: _not_ the title of an episode of a series) its soundex phonetic code is searched in the Title table, amongst rows with a "kindID" column set to anything but the id used to identify episodes. If the searched string seems to be an episode title: first of all the title of the series is searched as it was a non-episode title (i.e.: the soundex of the "series title portion" of the searched string is matched with the phoneticCode column in the Title table); then the title of the episode is searched (again: its soundex code is searched) restricting also the "kindID" column to be set to "episode", and the "episodeOfID" column to be amongst the previously matched series' titles. The same query is executed for the AkaTitle table. Being things not simple enough, if the user wants to exclude adult-only movies [1], these results are filtered: the MovieInfo table is searched for rows where the "movieID" column is in the list of results AND where the "infoTypeID" column is set to "genres" AND the value of the "info" field is "Adult". I think that database server are highly optimized to take advantage of indexes: I'm sure that the "info" text column is compared only for that very few rows that match the other conditions. [searching for a person name] The soundex code of the searched string (and some variations) is matched against the namePcodeCf, namePcodeNf and surnamePcode columns in the Name (and AkaName) table. As a special case, if the searched string is a single word, only the namePcodeCf is compared. [retrieving information about a movie] An important query is the one used to retrieve a row about a movie in the Title (or AkaTitle) table, but I assume that a Title.get(movieID) is _always_ fast enough. :-) The CastInfo table is searched for the given movieID; i.e.: CastInfo.select(CastInfo.q.movieID == movieID) The same is done with the MovieInfo, AkaTitle, CompleteCast and MovieLink tables. The Title table is then searched for rows where the "episodeOfID" column is set to the movieID (meaning that the retrieved movie has episodes). [retrieving information about a person] As usual, Name.get(personID) is already optimized enough. :-) The CastInfo table is searched for the personID: CastInfo.select(CastInfo.q.personID == personID) The same is done with the PersonInfo and AkaName tables. I think this is more or less everything. +++ [1] Reading the coded I think I've spotted a bug filtering out adult movies. The fact that nobody noticed this bug, tells a lot of things about the IMDbPY's user-base. :-) And I'm really proud of this! <g> -- Davide Alberani <[EMAIL PROTECTED]> [PGP KeyID: 0x465BFD47] http://erlug.linux.it/~da/ _______________________________________________ Imdbpy-devel mailing list Imdbpy-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-devel