Hello everybody! While reading the latest messages on the devel list, something came to mind about the SOUNDEX feature.
I don't think it is a good idea to precalculate every possible combination of a movie title (person name and so on). If we assume 10,000 movie titles (I haven't actually counted them) and 10 combinations per title, we end up with 100,000 titles in our database. That also increases the size of the database by a factor of 10! And don't forget all the person names. (These are only rough calculations.) On my machine I have already sacrificed 1252 MB :-/ I don't want to imagine multiplying that by 10 or more :-)

To me, it seems better to do it this way:

0. Don't precalculate the soundex words; store the data the normal way, as we do today.
1. A user enters a movie or person name in the search field.
2. imdbpyweb calculates the soundex words, say 10 combinations. If I'm right, each of these words has a distinct degree of probability.
3. imdbpyweb queries the database once for each combined word, so we get the result sets in order of their degree of probability.
4. The results are output as HTML in order of their degree of probability.

Yes, I know this costs a lot of computing power in the database. But I still think it is the better way. When you search a single table, say movies, the database's own caching speeds up the second SELECT, the third one, and so on. Also keep in mind that if you increase the number of entries in a database by a factor of 10, each single SELECT takes much longer too. If we implement it the way I have in mind, there is also plenty of room for optimizing the database, for example by giving mysqld more memory (configured in my.ini or my.cnf). In the end, I'm sure it will take much more than my Athlon 2500+ with 512 MB to match the speed of www.imdb.com :-)

That's what I'm thinking. What's your opinion? I'm also happy to write a small benchmark comparing both approaches to see which one is more promising.
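To make steps 0 to 4 more concrete, here is a minimal Python sketch. It is only an illustration under several assumptions: sqlite3 stands in for MySQL (MySQL has its own SOUNDEX() function), soundex() implements the classic four-character American Soundex, and the "combinations" are reduced to three hypothetical variants ranked by assumed probability: exact spelling, whole title sounding alike, and a single word sounding alike. The names word_match() and search() and the sample titles are made up for this sketch; they are not part of imdbpy or imdbpyweb.

```python
import sqlite3

# American Soundex digit for each consonant; vowels, H, W and Y have no code.
CODES = {ch: digit
         for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6"))
         for ch in letters}


def soundex(text):
    """Classic 4-character American Soundex key, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "HW":  # H and W do not separate letters with equal codes
            prev = code
    return result.ljust(4, "0")


def word_match(title, key):
    """Hypothetical helper: 1 if any word of `title` has Soundex key `key`."""
    return int(any(soundex(w) == key for w in title.split()))


def search(conn, query):
    """Steps 2-4 (sketch): one SELECT per variant, most probable variant first."""
    variants = [("lower(title) = lower(?)", query),      # exact spelling
                ("soundex(title) = ?", soundex(query))]  # whole title sounds alike
    variants += [("word_match(title, ?)", soundex(w))    # one word sounds alike
                 for w in query.split() if soundex(w)]
    seen, results = set(), []
    for where, arg in variants:
        sql = "SELECT title FROM movies WHERE " + where + " ORDER BY title"
        for (title,) in conn.execute(sql, (arg,)):
            if title not in seen:  # keep only the best-ranked occurrence
                seen.add(title)
                results.append(title)
    return results


# Step 0: titles are stored as-is; the Soundex keys are computed at query time.
conn = sqlite3.connect(":memory:")
conn.create_function("soundex", 1, soundex)
conn.create_function("word_match", 2, word_match)
conn.execute("CREATE TABLE movies (title TEXT)")
conn.executemany("INSERT INTO movies VALUES (?)",
                 [("The Matrix",), ("The Matrics",), ("Matrix Reloaded",), ("Casablanca",)])
print(search(conn, "The Matrix"))  # ['The Matrix', 'The Matrics', 'Matrix Reloaded']
```

Because the keys are computed inside each SELECT instead of being stored, the table stays the size it is today. The price is that every query evaluates soundex() over the whole table without an index, which is exactly the trade-off a benchmark of the two approaches would have to measure.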
Greetings,
Martin

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel