[Imdbpy-devel] SOUNDEX and imdbpy

Martin Kirst Thu, 06 Apr 2006 13:07:11 -0700

Hello everybody!

While reading the latest devel mails, something came to my mind
about the SOUNDEX feature.


I think it is not a good idea to precalculate all the possible
combinations of a movie title (or person name, and so on).
If we assume 10,000 movie titles (I have not really counted yet)
and 10 combinations for each title, we'll have 100,000 titles
in our database. That also increases the size of the database
by a factor of 10! And don't forget all the person names.
(These are only rough calculations.)
On my machine I have already sacrificed 1252 megabytes :-/
I don't want to imagine multiplying that by a factor of 10 or more :-)

To me, it seems a better idea to do it this way:
0. don't precalculate the soundex words;
   store the data the normal way, as we do today
1. a user enters a movie/person name (in the search field)
2. imdbpyweb calculates the soundex words, say 10 combinations.
   If I'm right, each of the calculated soundex words
   has its own degree of probability.
3. imdbpyweb queries the database once for each combination,
   so we get the result sets in order of their degree of
   probability.
4. print the results in HTML in order of their degree of probability
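
The steps above could be sketched roughly as follows. This is only an
illustration under several assumptions: SQLite stands in for MySQL so the
example runs on its own, the `movies` table layout is made up, and
`variants()` is a hypothetical placeholder for whatever produces the
combinations in order of probability (it is not the real soundex code).

```python
import sqlite3


def variants(query):
    """Hypothetical placeholder: candidate spellings, most probable first."""
    q = " ".join(query.split())          # normalize whitespace
    out = [q]
    words = q.split(" ")
    if len(words) > 1 and words[0].lower() in ("the", "a", "an"):
        rest = " ".join(words[1:])
        out.append(rest + ", " + words[0])   # e.g. "Matrix, The"
        out.append(rest)                     # e.g. "Matrix"
    return out


def search(conn, query):
    """Run one SELECT per variant and keep the results in variant order."""
    seen, results = set(), []
    for v in variants(query):
        rows = conn.execute(
            "SELECT id, title FROM movies "
            "WHERE title = ? COLLATE NOCASE ORDER BY id",
            (v,),
        )
        for movie_id, title in rows:
            if movie_id not in seen:         # skip titles already listed
                seen.add(movie_id)
                results.append((movie_id, title))
    return results


def demo():
    # Assumed toy schema; the real imdbpy tables look different.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO movies (title) VALUES (?)",
        [("Matrix, The",), ("Matrix",), ("The Matrix",), ("Heat",)],
    )
    return search(conn, "the  matrix")
```

Nothing is stored per variant here; the extra cost is one SELECT per
combination at search time, which is exactly the trade-off discussed next.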

Yes, I know that this costs a lot of computing power in the database.
But I think this is the better way: if you search in one
table, let's say movies, the database has its own caching system
to speed up the second select, the third one, and so on.
Don't forget that if you increase the entries in a database by
a factor of 10, a single select will also take much more time.
If we implement it the way I have in mind, there is also a lot of
room to optimize the database, for example by giving more memory to
mysqld (configured in my.ini or my.cnf).
And in the end, I'm sure it will take much more than my
Athlon 2500+ with 512MB to reach a speed like www.imdb.com :-)

This is what I'm thinking about.
What's your opinion?

I'm also open to writing a little benchmark
to check which of the two approaches is the more promising one.
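
Such a benchmark might start out like the sketch below. Everything in it is
an assumption made for illustration: SQLite in memory replaces MySQL, the
data is synthetic, the table and column names are invented, and both tables
are indexed. The timings it produces say nothing reliable about a real
MySQL setup; it only shows the shape of the comparison.

```python
import sqlite3
import time

N_TITLES = 10_000    # rough figure from this message
N_VARIANTS = 10      # assumed combinations per title


def key(i, k):
    """Synthetic k-th variant of title i (stand-in for real combinations)."""
    return f"title-{i}-v{k}"


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE movie_variants (movie_id INTEGER, variant TEXT)")
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?)",
        ((i, key(i, 0)) for i in range(N_TITLES)),
    )
    conn.executemany(
        "INSERT INTO movie_variants VALUES (?, ?)",
        ((i, key(i, k)) for i in range(N_TITLES) for k in range(N_VARIANTS)),
    )
    conn.execute("CREATE INDEX idx_title ON movies (title)")
    conn.execute("CREATE INDEX idx_variant ON movie_variants (variant)")
    return conn


def precomputed_lookup(conn, i, k):
    """Approach A: one SELECT against the table that is 10 times larger."""
    rows = conn.execute(
        "SELECT movie_id FROM movie_variants WHERE variant = ?", (key(i, k),)
    )
    return [r[0] for r in rows]


def query_time_lookup(conn, i):
    """Approach B: one SELECT per variant against the small table."""
    hits = []
    for k in range(N_VARIANTS):
        rows = conn.execute(
            "SELECT id FROM movies WHERE title = ?", (key(i, k),)
        )
        hits.extend(r[0] for r in rows)
    return hits


def timed(fn, repeats=1000):
    """Wall-clock seconds for `repeats` lookups spread over the titles."""
    start = time.perf_counter()
    for n in range(repeats):
        fn(n % N_TITLES)
    return time.perf_counter() - start


def run():
    conn = build()
    a = timed(lambda i: precomputed_lookup(conn, i, 3))
    b = timed(lambda i: query_time_lookup(conn, i))
    return a, b


if __name__ == "__main__":
    a, b = run()
    print(f"precomputed: {a:.4f}s  query-time: {b:.4f}s")
```

A real test would also need to compare the on-disk size of both layouts and
to run against mysqld with realistic data, since caching and memory settings
are a large part of the argument.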

Greetings
 Martin


