> This works, though I am not getting all rows returned and I haven't quite
> figured why yet. I hate to rely on the titles being exactly the same. I
> have data where one title contains the words DVD while the VHS version
> doesn't.
Unfortunately, you are running into a fundamental data identity issue here.
Here, identity is "a particular movie, regardless of format". The goal of
your application is to return a list of identities (as defined above), with
available format information included.
If you had some unique identifier (similar to ISBN) for movies, something
that said 002943018 = "What's Eating Gilbert Grape?" regardless of whether
it was VHS, Beta, DVD, Laserdisc, or celluloid, you could do this comparison
flawlessly and easily. Unfortunately, I am guessing that you have no such
unique identifier, and further that no such identifier even exists.
(Interesting idea: use IMDBs data. They identify "What's Eating Gilbert
Grape" as tt0108550. If you used this, you could even dynamically generate
links to IMDB for more information. You would have to create a table of
titles and these unique IDs somehow, and this would be a bunch of work,
but--as I said--interesting idea. If you link to them, they might not even
mind, in which case you could ask them to send you a list of titles and IDs.
Again, just a brainstorm.)
Back to the issue at hand: if you had such an identifier, you would be all
set. But you don't. So, you are using title, which is pretty good. But
you have problems. "The Manchurian Candidate" is not good as an identity:
it could mean either the 1962 or the 2004 version. Also, you have this
format data appended.
I recommend that you add a column to the table that is "cleaned title". You
then cycle through the database and remove that VHS and DVD note. While you
do it, you could create a mediaformat column as suggested before. You could
leave the original title as is, if you want, or go to the cleaned title. I
see no way around doing at least this.
Of course, once you have done that, the question is why you don't have a
relational database with one table that holds videos where each row is a
unique identity, and another table that holds stock, indicating, among other
things, what media format the stock item is.
If you don't split the table like this, I would still recommend that you
create some sort of internal unique ID that you use to identify a movie
entity. Others might know more on this issue, but my guess is that
comparing titles (even after the cleaning described above) will be slower
than comparing numeric values, thereby leading to faster joins.
HTH,
Matthieu
[Todays Threads] [This Message] [Subscription] [Fast Unsubscribe] [User Settings] [Donations and Support]

