Hi all,
this is My Evil Master Plan(tm) for the new movie title format, adopted by IMDb both on the web site and in the plain text data files. In case you haven't noticed, everything is now "The Title" and the old "Title, The" is gone.
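To make the two formats concrete, here is a minimal sketch of the conversion in both directions. The names to_old_format and to_new_format are made up for illustration and are not the IMDbPY functions; only a few English articles are handled, and extras such as a trailing "(1999)" are not taken into account.

```python
# Hypothetical helpers for illustration only, NOT IMDbPY's actual code.
# Assumption: only a few English articles matter, and the title carries
# no year or other suffix.
ARTICLES = ("the", "a", "an")


def to_old_format(title):
    """Turn 'The Title' (new format) into 'Title, The' (old format)."""
    first, sep, rest = title.partition(" ")
    if sep and rest and first.lower() in ARTICLES:
        return "%s, %s" % (rest, first)
    # No leading article: the two formats are identical.
    return title


def to_new_format(title):
    """Turn 'Title, The' (old format) into 'The Title' (new format)."""
    head, sep, tail = title.rpartition(", ")
    if sep and head and tail.lower() in ARTICLES:
        return "%s %s" % (tail, head)
    # No trailing article: the two formats are identical.
    return title
```

A title without an article passes through both functions unchanged, so a round trip should always give back the original string for the simple cases covered here.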
In short:

[general]
- start using the new format internally. Right now movie.data['title'] is like "Title, The", and it's converted when a user accesses movie['title']. In the future, movie['canonical title'] and friends will be the ones that require a conversion, instead.
- reminder to myself: the functions involved in "Exact Title Searches" must be checked.

[http/mobile]
- a lot (all?) of the data from the web was already in the new format, since it's more readable. Currently it's converted before the creation of a Movie instance; this must change (more on this later).

[sql]
- the title in the database will use the new format too; this means we have to take it into account when the data is retrieved.
- the main problem here is that we also need to handle users' searches. At insert time, we need to check that the title variations (used to compute a set of soundex values) are correct. Symmetric changes will be needed at retrieve time. The unfunny part is that the cutils.c module will require fixes, too. I already have a royal headache. :-/

[local]
- this is probably the last nail in the coffin of 'local'. Not a big deal, and it will probably stay here for the next release (and be removed later: remember that some portions of the code are shared between 'local' and 'sql').

The keys to everything are the imdb.utils.analyze_title and imdb.utils.build_title functions.
analyze_title (from a string to a dictionary) takes the 'canonical' argument, which defaults (more or less) to False; when True, it _first_ converts the string to the old format. This behavior is used a lot in 'http'.
build_title (from a dict to a string) has a 'canonical' argument too: when True (default False) the old format is returned.
For a moment I had the itch to invert the logic of the 'canonical' argument (from "should I convert it to" to "is the input in"), but this is a change at API level, and... [1]

[test-suite]
I've introduced a new test to check that movie['title'] is in the new format.
Currently it passes, but when we remove the current transformation between the internal (movie.data['title']) format and the 'The Title' one, it will fail spectacularly. :-)
It can be run by itself, with:
  python ./test_parser.py -t -M -H -X 2>&1 | less

As usual, I change my mind about 6 times a day on every subject, and so nothing is written in stone. :-)

+++
[1] not a major change (the returned type won't change), but...

-- 
Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x465BFD47]
http://erlug.linux.it/~da/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel