On Jan 10, Mike Castle <dalg...@gmail.com> wrote:
> Note:
> http: u'The Motorcycle Diaries::(USA)'
> sql:  u'Motorcycle Diaries, The (2004)::(USA)'
>
> So, I wrote a routine to normalize those to get rid of ::(USA) and
> (2004)::(USA), great.
>
> But now I have to move that damned article around too?!?
You can use the mighty functions provided by IMDbPY! ;-)

They are in the 'utils' module. Once you have a title without the year and the other funny suffixes, you can use the normalizeTitle function to move every known article back to the front. (I did fairly extensive research on the data we're working with to identify the articles, which, as you can guess, is more art than science.)

Notice that there are cases where your routine can return the wrong thing. I'm not sure about akas, but the year can be something like (1999/II), meaning that this is the second movie with the same title produced that year. So your best hope, besides Obi-Wan Kenobi, is something like:

  from imdb import utils

  title = u'Motorcycle Diaries, The (2004)'
  # analyze_title() splits the year (and any /II-style index) off the title
  # and returns the parts in a dictionary.
  tdict = utils.analyze_title(title, canonical=1)
  # normalizeTitle() moves the trailing article back to the front.
  straight_title = utils.normalizeTitle(tdict['title'])

> (aka Tagebuch der Lust, Teil 2 (1999) (TV)) (Germany)
>
> It looks like different languages have different ways of moving the
> article to the end of the string. Interesting, I never knew that.

"Teil" is not an article in German, YOU INSENSITIVE ENGLISH-CENTRIC AMERICAN!!! :-D
It means "part", as in "Part 2". :-)

HTH,
--
Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x465BFD47]
http://erlug.linux.it/~da/

_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
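For readers without IMDbPY at hand, the two steps discussed in this thread can be sketched in plain Python: first strip the "::(USA)" note and the "(year)" or "(year/II)" tag, then move a trailing article to the front. This is a minimal sketch and not IMDbPY's implementation. The tiny ARTICLES list, the regular expression, and the function names strip_suffixes, move_article_front and normalize are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny article list (an assumption); a real list
# must cover many more languages.
ARTICLES = ("the", "a", "an", "der", "die", "das", "le", "la", "les", "il", "el")

# Matches a trailing "(2004)", "(1999/II)" or "(????)" tag; the optional
# "/II" part is the index that tells apart same-title movies of one year.
_YEAR_RE = re.compile(r"\s*\((\d{4}|\?{4})(?:/([IVXLCDM]+))?\)\s*$")


def strip_suffixes(title):
    """Drop an '::(note)' suffix, then a trailing '(year[/index])' tag."""
    title = title.split("::", 1)[0].rstrip()
    return _YEAR_RE.sub("", title)


def move_article_front(title):
    """Turn 'Motorcycle Diaries, The' into 'The Motorcycle Diaries'."""
    head, sep, tail = title.rpartition(", ")
    # Only move the last comma-separated chunk if it is a known article,
    # so 'Tagebuch der Lust, Teil 2' is left alone.
    if sep and tail.lower() in ARTICLES:
        return f"{tail} {head}"
    return title


def normalize(title):
    """Strip suffixes, then restore the article to the front."""
    return move_article_front(strip_suffixes(title))


if __name__ == "__main__":
    print(normalize("Motorcycle Diaries, The (2004)::(USA)"))
    print(normalize("The Motorcycle Diaries::(USA)"))
```

Matching against a fixed list of known articles, rather than moving whatever follows the last comma, is what keeps "Teil 2" in place. This sketch also does not handle elided articles such as French "l'", which should be joined without a space.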