On Sun, Nov 28, 2010 at 2:50 AM, Derek Ditch <derek.ditch...@gmail.com> wrote: > > I'm working on a project that analyzes graph structures using a modified > version of PageRank for a sample data set, I'm considering IMDB, using > imdbpy,
Hi! Your project sounds extremely cool. :-) > So, it doesn't look like imdbpy has the ability built-in to iterate through > all movies, or movies of a specific genre, nor of actors. So I suppose I > will create a new method of the Movie class and implementation within the > sql parser to return all results. You're right: by itself IMDbPY doesn't have the ability to iterate over a large subset of the IMDb database, and I don't think it's a feature we should integrate too much; keep in mind that as a principle IMDbPY tries to be transparent regarding its access to the information: such a feature would be too specific of the SQL database, and impossible - or at least legally dubious - to implement for the HTTP access. Obviously a more or less separated package/framework to work on bunches of items extracted from the SQL database would be more than welcome and, to tell the truth, a way to express more complex searches on the SQL database could be a really nice and useful feature. :-) Basically, it works this way: each items which must be uniquely identified (Person, Movie, Character and Company instances) uses the 'id' primary key column of its database table as ID (as you may have noticed, the ID used for a movie in the SQL database is _not_ the imdbID used by IMDb on its web site, since the latter are not included in the plain text data file). So the best approach would be to access the SQL database in the same way IMDbPY does: since we're amazingly cool ;-) we didn't settle on one ORM, but we transparently support both SQLObject (we use its semantic, in our code) and SQLAlchemy. Their interface is abstracted in the dbschema.py, alchemyadapter.py and objectadapter.py (beware: there's a certain amount of black magic involved :-) To use them, the process is somewhat manual, and could probably be more automated; to import what you need, see the __init__ method of the IMDbSqlAccessSystem class in the sql/__init__.py file (very similar code can be fund in the imdbpy2sql.py script, around line 277. After that, as said, you can access the database using the created objects (the ones returned by the getDBTables function) using the SQLObject syntax. With these object, you can create complex queries on the database; once you have the list of IDs you're interested in (or a generator of IDs), you can use a normal imdb.IMDb('sql', ...) instance to access every information you need using IMDbPY. It goes without saying that it's possible that the information that you need are somewhat limited, and such a solution could be too much: maybe you can give up using the ORM abstraction and even IMDbPY, working directly on the database. Let me know how you decide to proceed, and if you need any help - as ideas or clarification on the internals of IMDbPY: unfortunately right now I don't have any time to write code :-/ -- Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x465BFD47] http://www.mimante.net/ ------------------------------------------------------------------------------ Increase Visibility of Your 3D Game App & Earn a Chance To Win $500! Tap into the largest installed PC base & get more eyes on your game by optimizing for Intel(R) Graphics Technology. Get started today with the Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs. http://p.sf.net/sfu/intelisp-dev2dev _______________________________________________ Imdbpy-devel mailing list Imdbpy-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-devel