On Fri, Nov 25, 2016 at 7:53 PM, Mohamed Oun <mohamed...@gmail.com> wrote:
> I was wondering if there was a way to prune the database to remove all titles
> that have <200 votes for example, to reduce clutter.
Hi Mohamed,
it's not so simple, I fear: IMDbPY was designed to fetch a single item at a time. Your best option is to select the data according to your needs.

For example, let's assume you want to extract the movie_id (as used internally by IMDbPY; beware that this is NOT the ID used on the site, which we don't have locally), plus the title and the number of votes, of every movie with at least 200 votes.

First, check the content of the info_type table (yeah, bad name, sorry) to find the id of the "votes" entry. In my database, it's 100. The movie_info and/or movie_info_idx tables contain all the information about a movie, with info_type_id set accordingly.

So, we need to select from the 'title' and 'movie_info_idx' tables, using the relation between title.id and movie_info_idx.movie_id (to know which movie we're dealing with), filtering on movie_info_idx.info_type_id and discarding any row whose movie_info_idx.info is less than 200 (beware that this field is always a string, so it has to be cast to a number first).

Now, for PostgreSQL it would be something like this (to show the first 10 results):

    SELECT t.id, t.title, i.info
    FROM title t, movie_info_idx i
    WHERE t.id = i.movie_id
      AND i.info_type_id = 100
      AND CAST(i.info AS int) >= 200
    LIMIT 10;

If you're using MySQL the syntax is slightly different, but the concept is the same (there's an implicit INNER JOIN in the above statement).

A last note: beware that the database schema created by IMDbPY may lack some indexes that you need to speed things up. In that case, just create them before processing.

> Also, is there a way to iterate over all the titles in the database?

Well, you can always export a list of imdb_id and iterate over it. Honestly, doing so using IMDbPY could be very slow, since it assumes that you want to access most or all of the information about a title immediately. If you only need some of it, you'd better use IMDbPY to import the data into a db (like you have already done) and find the data you need using SQL queries.
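To make the idea concrete, here is a minimal Python sketch of the same join-and-filter approach, run against a toy in-memory SQLite database rather than a real IMDbPY import. The table and column names follow the ones discussed in this message; the sample rows, the use of SQLite, and the helper names (build_toy_db, titles_with_min_votes, iter_title_ids) are assumptions made up for illustration. A real database has many more columns, and your "votes" id may not be 100, which is why the sketch looks it up instead of hard-coding it.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory stand-in for part of the IMDbPY schema (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE info_type (id INTEGER PRIMARY KEY, info TEXT);
        CREATE TABLE movie_info_idx (
            id INTEGER PRIMARY KEY,
            movie_id INTEGER,
            info_type_id INTEGER,
            info TEXT  -- always stored as a string
        );
        """
    )
    # Made-up sample data, for illustration only.
    conn.executemany(
        "INSERT INTO title (id, title) VALUES (?, ?)",
        [(1, "Popular Movie"), (2, "Obscure Movie"), (3, "Borderline Movie")],
    )
    conn.execute("INSERT INTO info_type (id, info) VALUES (100, 'votes')")
    conn.executemany(
        "INSERT INTO movie_info_idx (movie_id, info_type_id, info) VALUES (?, ?, ?)",
        [(1, 100, "15000"), (2, 100, "12"), (3, 100, "200")],
    )
    return conn


def titles_with_min_votes(conn, min_votes=200):
    """Return (movie_id, title, votes) for titles with at least min_votes votes."""
    # Look up the id of the "votes" entry instead of assuming it is 100.
    (votes_id,) = conn.execute(
        "SELECT id FROM info_type WHERE info = 'votes'"
    ).fetchone()
    cursor = conn.execute(
        """
        SELECT t.id, t.title, CAST(i.info AS INTEGER)
        FROM title t
        INNER JOIN movie_info_idx i ON t.id = i.movie_id
        WHERE i.info_type_id = ?
          AND CAST(i.info AS INTEGER) >= ?
        ORDER BY t.id
        """,
        (votes_id, min_votes),
    )
    return cursor.fetchall()


def iter_title_ids(conn):
    """Yield every internal movie_id, one at a time, using plain SQL."""
    for (movie_id,) in conn.execute("SELECT id FROM title ORDER BY id"):
        yield movie_id


if __name__ == "__main__":
    db = build_toy_db()
    for row in titles_with_min_votes(db):
        print(row)
    print(list(iter_title_ids(db)))
```

With a real import you would replace build_toy_db() with a connection to your own database, and probably add an index on movie_info_idx (movie_id, info_type_id) first if the query turns out to be slow.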
Hope this helps,
-- 
Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x465BFD47]
http://www.mimante.net/