On Fri, Nov 25, 2016 at 7:53 PM, Mohamed Oun <mohamed...@gmail.com> wrote:
>
> I was wondering if there was a way to prune the database to remove all titles 
> that have <200 votes for example, to reduce clutter.

Hi Mohamed,
it's not so simple, I fear: IMDbPY was designed to fetch a single
item at a time.

Your best option is to select the data according to your needs.
For example, let's assume you want to extract the movie_id (as used
internally by IMDbPY; beware that this is NOT the ID used on the site,
which we don't have locally), plus the title and the number of votes,
for every movie with at least 200 votes.

First, check out the content of the info_type (yeah, bad name, sorry) table,
to find the id of the "votes" entry.  In my database, it's 100.
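
As a rough illustration of that lookup, here is a minimal Python sketch
using the standard sqlite3 module on a toy in-memory table.  The column
names (id, info), the sample rows and the helper find_info_type_id are
assumptions made for the example; check them against your own database,
where the id may well not be 100.

```python
import sqlite3

# Toy stand-in for the real database.  Assumption: info_type has the
# columns (id, info); the rows below are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info_type (id INTEGER PRIMARY KEY, info TEXT)")
conn.executemany(
    "INSERT INTO info_type (id, info) VALUES (?, ?)",
    [(99, "votes distribution"), (100, "votes"), (101, "rating")],
)


def find_info_type_id(connection, name):
    """Return the id of the info_type row whose info equals name, or None."""
    row = connection.execute(
        "SELECT id FROM info_type WHERE info = ?", (name,)
    ).fetchone()
    return row[0] if row else None


votes_id = find_info_type_id(conn, "votes")
print(votes_id)
```

Looking the id up by name, instead of hard-coding it, keeps later
queries valid even if another import assigns a different number.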

The movie_info (and/or movie_info_idx) table contains all the information
about a movie, with info_type_id set accordingly.

So, we need to select stuff from the 'title' and 'movie_info_idx' tables,
joining on title.id = movie_info_idx.movie_id (to know which movie
we're dealing with), filtering on movie_info_idx.info_type_id, and
filtering out any row whose movie_info_idx.info is less than 200
(beware that this field is always a string, so it must be cast to a number).

Now, for PostgreSQL it would be something like (to show the first 10 results):

SELECT t.id, t.title, i.info
  FROM title t, movie_info_idx i
 WHERE t.id = i.movie_id
   AND i.info_type_id = 100
   AND CAST(i.info AS int) >= 200
 LIMIT 10;

If you're using MySQL the syntax is slightly different (for example,
CAST there expects SIGNED or UNSIGNED rather than int), but the
concept is the same
(there's an implicit INNER JOIN in the above statement).
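
To make the implicit join concrete, here is a small Python sketch that
runs the comma-style query and an explicit INNER JOIN version against a
toy in-memory SQLite database and gets the same rows from both.  The
table layout, the sample data and VOTES_ID = 100 are assumptions for the
example, and SQLite is used only because it ships with Python; the
statement for PostgreSQL or MySQL may need small adjustments.

```python
import sqlite3

VOTES_ID = 100   # assumption: use the id found in your own info_type table
MIN_VOTES = 200

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE movie_info_idx (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER,
        info_type_id INTEGER,
        info TEXT
    );
""")
conn.executemany(
    "INSERT INTO title (id, title) VALUES (?, ?)",
    [(1, "Popular Movie"), (2, "Obscure Movie"), (3, "Borderline Movie")],
)
conn.executemany(
    "INSERT INTO movie_info_idx (movie_id, info_type_id, info) VALUES (?, ?, ?)",
    [(1, 100, "15000"), (1, 101, "7.9"), (2, 100, "12"), (3, 100, "200")],
)

# Comma-separated tables plus a WHERE condition: the implicit join.
IMPLICIT = """
    SELECT t.id, t.title, i.info
      FROM title t, movie_info_idx i
     WHERE t.id = i.movie_id
       AND i.info_type_id = ?
       AND CAST(i.info AS INTEGER) >= ?
     ORDER BY t.id
     LIMIT 10
"""

# The same query with the join written out explicitly.
EXPLICIT = """
    SELECT t.id, t.title, i.info
      FROM title t
     INNER JOIN movie_info_idx i ON t.id = i.movie_id
     WHERE i.info_type_id = ?
       AND CAST(i.info AS INTEGER) >= ?
     ORDER BY t.id
     LIMIT 10
"""

implicit_rows = conn.execute(IMPLICIT, (VOTES_ID, MIN_VOTES)).fetchall()
explicit_rows = conn.execute(EXPLICIT, (VOTES_ID, MIN_VOTES)).fetchall()
print(implicit_rows == explicit_rows)
```

Note that the votes come back as strings, because the info column is
text; only the comparison is done on the cast value.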

A last note: beware that the database schema created by IMDbPY may lack
some indexes that you need to speed things up.  In that case, just
create them before processing.
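
For instance, an index covering the columns used by the filter and the
join could be added along these lines.  This is only a sketch on a toy
SQLite table: the index name idx_mii_type_movie and the choice of
columns are assumptions, and whether such an index helps (or already
exists) depends on your database and your queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movie_info_idx (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER,
        info_type_id INTEGER,
        info TEXT
    )
""")

# Hypothetical index name; IF NOT EXISTS makes the statement safe to re-run.
conn.execute("""
    CREATE INDEX IF NOT EXISTS idx_mii_type_movie
        ON movie_info_idx (info_type_id, movie_id)
""")

# List the indexes that now exist on the table.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'movie_info_idx'"
    )
]
print(index_names)
```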

> Also, is there a way to iterate over all the titles in the database?

Well, you can always export a list of imdb_id and iterate over it.
Honestly, doing so using IMDbPY could be very slow, since it assumes
that you want to access most or all of the information about a title immediately.

If you only need some of it, you'd better use IMDbPY to import the data into a db (like
you have already done) and find the data you need using SQL queries.
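
A minimal sketch of iterating over every title directly with SQL,
fetching rows in batches so the whole table is never held in memory.
The helper iter_titles, the batch size and the toy data are assumptions
for the example; it uses SQLite from the Python standard library, but
the same pattern applies to other DB-API drivers.

```python
import sqlite3


def iter_titles(connection, batch_size=1000):
    """Yield (id, title) for every row of the title table, batch by batch."""
    cursor = connection.execute("SELECT id, title FROM title ORDER BY id")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# Toy data standing in for the imported database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO title (id, title) VALUES (?, ?)",
    [(1, "First Movie"), (2, "Second Movie"), (3, "Third Movie")],
)

all_titles = list(iter_titles(conn, batch_size=2))
for movie_id, title in all_titles:
    print(movie_id, title)
```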


Hope this helps,

-- 
Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
