Hi Arunima, cool project! It depends on what you consider a "script" and a "database". :-)
Options for "script":
- plot summary: the first entries on the page
  http://www.imdb.com/title/tt0133093/plotsummary
- synopsis: the entries in the "synopsis" section of the same page:
  http://www.imdb.com/title/tt0133093/plotsummary?ref_=tt_ql_stry_2
- the full movie script: something that can be downloaded from other sites (we do not parse them), like:
  http://www.imsdb.com/scripts/Matrix,-The.html
  http://www.dailyscript.com/scripts/the_matrix.pdf

Options for "database":
- the web pages of the IMDb site: http://www.imdb.com/
- the old plain text data files released by IMDb until the end of 2017: ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/frozendata/
- the new s3 dataset released since then: http://www.imdb.com/interfaces/

Talking about numbers, on the web pages we have:
- I'd say about 1.4 million movies with a plot summary: http://www.imdb.com/search/title?has=plot
  The downside is that they are not easy to parse, at least not with IMDbPY, since it's intended to fetch one movie at a time and not to be a tool for mass import or scraping.
- an unknown number of synopses (I can't find a way to search only for them)
- zero full scripts

On the old plain text data files, which IMDbPY can still parse and load into a SQL database for later consumption (but which are obviously no longer updated):
- about 590,000 movies with a plot summary
- zero synopses
- zero full scripts

On the new dataset distributed by IMDb, which for the moment we're still unable to parse (but this will change in a few weeks):
- zero plot summaries
- zero synopses
- zero full scripts

Hope this helps. Let us know if you need help.

On Sat, Jan 27, 2018 at 6:37 PM, Arunima Kayath <arun...@berkeley.edu> wrote:
>
> I am a student at UC Berkeley. I would like to do a project to predict movie
> ratings based on the script (age appropriateness). I need a meaningful
> number of scripts to do that. Can you tell me how many authentic movie
> scripts are available in the database ?
>
> Thanks
>
> Arunima
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Imdbpy-help mailing list
> Imdbpy-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/imdbpy-help
>

--
Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x3845A3D4AC9B61AD]
http://www.mimante.net/
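
P.S. Since IMDbPY is meant to fetch one movie at a time, here is a minimal sketch of what collecting the plot summaries and the synopsis for a single title could look like. The helper name fetch_plot_texts is made up for this example, and the "plot" and "synopsis" keys are assumptions that may differ between IMDbPY versions, so check what your installed version returns before relying on them.

```python
def fetch_plot_texts(movie_id, client=None):
    """Return the plot summaries and synopsis entries for one IMDb title.

    movie_id is the numeric part of the IMDb ID as a string, without the
    "tt" prefix (for example "0133093").
    """
    if client is None:
        # IMDbPY is a third-party package; it is imported lazily so the
        # helper can also be tried with a stand-in client object.
        from imdb import IMDb
        client = IMDb()  # default access system: the IMDb web pages

    movie = client.get_movie(movie_id)

    def as_list(value):
        # Normalize a missing value, a single string, or a sequence
        # into a plain list of entries.
        if value is None:
            return []
        if isinstance(value, str):
            return [value]
        return list(value)

    # ASSUMPTION: the key names below depend on the IMDbPY version.
    return {
        "plot": as_list(movie.get("plot")),
        "synopsis": as_list(movie.get("synopsis")),
    }


# Example (needs network access and IMDbPY installed):
#   texts = fetch_plot_texts("0133093")
#   print(len(texts["plot"]), "plot summaries")
```

This makes one request per movie, so it only suits a small, hand-picked list of titles, not a mass import.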