Hi Arunima,
Cool project!

It depends on what you consider a "script" and a "database". :-)

Options for "script":
- plot summary: the first entries on this page:
  http://www.imdb.com/title/tt0133093/plotsummary
- synopsis: the entries in the "synopsis" section of the same page:
  http://www.imdb.com/title/tt0133093/plotsummary?ref_=tt_ql_stry_2
- the full movie script: something that can be downloaded from other
  sites (which we do not parse), such as:
  http://www.imsdb.com/scripts/Matrix,-The.html
  http://www.dailyscript.com/scripts/the_matrix.pdf

Options for "database":
- the web pages of the IMDb site: http://www.imdb.com/
- the old plain text data files released by IMDb until the end of
  2017: ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/frozendata/
- the new S3 datasets released since then: http://www.imdb.com/interfaces/


As for numbers, on the web pages there are:
- I'd say about 1.4 million movies with a plot summary:
  http://www.imdb.com/search/title?has=plot
  The downside is that they are not easy to parse, at least not with
  IMDbPY, since it's intended to fetch one movie at a time, not to be
  a tool for mass import/scraping.
- an unknown number of synopses (I can't find a way to search only for them)
- zero full scripts
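To make the "one movie at a time" point concrete, here is a minimal sketch of what collecting plot summaries from the web pages looks like. It assumes an IMDbPY-like client whose get_movie(movie_id) returns a dict-like movie with a "plot" key; fetch_plot_summaries and MovieClient are hypothetical names, not part of IMDbPY. Every ID is a separate request, so with ~1.4 million titles this gets slow (and impolite) very quickly.

```python
import time
from typing import Any, Dict, Iterable, List, Protocol


class MovieClient(Protocol):
    """Structural stand-in for an IMDbPY-like client (an assumption).

    Only get_movie(movie_id) is required; it must return a dict-like
    object, as IMDbPY's Movie objects are.
    """

    def get_movie(self, movie_id: str) -> Any: ...


def fetch_plot_summaries(
    client: MovieClient, movie_ids: Iterable[str], delay_s: float = 1.0
) -> Dict[str, List[str]]:
    """Fetch plot summaries one movie at a time (hypothetical helper).

    Each movie is a separate request, with a pause between requests;
    movies without a non-empty 'plot' entry are left out of the result.
    """
    plots: Dict[str, List[str]] = {}
    for i, movie_id in enumerate(movie_ids):
        if i > 0:
            time.sleep(delay_s)  # go easy on the site; this is not a bulk API
        movie = client.get_movie(movie_id)
        summaries = movie.get("plot") or []
        if summaries:
            plots[movie_id] = list(summaries)
    return plots
```

With IMDbPY installed, the client would be `ia = IMDb()` (from `from imdb import IMDb`), and `fetch_plot_summaries(ia, ["0133093"])` would ask for The Matrix, the ID used in the URLs above. Whether "plot" is filled in by a plain get_movie call depends on the default info sets of your IMDbPY version, so check that first.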

In the old plain text data files, which IMDbPY is still able to parse
and load into a SQL database for later consumption (but which are
obviously no longer updated), there are:
- about 590,000 movies with a plot summary
- zero synopses
- zero full scripts
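If you only need the plot summaries from the old files, you may not need the whole SQL import. A minimal sketch, assuming the plot.list layout where an "MV:" line holds the title, "PL:" lines hold the summary text and a "BY:" line credits the author; parse_plot_list is a hypothetical helper, not IMDbPY's parser.

```python
from typing import Dict, Iterable, List, Optional


def parse_plot_list(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each 'MV:' title to its plot summaries (hypothetical helper).

    Assumes the plot.list layout: an 'MV:' line with the title, one or
    more 'PL:' lines holding the text, then a 'BY:' line with the author.
    A title may carry several PL:/BY: groups, i.e. several summaries.
    """
    plots: Dict[str, List[str]] = {}
    title: Optional[str] = None
    current: List[str] = []

    def flush() -> None:
        # Close the summary being built, if any, under the current title.
        if title is not None and current:
            plots.setdefault(title, []).append(" ".join(current))
        current.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("MV:"):
            flush()
            title = line[3:].strip()
        elif line.startswith("PL:"):
            text = line[3:].strip()
            if text:
                current.append(text)
        elif line.startswith("BY:") or line.startswith("---"):
            flush()  # an author credit or a separator ends a summary
    flush()  # in case the file does not end with BY: or a separator
    return plots
```

Passing an open file works, since it iterates over lines; I believe the old files are Latin-1 encoded, so something like `open("plot.list", encoding="latin-1")`. `len(parse_plot_list(fh))` then counts titles with at least one summary. For the full import, IMDbPY's own imdbpy2sql.py script remains the supported route.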

In the new datasets distributed by IMDb, which for the moment we're still
unable to parse (but this will change in a few weeks), there are:
- zero plot summaries
- zero synopses
- zero full scripts
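You can check the "zero" for the new datasets yourself by looking at the column headers, since the files are gzipped tab-separated tables. A minimal sketch: read_header, text_columns and the TEXT_HINTS list are hypothetical, and the header used in the usage note is the one I'd expect for title.basics.tsv.gz.

```python
import gzip
from typing import List

# Substrings that would suggest a column holds plot-like text (an assumption).
TEXT_HINTS = ("plot", "synopsis", "script", "summary")


def read_header(path: str) -> str:
    """Return the header line of a gzipped, tab-separated dataset file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return fh.readline().rstrip("\n")


def text_columns(header_line: str) -> List[str]:
    """Return the column names that look like they could hold plot text."""
    return [
        name
        for name in header_line.split("\t")
        if any(hint in name.lower() for hint in TEXT_HINTS)
    ]
```

For title.basics the header is tconst, titleType, primaryTitle, originalTitle, isAdult, startYear, endYear, runtimeMinutes, genres, so `text_columns(read_header("title.basics.tsv.gz"))` on a local copy should return an empty list.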


Hope this helps.

Let us know if you need help,



On Sat, Jan 27, 2018 at 6:37 PM, Arunima Kayath <arun...@berkeley.edu> wrote:
>
> I am a student at UC Berkeley. I would like to do a project to predict movie
> ratings based on the script (age appropriateness). I need a meaningful
> number of scripts to do that. Can you tell me how many authentic movie
> scripts are available in the database?
>
> Thanks
>
> Arunima
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Imdbpy-help mailing list
> Imdbpy-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/imdbpy-help
>



-- 
Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x3845A3D4AC9B61AD]
http://www.mimante.net/
