Since I was thinking about scraping iPlayer yesterday, I spent an hour or two this evening and hacked together this Python script which pulls programme info from the iPlayer TV category index pages and (for now) outputs the data as JSON:

https://github.com/StevenMaude/nitroradical

There are three ways something like this could be used:

1. Client-side scraping of programme data (maybe a Perl script that more directly hooks into get_iplayer would be better?)

It would take some time to populate the programme data. Scraping the index pages for TV actually doesn't take that long, but in some cases you'd have to pull out individual programme pages to get all the episode info for them. As is, my script just gets the most recent episode.

2. Server-side scraping of programme data, so that someone could run the scraper on a server and publish a feed that users can access.

The advantage of this is that it would be much quicker for users as you could access the processed feed in a single HTTP request (rather than hitting the BBC site numerous times).

However, dinkypumpkin mentioned that centralising a feed wasn't a preferred option. That said, there's nothing to stop having a user-specified option to point get_iplayer to a specific feed URL. If someone hosts a feed, then decides to take it down, someone else could take over.

Both of those would need get_iplayer to be modified.
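
To make the user-specified feed URL idea from option 2 a bit more concrete, here's a rough Python sketch. It's only an illustration: get_iplayer itself is Perl, the default URL below is a placeholder, and the assumption that the feed is a single JSON document is mine rather than anything that exists yet.

```python
import json
from urllib.request import urlopen

# Placeholder only: no such hosted feed exists; whoever hosts one would
# publish their own URL.
DEFAULT_FEED_URL = "http://example.com/iplayer-feed.json"


def choose_feed_url(user_url=None):
    """Prefer a user-specified feed URL, falling back to the default."""
    return user_url or DEFAULT_FEED_URL


def fetch_feed(url, timeout=30):
    """Fetch the whole processed feed in a single HTTP request.

    Assumes the feed is one UTF-8 encoded JSON document (hypothetical).
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

If a feed host disappears, users would only need to pass a different URL to choose_feed_url(); nothing else would have to change.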

3. It would be possible to use the output of this scraper client-side to search for programmes of interest, and then call get_iplayer with the appropriate pid to download the programme if any are found. More work would be needed for this, and it would be hacky, but could work too. This wouldn't need get_iplayer to be modified; it would just use the existing pid download feature.
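
As a rough sketch of how option 3 might work: the code below assumes the scraper's JSON output is a list of objects with "title" and "pid" keys. That layout is a guess for illustration and may not match what the script actually emits. It searches titles and builds a get_iplayer command for each match using the existing --pid option.

```python
import json
import subprocess


def find_programmes(json_text, search_term):
    """Return entries whose title contains search_term (case-insensitive).

    Assumes json_text is a JSON list of objects with "title" and "pid"
    keys; this layout is hypothetical and may differ from the scraper's.
    """
    programmes = json.loads(json_text)
    term = search_term.lower()
    return [
        p for p in programmes
        if "pid" in p and term in p.get("title", "").lower()
    ]


def build_command(pid):
    """Build a get_iplayer command line that downloads by pid."""
    return ["get_iplayer", "--pid=" + pid]


def download_matches(json_text, search_term, dry_run=True):
    """Print (or run, if dry_run is False) one command per match."""
    matches = find_programmes(json_text, search_term)
    commands = [build_command(p["pid"]) for p in matches]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.call(command)
    return commands
```

With dry_run left at its default, this only prints the commands it would run, which seems a safer starting point while the JSON format is still in flux.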

If there's interest, I'm happy to work on wrangling out get_iplayer-compatible feed data. (A guide to the structure of the iPlayer feeds would be handy.)

_______________________________________________
get_iplayer mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/get_iplayer
