On 01/11/2014 00:27, dinkypumpkin wrote:

> I tried this same approach, but it foundered on radio programmes.  There
> is just too much stuff there.  It's soul-crushingly slow to scrape the
> iPlayer Radio site, at least for a desktop cache.  It would be great to
> have everything available on iPlayer searchable off-site, but there is
> too much of it for get_iplayer's current local caching model.  I'm going
> to have another go at some point.

There is no real need to download *all* of the schedule information;
after all, only a fraction of it will ever be of any use to an
individual user.

I would use the BBC server to do the search for me, after which there is
little work to be done. For instance, if I look for all Book at Bedtime
episodes with this URL

    http://www.bbc.co.uk/radio/programmes/a-z/by/book%20at%20bedtime/player

then I am taken to a page with a link to the series at

    http://www.bbc.co.uk/programmes/b006qtlx/episodes/player?page=1

through to `page=6`. That amounts to 52 programmes, and even on my
meagre 13 megabit connection fetching all six pages takes less than ten
seconds. The results could be cached to give a practically instantaneous
response to a similar request in the future. There is also the
possibility of writing
a batch solution that makes a query only every minute or so and could be
run continuously or overnight.
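
As a rough sketch of that idea (not the half-written proof of concept
mentioned below), the following walks the listing pages for one series,
keeps each page in a cache, and can pause before every real request. The
URL pattern is the one quoted above; the function names, the dict used as
a cache and the `delay` parameter are all assumptions made for
illustration, and parsing the episode links out of the HTML is left out.

```python
import time
import urllib.request

# URL pattern copied from the message; b006qtlx is the series PID quoted there.
EPISODES_URL = "http://www.bbc.co.uk/programmes/{pid}/episodes/player?page={page}"


def http_get(url):
    """Fetch a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_series_pages(pid, pages, cache, fetch=http_get, delay=0.0, sleep=time.sleep):
    """Return the HTML of listing pages 1..pages for one series.

    `cache` is a plain dict keyed by URL (hypothetical; a real tool would
    persist it to disk). `delay` is the pause in seconds before each
    request that is not already cached, so a batch run can be throttled.
    """
    results = []
    for page in range(1, pages + 1):
        url = EPISODES_URL.format(pid=pid, page=page)
        if url not in cache:
            if delay:
                sleep(delay)
            cache[url] = fetch(url)
        results.append(cache[url])
    return results
```

Calling `fetch_series_pages("b006qtlx", 6, cache)` twice with the same
`cache` should only touch the network the first time; passing `delay=60`
would give the one-query-a-minute batch behaviour.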

I'm more than happy to write a proof of concept if you're interested. I
have it half-written already just to get that timing information.

The one thing that bothers me is the terms and conditions of the web
site. I scanned through them quickly and couldn't find anything about
robotic access, but it would be a first if there isn't anything there.
If it's just a matter of obeying the /robots.txt then I'm more than
happy to go ahead.
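
For the /robots.txt part, a check along these lines could run before
each request. It is only a sketch using Python's standard
`urllib.robotparser`; the `allowed` helper is a hypothetical name, and
the rules in the example are made up for illustration, not the BBC's
actual file.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, user_agent="*"):
    """Return True if `robots_txt` (the file's text) permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules, purely to show the call; the real file would be
# downloaded from the site's /robots.txt.
example_rules = "User-agent: *\nDisallow: /private/\n"
print(allowed(example_rules, "http://example.com/programmes/b006qtlx"))
print(allowed(example_rules, "http://example.com/private/page"))
```

This covers only robots.txt; it says nothing about the site's terms and
conditions.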

Let me know how I can help.

Rob





_______________________________________________
get_iplayer mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/get_iplayer
