get_iplayer has been more or less repaired, but there are still some wounds. I'm going to release what I have on Sunday. I'm on the road next week, so I've run out of time to do more for the time being. Consider it a stopgap until progress can be made on other fronts. This is where things are:

1. I've disabled code related to the discontinued feeds, so you shouldn't get any more bogus values in your metadata tags. You should also see thumbnails again in files < 7 days old downloaded via PID.

2. The new release will support entry of multiple PIDs.

3. I've more or less restored the 7 day cache for TV and radio. There are still some holes in it:

a. It is not possible to search for audiodescribed versions of programmes. I haven't been able to source that information. If anyone has any clues on the subject, chime in - but not if your suggestion is to scrape the iPlayer site. That isn't on the table just yet.

You can still download audiodescribed versions, but you'll have to look for them on the iPlayer site. Signed versions should still be flagged in the get_iplayer cache, but some may be missing. Again, check the iPlayer site if in doubt.

I've changed get_iplayer to always scrape the related episode page to look for audiodescribed/signed versions when requested, so hopefully more downloads will be successful. I found a number of cases where the playlist data for recent programmes didn't contain identifiers for audiodescribed versions even though they existed on the iPlayer site.

b. It is not possible to search radio programmes by category. TV programmes still have category information. There is a source for radio category information, but it consistently fails for Radio 4 and Radio 4 Extra, which is where the categories are most meaningful. I know that is going to break some PVR searches, but the alternative is a support headache I can't absorb.

c. I can't vouch that every programme from the previous 7 days will show up in the cache. As always, you can use the PID for any programme not in the cache. By the same token, I can't vouch that every programme in the cache will be downloadable. The new feeds contain noticeably more programmes, some due to the inclusion of web-only stuff. With the heavier load, cache refreshes are noticeably slower than with the old feeds, ca. 90 seconds for me for tv+radio.

4. The more-or-less restored cache depends on some old data feeds lingering at the BBC. Recent events have taught us that they could disappear without warning, so I've implemented a fallback mechanism. There will be a new option that will switch the cache to refresh from the channel schedule pages instead of the old data feeds. However, this fallback is also limited:

a. It is not possible to search for audiodescribed or signed versions of programmes. That information isn't in the schedule pages.

b. It is not possible to search TV or radio programmes by category. Again, that information isn't in the schedule pages.

c. Cache refresh is slow, ca. 4+ minutes for a full TV and radio refresh for me. The time could be cut by about 1/3 by removing regional TV channel variations, but that would cut out 50+ programmes, so I've left them in for the present.

d. It appears that fewer programmes from the previous 7 days get cached compared to the feeds. Part of that is because the schedule pages don't show most web-only programmes. Part of it may also be because I'm checking availability info in the schedule pages more strictly than whatever produces the data feeds. Again, you can use the PID for anything not in the cache.

e. The only plus to using the schedule pages to populate the cache is that it becomes possible to expand your cache out to 30 days. It seems to work OK, if you have 10-15 minutes to refresh your cache. There will be an option for this.

f. I've given you enough rope to hang yourself, but don't put this fallback option into regular use unless it becomes necessary - seriously. It's only there to avoid weeks like this one. I won't be interested in hearing how slow it is or how it doesn't locate some particular programme. And for Pete's sake *don't* use it with the Web PVR. If you insist on playing around with it, you'll probably want to bump up --expiry to some gigantic number and refresh your cache manually as needed.
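
On the --expiry point in f: the idea is that a cache is only refreshed automatically once it is older than the expiry interval, so a very large value effectively leaves refreshing to you. A minimal Python sketch of that kind of check follows. It is not get_iplayer's actual (Perl) code; the function name and the figures are made up for illustration.

```python
import time


def cache_is_stale(cache_mtime, expiry_seconds, now=None):
    """Return True if a cache last written at cache_mtime is older than expiry_seconds.

    Hypothetical helper for illustration only; get_iplayer's real check may differ.
    """
    if now is None:
        now = time.time()
    return (now - cache_mtime) > expiry_seconds


# Example: a cache written 6 hours ago (timestamps are arbitrary).
now = 1_000_000.0
six_hours_ago = now - 6 * 3600

# With a 4-hour expiry the cache counts as stale and would be refreshed.
print(cache_is_stale(six_hours_ago, 4 * 3600, now))         # True
# With a one-year expiry it is never stale in practice, so refreshes are manual.
print(cache_is_stale(six_hours_ago, 365 * 24 * 3600, now))  # False
```

With a slow 10-15 minute refresh, the second case is the one you would want: nothing triggers a refresh behind your back.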

5. Looking further ahead

Some things that have been floated here in the past few days:

a. Programme data services: If somebody implements something along these lines, I'm sure get_iplayer could be integrated with it. It's clear that get_iplayer would never be able to access Nitro if and when it's ever opened up. But, if somebody can repackage Nitro data for wider use, that would be pretty useful.

b. iPlayer site scraping: This could also be the foundation of a programme data service instead of Nitro. It is also the only real hope for get_iplayer to regain a full-featured desktop cache, though I'm not sure it will be practical. A full scrape is out of the question for local caching - there are just too many programmes on the radio side. However, even caching just the previous 7 days will be much, much slower than with the old data feeds. The number of requests and the amount of data to move over the wire and parse would be vastly greater. Some sort of parallelisation might help. The trick will be to figure out the right way to filter the listings down to a practical volume.

I started down this road, but it was way too slow for radio and it was going to be too much work for the time available. Plus, it didn't seem worth leaving get_iplayer crippled any longer than necessary. To do this properly will likely mean adding some dependencies to get_iplayer as well as some major reworking. I'm going to keep working in that direction just to see if it can be done, but no idea if it will be of practical use.

Also see Steven Maude's recent post for his take on the problem.

c. External search/indexing applications: To my mind, it seems like a good idea for some energetic person to split this out. get_iplayer badly needs to lose weight, not gain it, and there is a pretty clear functional separation between searching and downloading. get_iplayer needs a lot of work in handling metadata that could make it a better downloader, so it would be no bad thing to get out of the caching business. I'll have my pony now, thanks.
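
Going back to item b, the combination of filtering to a 7-day window and fetching pages in parallel might look roughly like the Python sketch below. Everything in it is an assumption for illustration: fetch_listing is a stand-in that makes no network request, the channel names and date are arbitrary placeholders, and "previous 7 days" is taken to exclude today. It is not how get_iplayer works, only one shape the idea could take.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def fetch_listing(channel, day):
    """Stand-in for fetching and parsing one listing page.

    Hypothetical: a real version would make an HTTP request and parse the
    response. Here it just returns one fake (channel, day, title) entry.
    """
    return [(channel, day, f"{channel} programme for {day.isoformat()}")]


def recent_days(today, window=7):
    """The previous `window` days, newest first, not including today."""
    return [today - timedelta(days=n) for n in range(1, window + 1)]


def build_cache(channels, today, window=7, workers=8):
    """Fetch every (channel, day) page in the window using a thread pool."""
    jobs = [(c, d) for c in channels for d in recent_days(today, window)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves job order; each result is a list of entries.
        pages = pool.map(lambda job: fetch_listing(*job), jobs)
        return [entry for page in pages for entry in page]


cache = build_cache(["radio4", "radio4extra"], date(2014, 10, 19))
print(len(cache))  # 2 channels x 7 days = 14 entries
```

Threads suit this kind of job because the time goes on waiting for responses rather than on computation, but the request count still grows with channels times days, so the filtering step matters as much as the parallelism.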


_______________________________________________
get_iplayer mailing list
get_iplayer@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/get_iplayer
