On 02/11/14 08:52, Chris Allison wrote:
> Peter,
>
> some good ideas there, but there is no need to scrape the web pages
> when all the schedule info you could possibly need is available in
> xml, json and yaml files at urls of this form:
>
> www.bbc.co.uk/radio4/programmes/schedules/fm/this_week.json
> www.bbc.co.uk/radio4extra/programmes/schedules/2014/11/1.json
> www.bbc.co.uk/bbcfour/programmes/schedules/last_week.json
>
> etc.

Thanks for that, Chris. That first link got me excited enough to start
experimenting with the JSON parsing utility called 'jq'.

A pipeline like the following will produce all the titles, synopses
and pids:

    wget -O - http://www.bbc.co.uk/radio4/programmes/schedules/fm/this_week.json |
      jq '.[] | .[] | .[] | .[] | .programme as $P | $P.display_titles.title,$P.short_synopsis,$P.pid'

Each programme yields three lines of output (title, synopsis, pid), so
taking just the last six lines with

    wget -q -O - http://www.bbc.co.uk/radio4/programmes/schedules/fm/this_week.json |
      jq '.[] | .[] | .[] | .[] | .programme as $P | $P.display_titles.title,$P.short_synopsis,$P.pid' |
      tail -n 6

will get you the last two programmes:

============
"The Film Programme"
"Director Mike Leigh discusses art and movie-making in his latest film Mr Turner."
"b04mgxtq"
"Something Understood"
"Mark Tully debates the cultural benefits of classical music with composer James MacMillan."
"b04n2fmh"
============

Regards,

Charles

_______________________________________________
get_iplayer mailing list
http://lists.infradead.org/mailman/listinfo/get_iplayer
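
P.S. If one line per programme is easier to work with than three, something along these lines might do. It is only a sketch: the sample JSON is made up (the two programmes from the output above, wrapped in a nesting I am assuming rather than the real feed layout), and list_programmes is just a name I picked. The `?` forms make jq skip values it cannot iterate over instead of stopping with an error, and -r with @tsv prints raw tab-separated text, so a reasonably recent jq is needed.

```shell
#!/bin/sh
# Hypothetical sample data: the two programmes shown in the output above,
# wrapped in an assumed schedule/day/broadcasts nesting. The "date" value
# is a placeholder; none of this is checked against the live feed.
sample='{"schedule":{"day":{"date":"placeholder","broadcasts":[
  {"programme":{"pid":"b04mgxtq",
    "display_titles":{"title":"The Film Programme"},
    "short_synopsis":"Director Mike Leigh discusses art and movie-making in his latest film Mr Turner."}},
  {"programme":{"pid":"b04n2fmh",
    "display_titles":{"title":"Something Understood"},
    "short_synopsis":"Mark Tully debates the cultural benefits of classical music with composer James MacMillan."}}
]}}}'

# list_programmes (hypothetical helper): reads schedule JSON on stdin and
# prints pid, title and synopsis as one tab-separated line per programme.
list_programmes() {
    # Same four-level descent as the pipeline above, but with '?' so that
    # scalars (such as the date string) are skipped rather than fatal.
    jq -r '.[]? | .[]? | .[]? | .[]? | .programme?
           | select(type == "object")
           | [.pid, .display_titles.title, .short_synopsis] | @tsv'
}

printf '%s\n' "$sample" | list_programmes
```

To try it on the real feed, pipe the wget output into list_programmes in place of the printf line. Tab-separated lines can then be cut up with cut -f or awk -F'\t' without worrying about quoting.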

