There's been a subtle change in some of the programme subtitles from the Beeb.
You can see this with, for example:
get_iplayer --get --pid b03bsfc8 --subtitles-only --subsraw
Then compare the .ttxt file with the .srt file. In the raw subtitles, various
lines now end with a <br /> element _and_ a newline, so a single <p> subtitle
can span several lines. get_iplayer does not expect this: it expects each
subtitle to be on its own line.
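
To illustrate the problem, here is a minimal standalone sketch of one way to
put each subtitle back on a single line before filtering. It is not the
attached patch and not existing get_iplayer code: the helper name
join_wrapped_subtitles and the sample data are made up for the example, and it
assumes a continuation line always directly follows a line ending in a <br>
element.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of get_iplayer): rejoin <p> elements that
# were split across physical lines after a <br /> element.
sub join_wrapped_subtitles {
    my ($subs) = @_;
    # A <br ...> followed by optional whitespace and a newline becomes a
    # single space, so the continuation lands on the same line as its <p>.
    $subs =~ s{<br\b[^>]*>[ \t\r]*\n\s*}{ }g;
    # Keep only lines that carry a subtitle paragraph, as the old grep did.
    return grep { /<p\s.*begin=/ } split /\n/, $subs;
}

# Made-up sample modelled on the markup quoted in the patch context.
my $sample = join "\n",
    '<p begin="00:00:18.48" id="p0" end="00:00:20.52">APPLAUSE AND CHEERING</p>',
    '<p begin="00:01:01.88" id="p15" end="00:01:04.80">You\'re thinking of Hamburger Hill...<br />',
    'Since we left...</p>';

my @lines = join_wrapped_subtitles($sample);
print scalar(@lines), " subtitle lines\n";
print "$_\n" for @lines;
```

Doing the join on the whole string before the split keeps the original
grep-based loop unchanged, at the cost of one extra pass over the text.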
I'm attaching a patch which does fix the problem, although I'm sure someone
with better perl skills than me (which is most people, as I don't really use
perl) can make it a bit better.
Jifl
diff --git a/get_iplayer b/get_iplayer
index 59ad17d..627b63f 100755
--- a/get_iplayer
+++ b/get_iplayer
@@ -7213,9 +7213,18 @@ sub download_subtitles {
#<p begin="00:01:01.88" id="p15" end="00:01:04.80"><span tts:color="cyan">You're thinking of Hamburger Hill...<br /></span>Since we left...</p>
#<p begin="00:00:18.48" id="p0" end="00:00:20.52">APPLAUSE AND CHEERING</p>
my $count = 1;
- my @lines = grep /<p\s.*begin=/, split /\n/, $subs;
- for ( @lines ) {
+ my @lines = split /\n/, $subs;
+ for my $ix ( 0 .. $#lines ) {
my ( $begin, $end, $sub );
+ my $myline = $lines[$ix];
+ while ($myline =~ m/<br[^\>]*?>\s*$/) {
+ $myline =~ s|<br[^\>]*?>\s*| |g;
+ $ix++;
+ my $nextline = $lines[$ix];
+ $myline = join('', $myline, $nextline);
+ }
+ $_ = $myline;
+ next if (!m/<p\s.*begin=/);
# Remove <br /> elements
s|<br.*?>| |g;
# Remove >1 spaces