There's been a subtle change in some of the programme subtitles from the Beeb.
You can see this with, for example:

get_iplayer --get --pid b03bsfc8 --subtitles-only --subsraw

Then compare the .ttxt file with the .srt file. Various lines in the raw
subtitles now end with a <br /> element _and_ a newline, so a single subtitle
can span several lines. get_iplayer does not expect this: it expects each
subtitle to be on its own line.

I'm attaching a patch which fixes the problem, although I'm sure someone
with better Perl skills than me (which is most people, as I don't really use
Perl) can improve it.
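
For anyone who wants to try the idea outside get_iplayer, here is a minimal
standalone sketch of the line-joining step. It is not the patch itself:
join_subtitle_lines is a hypothetical helper name, and the sample input is
made up from the example in the code comments, split after the <br /> the way
I assume the new subtitle files are.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of get_iplayer): merge physical lines so
# that each <p ...>...</p> subtitle ends up on one logical line.
sub join_subtitle_lines {
    my ($subs) = @_;
    my @lines = split /\n/, $subs;
    my @joined;
    my $ix = 0;
    while ( $ix <= $#lines ) {
        my $line = $lines[ $ix++ ];
        # While the line ends in a <br ...> element and more input remains,
        # replace that trailing element with a space and append the next line.
        while ( $line =~ m{<br[^>]*>\s*$} && $ix <= $#lines ) {
            $line =~ s{<br[^>]*>\s*$}{ };
            $line .= $lines[ $ix++ ];
        }
        push @joined, $line;
    }
    # Keep only the lines that actually hold a subtitle paragraph.
    return grep { /<p\s.*begin=/ } @joined;
}

# Assumed sample input: one subtitle split across two lines, one on one line.
my $sample = join "\n",
    '<p begin="00:01:01.88" id="p15" end="00:01:04.80"><span tts:color="cyan">You\'re thinking of Hamburger Hill...<br />',
    '</span>Since we left...</p>',
    '<p begin="00:00:18.48" id="p0" end="00:00:20.52">APPLAUSE AND CHEERING</p>';

my @subs = join_subtitle_lines($sample);
print scalar(@subs), " subtitles\n";
print "$_\n" for @subs;
```

Unlike the patch, this uses an explicit index in a while loop, so lines that
have already been appended to a previous subtitle are not visited again, and
it stops joining at the end of the input.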

Jifl
diff --git a/get_iplayer b/get_iplayer
index 59ad17d..627b63f 100755
--- a/get_iplayer
+++ b/get_iplayer
@@ -7213,9 +7213,18 @@ sub download_subtitles {
        #<p begin="00:01:01.88" id="p15" end="00:01:04.80"><span tts:color="cyan">You're thinking of Hamburger Hill...<br /></span>Since we left...</p>
        #<p begin="00:00:18.48" id="p0" end="00:00:20.52">APPLAUSE AND CHEERING</p>
        my $count = 1;
-       my @lines = grep /<p\s.*begin=/, split /\n/, $subs;
-       for ( @lines ) {
+       my @lines = split /\n/, $subs;
+       for my $ix ( 0 .. $#lines ) {
                my ( $begin, $end, $sub );
+               my $myline = $lines[$ix];
+               while ($myline =~ m/<br[^\>]*?>\s*$/) {
+                   $myline =~ s|<br[^\>]*?>\s*| |g;
+                   $ix++;
+                   my $nextline = $lines[$ix];
+                   $myline = join('', $myline, $nextline);
+               }
+               $_ = $myline;
+               next if (!m/<p\s.*begin=/);
                # Remove <br /> elements
                s|<br.*?>| |g;
                # Remove >1 spaces
_______________________________________________
get_iplayer mailing list
http://lists.infradead.org/mailman/listinfo/get_iplayer
