Re: Changed subtitle format

Jonathan Larmour Mon, 23 Sep 2013 14:47:01 -0700

[ Please keep me on CC, I'm not on the list. ]

dinkypumpkin wrote:
> On 23/09/2013 06:39, Jonathan Larmour wrote:
>> I'm attaching a patch which does fix the problem, although I'm sure someone
>> with better perl skills than me (which is most people, as I don't really use
>> perl) can make it a bit better.
> 
> Thanks.  I've trimmed your patch a bit:
> 
> https://github.com/dinkypumpkin/get_iplayer/commit/3f377f0a1c16904c9801213bbed2c666f4cd6e7a.patch
> 
> Let me know if it still works at your end.  If so, I'll merge it.


Thanks for the trim - those vars were left over from some debug stuff I had
removed before posting. Unfortunately there's a problem with the new version:
you accidentally reverted one important change to the regexp. As a result, the
match would get greedy when presented with the existing/older format of TT
subtitles, which could include things such as this (which should be all on one
line if the mailing list retains that):
 <p style="s1" begin="00:00:02.00" id="p0" end="00:00:04.52">Tango nine-one,<br />he's hit two vehicles.<br /></p>

In fact it would cause things to get stuck in a loop.
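
A minimal sketch of the greediness on that older-format line. It assumes the reverted change was the character class in the trailing-<br> test, i.e. `<br.*?>` in place of `<br[^\>]*?>`; that is an inference from the patch, not something stated in this thread. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Older-format subtitle line (all on one line), taken from the example above.
my $line = q{<p style="s1" begin="00:00:02.00" id="p0" end="00:00:04.52">Tango nine-one,<br />he's hit two vehicles.<br /></p>};

# Assumed "reverted" form: '.' may cross a '>', so a match can start at the
# first <br and stretch all the way to the closing '>' of </p>.
my $loose = qr{<br.*?>\s*$};

# Form used in the patch: the match has to stay inside a single <br ...> tag.
my $strict = qr{<br[^\>]*?>\s*$};

my $loose_hit  = $line =~ $loose  ? 1 : 0;
my $strict_hit = $line =~ $strict ? 1 : 0;

print "loose  pattern sees a trailing <br>: $loose_hit\n";
print "strict pattern sees a trailing <br>: $strict_hit\n";
```

The line ends in `</p>`, not `<br />`, so only the stricter pattern gives the right answer here; with the looser one the line would wrongly be treated as continuing onto the next.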

I'm attaching a, hopefully final, patch. New! Now with comments!

Jifl

diff --git a/get_iplayer b/get_iplayer
index 59ad17d..8c19195 100755
--- a/get_iplayer
+++ b/get_iplayer
@@ -7212,10 +7212,21 @@ sub download_subtitles {
        #<p begin="0:01:12.400" end="0:01:13.880">Thinking.</p>
        #<p begin="00:01:01.88" id="p15" end="00:01:04.80"><span tts:color="cyan">You're thinking of Hamburger Hill...<br /></span>Since we left...</p>
        #<p begin="00:00:18.48" id="p0" end="00:00:20.52">APPLAUSE AND CHEERING</p>
+       # There is also a multiline form:
+       #<p region="speaker" begin="00:00:01.840" end="00:00:08.800"><span style="textStyle">  This programme contains  <br/>
+       #                  some strong language</span></p>
+
        my $count = 1;
-       my @lines = grep /<p\s.*begin=/, split /\n/, $subs;
-       for ( @lines ) {
+       my @lines = split /\n/, $subs;
+       for my $ix ( 0 .. $#lines ) {
                my ( $begin, $end, $sub );
+               $_ = $lines[$ix];
+               # Deal with multiline subtitles. We infer these from a trailing <br/> at the
+               # end of the line; in which case we concat the following line.
+               while ( m{<br[^\>]*?>\s*$} ) {
+                   s|<br[^\>]*?>\s*$| |g;
+                   $_ .= $lines[++$ix];
+               }
                # Remove <br /> elements
                s|<br.*?>| |g;
                # Remove >1 spaces
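
For trying the joining step outside get_iplayer, here is a self-contained sketch run against the multiline sample from the comment in the patch. The helper name `join_continuations` is hypothetical and not part of get_iplayer. Unlike the patch, it uses a while loop with an explicit index, so a continuation line that has been joined is not visited a second time, and it stops joining at the end of the input.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of get_iplayer): join any line that ends in a
# <br ...> tag with the line that follows it.
sub join_continuations {
    my @lines = @_;
    my @out;
    my $ix = 0;
    while ( $ix <= $#lines ) {
        my $cur = $lines[$ix++];
        # A trailing <br/> marks a continuation: replace it with a space and
        # append the next line, if there is one.
        while ( $cur =~ m{<br[^\>]*?>\s*$} && $ix <= $#lines ) {
            $cur =~ s{<br[^\>]*?>\s*$}{ };
            $cur .= $lines[$ix++];
        }
        push @out, $cur;
    }
    return @out;
}

my @joined = join_continuations(
    '<p region="speaker" begin="00:00:01.840" end="00:00:08.800"><span style="textStyle">  This programme contains  <br/>',
    '                 some strong language</span></p>',
    '<p begin="0:01:12.400" end="0:01:13.880">Thinking.</p>',
);

print scalar(@joined), " lines after joining\n";
print "$_\n" for @joined;
```

The first two input lines come out as a single `<p>...</p>` element, and the single-line subtitle passes through unchanged.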

