[ Please keep me on CC, I'm not on the list. ]
dinkypumpkin wrote:
> On 23/09/2013 06:39, Jonathan Larmour wrote:
>> I'm attaching a patch which does fix the problem, although I'm sure someone
>> with better perl skills than me (which is most people, as I don't really use
>> perl) can make it a bit better.
>
> Thanks. I've trimmed your patch a bit:
>
> https://github.com/dinkypumpkin/get_iplayer/commit/3f377f0a1c16904c9801213bbed2c666f4cd6e7a.patch
>
> Let me know if it still works at your end. If so, I'll merge it.
Thanks for the trim. Those vars were sticking around because of some debug
stuff I had removed before posting. Unfortunately there's a problem with the
new version: you accidentally reverted one important change to the regexp. As
a result, the regexp becomes greedy when presented with the existing/older
format of TT subtitles, which can include lines such as this one (it should
all be on one line if the mailing list retains it):
<p style="s1" begin="00:00:02.00" id="p0" end="00:00:04.52">Tango nine-one,<br />he's hit two vehicles.<br /></p>
In fact it would cause things to get stuck in a loop.
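If it helps to see why, here's a tiny standalone Perl script (just a sketch,
not part of the patch; the names $loose and $strict are made up). Even though
.*? is non-greedy, it is still allowed to match '>' characters, so starting
from the first <br it keeps going until it finds a '>' at the end of the line,
which here is the one closing </p>. [^>]*? can't cross a '>', so it only
matches a <br> tag that really is the last thing on the line.

```perl
use strict;
use warnings;

# An older-format TT subtitle line: it contains <br /> tags but ends in </p>
my $line = '<p style="s1" begin="00:00:02.00" id="p0" end="00:00:04.52">Tango nine-one,<br />he\'s hit two vehicles.<br /></p>';

# Reverted pattern: .*? may match '>' too, so it can run on to the '>' of </p>
my $loose = qr{<br.*?>\s*$};

# Patched pattern: [^>]*? stops at the first '>', i.e. the end of the <br> tag
my $strict = qr{<br[^>]*?>\s*$};

printf "loose:  %s\n", ( $line =~ $loose  ? "match '$&'" : 'no match' );
printf "strict: %s\n", ( $line =~ $strict ? "match '$&'" : 'no match' );
```

The loose pattern matches "<br />he's hit two vehicles.<br /></p>", so the
line is wrongly treated as continuing onto the next one; the strict pattern
doesn't match at all.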
I'm attaching a (hopefully final) patch. New! Now with comments!
Jifl
diff --git a/get_iplayer b/get_iplayer
index 59ad17d..8c19195 100755
--- a/get_iplayer
+++ b/get_iplayer
@@ -7212,10 +7212,21 @@ sub download_subtitles {
#<p begin="0:01:12.400" end="0:01:13.880">Thinking.</p>
#<p begin="00:01:01.88" id="p15" end="00:01:04.80"><span tts:color="cyan">You're thinking of Hamburger Hill...<br /></span>Since we left...</p>
#<p begin="00:00:18.48" id="p0" end="00:00:20.52">APPLAUSE AND CHEERING</p>
+ # There is also a multiline form:
+ #<p region="speaker" begin="00:00:01.840" end="00:00:08.800"><span style="textStyle"> This programme contains <br/>
+ # some strong language</span></p>
+
my $count = 1;
- my @lines = grep /<p\s.*begin=/, split /\n/, $subs;
- for ( @lines ) {
+ my @lines = split /\n/, $subs;
+ for my $ix ( 0 .. $#lines ) {
my ( $begin, $end, $sub );
+ $_ = $lines[$ix];
+ # Deal with multiline subtitles. We infer these from a trailing <br/> at the
+ # end of the line, in which case we concatenate the following line.
+ while ( m{<br[^\>]*?>\s*$} ) {
+ s|<br[^\>]*?>\s*$| |g;
+ $_ .= $lines[++$ix];
+ }
# Remove <br /> elements
s|<br.*?>| |g;
# Remove >1 spaces
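For anyone who wants to try the line-joining step without running
get_iplayer, here is a rough standalone sketch. It is not the patch itself:
join_multiline is a hypothetical helper, and unlike the patch it collects the
joined lines into a new array and stops at the last line rather than appending
an element past the end of the input. The sample lines are the multiline form
from the patch comment plus the older single-line form quoted earlier.

```perl
use strict;
use warnings;

# Hypothetical helper (not in get_iplayer): joins any line that ends in a
# <br> tag with the line that follows it, as the patched loop does.
sub join_multiline {
    my @lines = @_;
    my @out;
    my $ix = 0;
    while ( $ix <= $#lines ) {
        my $line = $lines[$ix];
        # Replace a trailing <br> with a space, then append the next line.
        # [^>]*? cannot cross a '>', so a <br /> in the middle of a line that
        # ends in </p> is not mistaken for a continuation marker.
        # On the last line the <br> is still replaced but nothing is appended.
        while ( $line =~ s{<br[^>]*?>\s*$}{ } && $ix < $#lines ) {
            $line .= $lines[ ++$ix ];
        }
        push @out, $line;
        $ix++;
    }
    return @out;
}

my @subs = (
    # Multiline form: the first line ends in <br/>
    '<p region="speaker" begin="00:00:01.840" end="00:00:08.800"><span style="textStyle"> This programme contains <br/>',
    'some strong language</span></p>',
    # Older single-line form: contains <br /> tags but ends in </p>
    '<p style="s1" begin="00:00:02.00" id="p0" end="00:00:04.52">Tango nine-one,<br />he\'s hit two vehicles.<br /></p>',
);

my @joined = join_multiline(@subs);
print "$_\n" for @joined;
```

This prints two subtitle lines: the first with its two halves joined, the
second unchanged. The doubled space left where the <br/> was is what the
existing "Remove >1 spaces" step cleans up afterwards.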
_______________________________________________
get_iplayer mailing list
http://lists.infradead.org/mailman/listinfo/get_iplayer