On 2/12/07, [EMAIL PROTECTED]
<[EMAIL PROTECTED]> wrote:
> I recently worked on xmltv2vdr.pl (version 1.0.6) and checked
> why it was so slow on my mighty Celeron 233. So I modified it
> a little to avoid reading all the xmltv file for each channel
> defined in the channels.conf. The result is good : I can process
> my 5Mo xmltv file in less than 10 minutes whereas it took at least
> 1 hour with vanilla 1.0.6 release.

Something more what you can do (just by looking source you provided)..

--

Caching of &xmltime2vdr, like

    return $timecache{$xmltime}->{$skew} if defined 
$timecache{$xmltime}->{$skew}
    $secs = &Date::Manip::UnixDate($xmltime, "%s") + $skew*60;
    $timecache{$xmltime}->{$skew} = $secs
    return secs;

But it depends on how much this function is called.. But hash lookup is probably
faster than running UnixDate from library. So it is a memory tradeoff.

I still haven't tested it but I doubt it'll help ..... I'll check later.

I see that there is still some basic Perl based optimizations for this.

For example there is browsing through @xmllines array, and every iteration
you recompile *ALL* regexp's. That is as many times as @xmllines has lines.
And if one recompile takes 1ms -> you waste time @xmllines * 1ms just for
compiling and not doing anything usefull.

Perl switch "o" is recompile once flag, use that everywhere where it is
possible. Variable is not a problem unless variable changes in every iteration.

[....]

I didn't know that (I'm not really a perl guru ... far from it). I'll
update my version. But it didn't help at all with my benchmark.


--

As there is many times $xmlline is matched with regexps etc. You should 
experiment
with "study $xmlline;" after chomp $xmlline. Study makes internal search tables
for string matches. So see which way the code is faster, with study or without
study. Use Unix shell's time-command for this. For extra boost with study you
probably would need to take away subroutine "xmltvtranslate" as for it $xmlline
is copied to subroutine's parameter space, and what is matched. And study would
not affect it. So instead of calling "$xmlline=xmltvtranslate($xmlline);" 
cut&paste
subroutines code here, and use $xmlline instead of $line.

    foreach $xmlline (@xmllines)
    {
        chomp $xmlline;
        study $xmlline;
        $xmlline=~s/und uuml;/ΓΌ/go;
        $xmlline...

This isn't pretty but could probably help a bit. You save time for @xmllines 
times calling
subroutine, and study would help you a lot as you use the same string all the 
time.


I'll check that later.


For constant string you could use ' ' instead of " ". " causes string to be
evaluated for variables

if ( $chanCur eq "" ) --> if ( $chanCur eq '' )

But this would be very minor effect..


I'll surely be too lazy to test that. sorry.


Split is heavy operation because of creating arrays, but you can limit it.

( $null, $xmlst, $null, $xmlet, @null ) = split(/\"/, $xmlline);

=> ( $null, $xmlst, $null, $xmlet, $null ) = split(/\"/, $xmlline, 5);

or even using regexp for this. I don't know input line for this, but if it is
foo,"something","something",...

($xmlst,$xmlet) = $xmlline =~ m:\"(.*?)\",\"(.*?)\":o;

or probably combine 2 regexp to a single

($xmlst,$xmlet,$channel) = $xmlline =~ 
m:\"(.*?)\",\"(.*?)\".*?channel=\"(.*?)\":o;

--

Again something very weird:

        if ( ($xmlline =~ /\<title/ ) )
        {
            #print $xmlline . "\n";
            ( $null, $tmp ) = split(/\>/, $xmlline);
            ( $vdrtitle, @null ) = split(/\</, $tmp);

            # Send VDR Title

            SVDRPsend("T $vdrtitle");
        }

Why not?

        SVDRPsend("T $1") if $xmlline =~ m:\<title\>(.*?)\</title\>:o;

Same for XML subtitle
        SVDRPsend("T $1") if $xmlline =~ m:\<sub-title\>(.*?)\</sub-title\>:o;

Yes I'll also prefer shorter code. I'll check further if something
like <title lang="en"> is also allowed to adapt the regex. For
information that change has no impact on my bench.

Generally
       if ( ($xmlline =~ /\<desc/ ) && ( $desccount == $dc ))
        {
            ( $null, $tmp ) = split(/\>/, $xmlline);
            ( $vdrdesc, @null ) = split(/\</, $tmp);

this is not a clever way to parse XML data in Perl. Just us regexp's which
match strings with Boyer-Moore algorithm (same as Unix grep) and compile once.


Agree. I'll try to modify it.


Some logical errors

if ( ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) && ( 
$chanevent == 0 ) )

=>         if ( ( $chanevent == 0 ) && ($xmlline =~ /\<programme/ ) && ( $xmlline !~ 
/clumpidx=\"1\/2\"/ ) )

so program execution can skip if $chanevent != 0 much faster.
So Regexp would not be ran. This is normal short circuit operation.


In fact the check $chanevent == 0 is only usefull if the xml is not
well formed so it doesn't change anything.

Then
            elsif ( $chanCur ne $chan )
            {
                SVDRPsend("c");
                SVDRPsend(".");
                SVDRPreceive(250);

I think programmer wanted outout of "." -command, and see if it's status is 250?
But now I think code is checking status of "c" -command? As socket is not read
between calls, and there should be data in buffer for c-command. But I cannot be
sure as I don't know SVDRP command that well.

> c
< 250 foo
< 250-foo
> .
< 354-not ok

It could still succeed if from socket buffer "250-" is read. Also the 2 substr 
calls
in SVDRPreceive is a bit weird, but I am uncertain if regexp would help that. 
At least
change "-" to '-'.

I have no answer about this as I didn't modify that code ..... it
seems to work so for now I don't touch it.

There is a LOT to improve, but if you do these, you Celeron will fly. I hope 
you'll get
to minute scale (or even better). Look with "time xmltv2vdr" to see how much 
process
time is used for user code and how much for kernel code. And to see if 
optimizations help.

Have fun.. :)

Thanks a lot for your hints.

Sebastien

_______________________________________________
vdr mailing list
vdr@linuxtv.org
http://www.linuxtv.org/cgi-bin/mailman/listinfo/vdr

Reply via email to