On 10.09.2015 16:08, Stefan Westerfeld wrote:
> Hi!
>
> On Wed, Sep 09, 2015 at 10:02:12AM +0200, Tim Janik wrote:
>> I'd like to make another Beast release, but some tests are currently failing.
>> In particular, some of the audio feature tests are breaking and need threshold
>> adjustments to pass.
>>
>> I'd like to get your opinion on the threshold adjustments, so for your
>> convenience I've appended:
>> a) the threshold diff required for a successful release;
>> b) a build log from the feature test dir in case you want a peek at the
>>    feature values.
>>
>> Some tests vary by several percent (syndrum) while e.g. partymonster
>> constantly reaches a solid 100% similarity.
>>
>> FYI, the bse2wav.scm script has been replaced by "bsetool.cc render2wav" for
>> porting reasons, but that's not related to audio processing.
>
> Did you keep the deterministic random that --bse-disable-randomization usually
> provided? This is just a guess, but removing it could cause such problems.
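[For readers following the thread: the point of a deterministic-random mode
like --bse-disable-randomization is that noise sources are seeded with a fixed
value, so two renders of the same .bse file produce bit-identical samples and
can be compared against a stored reference. A minimal sketch of that idea in
Python -- purely illustrative, not BSE's implementation; only the flag name
comes from this thread, every other name here is hypothetical:]

```python
import random

def render_noise_block(n, disable_randomization=False):
    """Render n 'noise' samples in [-1, 1].

    With randomization disabled, seed the generator with a fixed value so
    every run yields identical output -- the idea behind
    --bse-disable-randomization (hypothetical sketch, not BSE code).
    """
    rng = random.Random(0x1234) if disable_randomization else random.Random()
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Two deterministic renders are bit-identical, so a feature comparison
# against a stored reference can demand a similarity score near 100.00.
a = render_noise_block(64, disable_randomization=True)
b = render_noise_block(64, disable_randomization=True)
assert a == b
```

[If the fixed seeding were lost, each render would differ slightly and
feature scores would fluctuate from run to run -- hence the guess above.]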
Yes, --bse-disable-randomization was provided, as can be seen from the log I
had attached.

>> diff --git tests/audio/Makefile.am tests/audio/Makefile.am
>> index 7805187..3c6dbde 100644
>> --- tests/audio/Makefile.am
>> +++ tests/audio/Makefile.am
>> @@ -58,7 +58,7 @@ minisong-test:
>> 	$(BSE2WAV) $(srcdir)/minisong.bse $(@F).wav
>> 	$(BSEFEXTRACT) $(@F).wav --cut-zeros --channel 0 --avg-spectrum --spectrum --avg-energy > $(@F).tmp
>> 	$(BSEFEXTRACT) $(@F).wav --cut-zeros --channel 1 --avg-spectrum --spectrum --avg-energy >> $(@F).tmp
>> -	$(BSEFCOMPARE) $(srcdir)/minisong.ref $(@F).tmp --threshold 99.99
>> +	$(BSEFCOMPARE) $(srcdir)/minisong.ref $(@F).tmp --threshold 98.00
>> 	rm -f $(@F).tmp $(@F).wav
[...]

> The tests should ensure that we don't accidentally break how things sound.
> So ideally our goal is that things sound exactly the same (100.00). This
> cannot always be done, so 99.99 is generally used.

Yes; however, AFAIK the synthesis bits and timing bits weren't touched since
things passed the last time, and now I see random failures.

> However, the thresholds you used are nowhere near 99.99, so most likely things
> don't sound the same, and we should investigate why, to ensure that we didn't
> break things.
>
> To understand why I say things may be broken, and we should check this, it is
> important to know that although scores are between 0.00 and 100.00, only those
> scores very close to 100.00 ensure that things really sound the same. So a
> score of 98.00 already tolerates a lot of difference from the original. I'm
> not sure if the difference in any of the files is audible, but it is
> significant.

I'm aware of all this, but there's no clear way to trace such errors AFAICS.
I.e., I can't currently tell why *some* feature tests sometimes fail and
sometimes don't. As a reminder, partymonster always passes at 100%, so there's
no general (timing) brokenness at play here...

> Cu...
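[To illustrate why only scores very close to 100.00 guarantee near-identical
audio: below is a toy similarity metric over magnitude spectra, using cosine
similarity scaled to 0.00-100.00. The thread does not show bsefcompare's
actual metric, so this is only an assumption-laden sketch; all values are
made up for illustration:]

```python
import math

def spectrum_similarity(ref, test):
    """Score two magnitude spectra between 0.00 and 100.00 using a
    normalized dot product (cosine similarity). Illustrative only --
    bsefcompare's real metric may differ."""
    dot = sum(r * t for r, t in zip(ref, test))
    norm = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(t * t for t in test))
    return 100.0 * dot / norm

ref = [1.0, 0.5, 0.25, 0.125]
# A modest perturbation of two spectral bins already drops the score
# below a strict 99.99 threshold...
test = [1.0, 0.55, 0.25, 0.115]
score = spectrum_similarity(ref, test)
assert score < 99.99
# ...yet still passes a relaxed 98.00 threshold with plenty of room,
# which is why 98.00 admits substantial, possibly audible deviations.
assert score > 98.00
```

[Under this toy metric, a rendering has to be nearly sample-exact to score
99.99, while 98.00 leaves room for much larger spectral differences.]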
>    Stefan

--
Yours sincerely,
Tim Janik

https://testbit.eu/timj/
Free software author and speaker.

_______________________________________________
beast mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/beast
