   Hi!

On Wed, Sep 09, 2015 at 10:02:12AM +0200, Tim Janik wrote:
> I'd like to make another Beast release, but some tests are currently failing.
> In particular some of the audio feature tests are breaking and need threshold
> adjustments to pass.
>
> I'd like to get your opinion on the threshold adjustments, so for your
> convenience I've appended:
> a) the threshold diff required for a successful release;
> b) a build log from the feature test dir in case you want a peek at the
>    feature values.
>
> Some tests vary by several percent (syndrum) while e.g. partymonster
> constantly reaches a solid 100% similarity.
>
> FYI, the bse2wav.scm script has been replaced by "bsetool.cc render2wav" for
> porting reasons, but that's not related to audio processing.

Did you keep the deterministic random that --bse-disable-randomization usually
provided? This is just a guess, but removing it could cause such problems.

> diff --git tests/audio/Makefile.am tests/audio/Makefile.am
> index 7805187..3c6dbde 100644
> --- tests/audio/Makefile.am
> +++ tests/audio/Makefile.am
> @@ -58,7 +58,7 @@ minisong-test:
>       $(BSE2WAV) $(srcdir)/minisong.bse $(@F).wav
>       $(BSEFEXTRACT) $(@F).wav --cut-zeros --channel 0 --avg-spectrum --spectrum --avg-energy > $(@F).tmp
>       $(BSEFEXTRACT) $(@F).wav --cut-zeros --channel 1 --avg-spectrum --spectrum --avg-energy >> $(@F).tmp
> -     $(BSEFCOMPARE) $(srcdir)/minisong.ref $(@F).tmp --threshold 99.99
> +     $(BSEFCOMPARE) $(srcdir)/minisong.ref $(@F).tmp --threshold 98.00
>       rm -f $(@F).tmp $(@F).wav
>
>  FEATURE_TESTS += syndrum-test
> @@ -67,7 +67,7 @@ syndrum-test:
>       $(BSE2WAV) $(srcdir)/syndrum.bse $(@F).wav
>       $(BSEFEXTRACT) $(@F).wav --cut-zeros --channel 0 --avg-spectrum --spectrum --avg-energy > $(@F).tmp
>       $(BSEFEXTRACT) $(@F).wav --cut-zeros --channel 1 --avg-spectrum --spectrum --avg-energy >> $(@F).tmp
> -     $(BSEFCOMPARE) $(srcdir)/syndrum.ref $(@F).tmp --threshold 99.99
> +     $(BSEFCOMPARE) $(srcdir)/syndrum.ref $(@F).tmp --threshold 91.00
>       rm -f $(@F).tmp $(@F).wav
>
>  FEATURE_TESTS += velocity-test
> @@ -85,7 +85,7 @@ EXTRA_DIST += organsong.bse organsong.ref
>  organsong-test:
>       $(BSE2WAV) $(srcdir)/organsong.bse $(@F).wav
>       $(BSEFEXTRACT) $(@F).wav --cut-zeros --channel 0 --avg-spectrum --spectrum --avg-energy > $(@F).tmp
> -     $(BSEFCOMPARE) $(srcdir)/organsong.ref $(@F).tmp --threshold 99.99
> +     $(BSEFCOMPARE) $(srcdir)/organsong.ref $(@F).tmp --threshold 98.00
>       rm -f $(@F).tmp $(@F).wav
>
>  # ADSR Test checks the mono channel envelope rendering
> @@ -120,7 +120,7 @@ EXTRA_DIST += xtalstringssong.bse xtalstringssong.ref
>  xtalstringssong-test:
>       $(BSE2WAV) $(srcdir)/xtalstringssong.bse $(@F).wav
>       $(BSEFEXTRACT) $(@F).wav --cut-zeros --channel 0 --avg-spectrum --spectrum --avg-energy > $(@F).tmp
> -     $(BSEFCOMPARE) $(srcdir)/xtalstringssong.ref $(@F).tmp --threshold 99.99
> +     $(BSEFCOMPARE) $(srcdir)/xtalstringssong.ref $(@F).tmp --threshold 99.90
>       rm -f $(@F).tmp $(@F).wav

The tests should ensure that we don't accidentally break how things sound. So
ideally our goal is that things sound exactly the same (100.00). This cannot
always be done, so 99.99 is generally used. However, the thresholds you used
are nowhere near 99.99, so most likely things don't sound the same, and we
should investigate why, to ensure that we didn't break things.

To understand why I say things may be broken, and why we should check this, it
is important to know that although scores range from 0.00 to 100.00, only
scores very close to 100.00 ensure that things really sound the same. So a
score of 98.00 already tolerates a lot of difference from the original. I'm
not sure if the difference in any of the files is audible, but it is
significant.

   Cu... Stefan
-- 
Stefan Westerfeld, http://space.twc.de/~stefan
_______________________________________________
beast mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/beast
