On Thu, Nov 25, 2021 at 8:54 AM Andrey Rahmatullin <[email protected]> wrote:
>
> On Wed, Nov 24, 2021 at 06:38:07PM +0100, Giulio Paci wrote:
> > Dear mentors,
> > while updating SCTK package I enabled the execution of the test suite
> > which was previously disabled. The tests are working fine on x86_64
> > architecture, but a couple of them are failing on i386.
> > After investigation [1] I found out that tests are failing because they
> > rely on the assumptions that, when a and b have the same double value:
> > 1) "a < b" is false;
> > 2) "a - b" is 0.0.
> What do they actually test, why do they use these assumptions?

SCTK is a toolkit for evaluating the performance of speech recognition
tools (and tools for other related tasks). These tools usually read audio
streams and produce simple text files containing the transcriptions,
together with time information (relative to the stream) that synchronizes
the transcription with the stream. These files are very similar to video
subtitle files.

SCTK compares two such text files (usually one is created manually and
the other by an automatic tool) and scores how different they are. The
tests check that SCTK produces the same score reports when provided with
the same input files.

The double values hold timing information. The specific format, known as
CTM, stores times as decimal seconds (e.g. "30.66" seconds) from the
beginning of the stream. The failing tool reads these times into double
variables and, to simplify, keeps advancing while the timing in one file
is less than the timing in the other file. Once it is equal or greater,
the tool checks the difference between the two.

In this kind of application you usually do not go beyond what you can
store uncompressed on a filesystem in PCM. So, even assuming audio samples
of 1 byte, int64 should be a reasonable type to store timings (in samples,
rather than seconds). But I understand that doing so would complicate the
logic of the tool, especially since it is very unlikely that floating
point approximation would be an issue.

To be honest, I did not expect the corner case above to fail, since it
compares a value against another value that should be exactly the same.

I have uploaded simplified code that showcases the issue and some of the
instabilities [1]. The code behaves as if the last value were different
from the other three values, which are all supposed to be equal.

[1] https://pastebin.com/embed_js/T3g560UV

Bests,
Giulio
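
P.S. To make the int64 idea above a bit more concrete, here is a minimal
sketch in C. It is not SCTK code: the names ctm_time_to_ticks and
ticks_compare, the choice of ticks per second and the rejection of
negative times are all assumptions made for illustration. The text is
parsed once and rounded to an integer number of ticks; every later
comparison or difference is done on integers, so two equal timings always
compare equal and subtract to 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of SCTK): convert a CTM-style decimal
 * time in seconds, e.g. "30.66", to an integer number of ticks.
 * Returns 0 on success, -1 if the text is not a usable non-negative time. */
static int ctm_time_to_ticks(const char *text, int64_t ticks_per_second,
                             int64_t *out)
{
    char *end = NULL;
    double seconds;
    double scaled;

    assert(ticks_per_second > 0);

    errno = 0;
    seconds = strtod(text, &end);
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;

    scaled = seconds * (double)ticks_per_second;
    /* Written so that NaN, negative and too-large values are all rejected. */
    if (!(scaled >= 0.0 && scaled < 9.0e18))
        return -1;

    /* Round to nearest exactly once; later arithmetic is on integers. */
    *out = (int64_t)(scaled + 0.5);
    return 0;
}

/* Three-way comparison on integer ticks: returns -1, 0 or 1. */
static int ticks_compare(int64_t a, int64_t b)
{
    return (a > b) - (a < b);
}
```

For example, with 100 ticks per second both "30.66" and "30.660" become
3066, so ticks_compare returns 0 and the difference is exactly 0.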

