Hello,

On Tuesday 14 November 2006 11:34, Simon Marlow wrote:
> Thorkil Naur wrote:
> > ... I have
> > produced an experimental darcs patch that solves some problems, while
> > possibly introducing others:
> > http://thorkilnaur.dk/~tn/GHC/testsuite/patch/barton_mangler_bug_patch_1.patch .
> > Comments to this would be most welcome.
>
> For those who haven't looked at Thorkil's patch: he proposes adding some code to
> the testsuite driver to allow sample output specific to a particular way, and
> used this to give sample output for one test (barton-mangler-bug) specific to
> the 'opt' way on PPC.
>
> I would like it to be the case that any differences at all in the output from
> one way to another are bugs, including floating point differences.
It is a basic question of what varying circumstances we believe should be
handled by the test framework. I tend to agree with you and, hence, to reject
my own suggestion of extending the testsuite driver to allow different output
for different ways.

> On x86_64,
> we always generate the same results regardless of -fvia-C or -fasm, for example.
> However, it might be that this isn't practical on all platforms.

I feel rather sure that it isn't.

> The question
> is whether we should consider it a *bug* if a test doesn't give consistent
> floating-point answers or not. Anyone have any thoughts on this?

Let me put it this way: It is well known that it is almost always bad to test
whether two floating point numbers are exactly equal. So in this sense, a test
whose outcome depends on testing whether two floating point numbers are exactly
equal is a bad test. (Converting floating point numbers to decimal strings and
comparing the strings, which is what really happens, seems to make matters even
worse.) To be sure, if we are really testing the floating point operations, we
are of course entitled to test equality. But if a test does not deal with
floating point operations as such, but merely includes floating point numbers
in its output incidentally, the test is probably bad.

I cannot see that the Haskell report specifies precise properties of the
floating point support, so even implementations that conform to the standard
(Haskell 98) can be expected to differ. Hence, any test that involves output of
floating point numbers might produce different output for reasons that are
entirely unrelated to the test, which is not a particularly appetizing
situation.

Whether a difference in floating point results between different ways should be
considered a bug in GHC, I cannot say. I would tend towards "no", but that is
probably because I don't have any particularly intense interest in floating
point numbers.
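To illustrate the point about exact comparison, here is a minimal sketch in Python (the language the testsuite driver is written in) of comparing two outputs while tolerating small differences in numeric tokens. The function name outputs_match, the regular expression and the default tolerance are all hypothetical; none of this is part of the actual driver, and it is only meant to show the idea.

```python
import math
import re

# Hypothetical helper for illustration only; not part of the GHC testsuite driver.
# Matches decimal numbers such as 3, -0.5, .25, 1.0e-2 (no capturing groups,
# so re.split and re.findall return plain strings).
_NUMBER = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?')

def outputs_match(expected, actual, rel_tol=1e-9):
    """Return True if the two outputs agree, up to rel_tol on numeric tokens."""
    # The text between the numbers must be exactly the same.
    if _NUMBER.split(expected) != _NUMBER.split(actual):
        return False
    # Equal text skeletons imply the same count of numeric tokens.
    exp_nums = _NUMBER.findall(expected)
    act_nums = _NUMBER.findall(actual)
    return all(math.isclose(float(e), float(a), rel_tol=rel_tol)
               for e, a in zip(exp_nums, act_nums))

# Exact comparison of the printed strings fails, the tolerant one does not.
print("x = 0.30000000000000004" == "x = 0.3")
print(outputs_match("x = 0.30000000000000004", "x = 0.3"))
```

Note that this sketch treats integers as numeric tokens too and uses a purely relative tolerance, so a value of exactly 0 only matches another exact 0. Whether any such tolerance is desirable is of course the very question under discussion.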
Getting finally to something more specific, my impression until your question
here had been that the barton-mangler-bug test involved floating point numbers
incidentally: I imagined that someone (named Barton, perhaps?) ran this program
at some point in time, discovered some unfortunate behaviour (such as the
program crashing or producing wild results), that this behaviour was traced
down to an error in some mangler (the gcc assembler language output "post
processor"), and that the test was included and maintained in the testsuite to
ensure that this bug was thoroughly stamped out. Based on your question, I
realised that my impression could be entirely false and that the central
property tested was precisely the floating point differences observed for some
ways. I am still in doubt, so if anyone knows the story behind the
barton-mangler-bug, I would be delighted to hear it.

>
> If it's a bug, then we just declare these tests to be expected failures. If
> it's not a bug, then we have to allow per-way sample output, as per Thorkil's patch.
>

As I have already mentioned, I think my patch is a mistake. Depending on what
anyone can tell me about the barton-mangler-bug, additional work would seem to
go in one of two directions: If the floating point numbers are involved
incidentally and the mangler bug still threatens, work should attempt to remove
the floating point numbers from the output and produce a test case that exposes
the bug more succinctly. I would certainly need some additional help to do
this. On the other hand, if the floating point difference between, e.g., opt
and normal is the real issue, it would still seem advantageous and quite
possible to reduce the size of the test case, to make it easier to figure out
the cause of the difference.

> Cheers,
> Simon
>

Best regards
Thorkil