Daniel Sahlberg wrote on Sat, 23 Nov 2024 16:14 +00:00: > Den fre 22 nov. 2024 kl 14:33 skrev Daniel Shahaf <d...@daniel.shahaf.name>: > > If I force an svnsync test to fail: > > > > [[[ ⋮ > > TypeError: 'in <string>' requires string as left operand, not int > > FAIL: svnsync_tests.py 28: copy-revprops with removals > > PASS: svnsync_tests.py 29: fd leak during sync from serf to local > > ]]] > > > > So, two things are wrong here: > > > > 1. The test harness doesn't actually print the diff. > > > > That's probably because ndiff() is passed two lists of bytes objects, > > whereas it expects lists of str objects. > > I've tried to track down the root cause.
Thanks. > I think the issue was "introduced" > in r1742950 <https://svn.apache.org/r1742950>: "Dumps are binary data, > therefore explicitly use byte strings for all operations on them". But > since most tests succeed - good code quality I assume - compare_dump_files > never reach the ndiff() call in the above traceback and therefor it wasn't > noticed that ndiff() want strings. > > I believe the easiest solution is to add list comprehensions decoding each > line: > - print(''.join(ndiff(expected, actual))) > + print(''.join(ndiff([line.decode('utf-8') for line in expected], > [line.decode('utf-8') for line in actual]))) > > I've forced the error in another way, by > editing > subversion/tests/cmdline/svnsync_tests_data/delete-revprops.expected.dump > changing cmpilato to cMpilato and running > > $ make check TESTS=subversion/tests/cmdline/svnsync_tests.py#28 > > Before the change I got the same TypeError as above. After the change I get > a failed test with the expected exception. > > After reviewing history, I believe the above change is the "obvious fix". > One could argue if decode() should use some other encoding, I don't really > know this. I've committed as r1922041. I'm afraid this fix is incomplete. The key point, as you quote from the log message of r1742950, is that dumpfiles are binary data (because of how they represent files and properties). That means they are not encoded in /any/ encoding. Formally, a dumpfile is a finite sequence of octets, which is not guaranteed to represent a sequence of characters, a sequence of Unicode codepoints, or anything of the sort. It's the kind of file that one can reasonably gzip, base64, or download, but not spell check, wrap to 80 columns, or convert from one encoding or character set to another. In particular, the call «open('foo.dump', 'rb').read().decode(bar)» is incorrect for all values of BAR. Therefore, notwithstanding that calling .decode() improves the behaviour on the case at hand, it's not The Fix. Incidentally, decoding the binary dumpfile contents as UTF-8 in particular is problematic for two reasons: - Some dumpfiles are in fact not valid UTF-8. On such dumpfiles, compare_dump_files() will raise UnicodeDecodeError (from the .decode() call, before ndiff() is ever entered). . Usually, dumping a repository where any file contains invalid UTF-8 will result in a non-UTF-8 dump. - It's not obvious whether .decode("utf-8") is a one-to-one transformation, which is to say, whether two different bytes values [sic] could possibly decode to the same str value, in which case compare_dump_files() might false negative. So, what /should/ the test be doing? - Compare the two lists of bytes objects as, well, two lists of bytes objects? That would be nice, if Python's standard library enables it. - .decode("iso-8859-1")? This would work, sort of, insofar as it would be one-to-one and wouldn't raise for any input value. However, when difflib prints a line, the octets printed would not equal the octets in the file: for instance, the byte 0xD7 in the file would likely be printed as the bytes 0xC3 0x97 [== b"\xD7".decode("iso-8859-1").encode("utf-8")]. While this transformation /is/ reversible, and won't slow down debugging too much for anyone who knew to expect it and how to reverse it, for the test harness to do this would be rather unintuitive and surprising. - I think the easiest solution is to change x.decode("utf-8") to repr(x). repr() is guaranteed to return a string (https://docs.python.org/3/library/functions.html#repr), as required by difflib, and shouldn't introduce either false positives or false negatives. - Also, the line should be changed to not exceed 80 columns. > > > > > 2. The next test was run. > > > > It shouldn't have run, because the exception wasn't just the actual > > output differing from the expected output, but an error in > > determining whether they differ or not. > > > > I'm assuming you are looking for something like this: > [[[ > Index: subversion/tests/cmdline/svntest/main.py > =================================================================== > --- subversion/tests/cmdline/svntest/main.py (revision 1921978) > +++ subversion/tests/cmdline/svntest/main.py (working copy) > @@ -2016,9 +2016,11 @@ > logger.error('EXCEPTION: SystemExit(%d), skipping cleanup' % ex.code) > self._print_name(ex.code and 'FAIL: ' or 'PASS: ') > raise > - except: > + except BaseException as ex: > result = svntest.testcase.RESULT_FAIL > - logger.warn('CWD: %s' % os.getcwd(), exc_info=True) > + logger.warn('CWD: %s' % os.getcwd(), exc_info=False) > + logger.warn('Unexpected exception type %s' % type(ex)) > + raise > > os.chdir(saved_dir) > exit_code, result_text, result_benignity = self.pred.results(result) > ]]] > (Only the "raise" is actually needed here, I added some additional output > to help making it clear that the exception type is unexpected). First, what the incumbent «except» block does — catch "all other exception types" and not re-raise them — is not best practice. So, that block should definitely be changed /somehow/. More below. > After this change [...] > [[[ ⋮ > svnsync_tests.py...............................................................................make: > *** [Makefile:553: check] Error 1 > ]]] > > Without this change [...] > [[[ ⋮ > svnsync_tests.py...........................................................................................FAILURE > At least one test FAILED, checking /home/daniel/svn_trunk/tests.log > FAIL: svnsync_tests.py 28: copy-revprops with removals ⋮ > make: *** [Makefile:553: check] Error 1 > ]]] > > I'm not sure I agree the changed behaviour is better. I like the summary > and it is clearly indicated that one test failed. I like the summary too, and I would like for it to stay whenever possible… but let's untangle this one step at a time. Running svnsync_tests 28 in pre-r1922041 trunk with s/cmpilato/cMpilato/ (and only that one-bit change; I didn't change anything else in the dumpfile) reproduces the behaviour you describe here, both cases. The thing is, in these circumstances delete_revprops() raises TypeError. This exception type would be raised by the interpreter for certain bugs in the test code (such as trying to index an int, as in «python3 -c '42[42]'»). Therefore, upon seeing TypeError, for all the harness knows there's a bug /in the Python code doing the testing/, so the harness (with the patch applied) bails out without printing PASS/FAIL for test #28 and without running any subsequent tests, just as it would if test #28 had done «assert False». The test harness (with the patch applied) is doing exactly what it should be doing. In contrast, try running the one-bit-changed version of test #28 under post-r1922041 trunk. r1922041 causes delete_revprops() to raise svntest.Failure where it previously raised TypeError¹. The svntest.Failure is caught by one of the «except» blocks above the diff context, and the human-readable summary gets printed. This happens both with and without the change to the last «except» block; there is no behaviour change. Moreover, the observed behaviour seems correct to me. Overall, the reason one case didn't print the human-readable summary was because in that case the test raised a "possibly an internal error in the test" exception rather than a "test failed" exception. That's a bug in delete_revprops()'s callee svntest.verify.compare_dump_files(), not in the «except» block — and your r1922041 fixed that bug², whereupon the human-readable summary gets printed both with and without the patch. So, I think your changes to the «except» block are essentially correct. Perhaps the caught class should be Exception (rather than BaseException), perhaps the block should print more or fewer details, but overall, the new block behaves as expected, doesn't silently catch and discard unknown exceptions, and is hardly likely to break anything in tests that PASS… … but might break tests that XFAIL and rely on the incumbent "catch everything and discard it" behaviour. Actually, s/might/does/: with the «except» patch, the «make check» run is aborted upon reaching revert_tests.py:revert_remove_added(). That test is decorated @XFail() and uses «assert» to verify its expectations. Presumably, when that test is run, one of the «assert»s fails, so the test function throws AssertionError. Without the «except» patch, the AssertionError would be discarded; with the «except» patch, the AssertionError would be treated as a genuine assertion failure, i.e., as an indication that the internal state of the testing code has reached an "unreachable" state. Nevertheless, I still think the «except» patch is essentially correct; it simply unearthed an incorrect use of «assert» in revert_tests.py. Bottom line: +1 (concept). > ¹ … because delete_revprops()'s dumpfile happens to be valid UTF-8, so the .decode() succeeds rather than raise UnicodeDecodeError. ² … for dumpfiles that happen to be valid UTF-8. > Checking tests.log should > make it clear what failed. If I'd change something, I'd probably just add > the "Unexpected exception" message from the diff above. This would make it > clear in tests.log that we got an unexpected exception: You mean, to just add the message, without re-raising the exception or limiting the caught classes? I see several problems with that: - As mentioned in the initial code review, catching everything and silently discarding it is not best practice. And as mentioned in the subsequent analysis, the incumbent code literally catches and discards AssertionError exceptions. - If we catch an arbitrary exception and /don't/ re-raise it, we don't actually have any assurance that the overall process will exit with a non-zero return code, do we? And nobody ever reads tests.log unless «make check» exits non-zero (whether due to a FAIL, an XPASS, or an uncaught exception in the test harness). > [[[ > DIFF of raw dumpfiles (including expected differences) > --- expected > +++ actual ⋮ > File "/usr/lib/python3.11/difflib.py", line 1077, in IS_CHARACTER_JUNK > return ch in ws > ^^^^^^^^ > TypeError: 'in <string>' requires string as left operand, not int > W: Unexpected exception type <class 'TypeError'> > FAIL: svnsync_tests.py 28: copy-revprops with removals > ]]] > > Thoughts? Summary: - s/x.decode("utf-8")/repr(x)/ - not exceed 80 columns - +1 (concept) on the «except» patch - revert_tests should probably be fixed before that patch is committed And thanks for having taken the time to look into things in the first place, Daniel. Daniel