On Fri, Oct 21, 2011 at 12:41 PM, Mike Matrigali <[email protected]> wrote:
> Rick Hillegas wrote:
>>
>> -0
>>
>> I am tempted to vote -1 based on DERBY-5430. The 10.8.2 release candidates
>> produce a deadlock in NsTest. That deadlock was not seen in 10.8.1 or
>> earlier releases.
>
> If we had a reproducible case for DERBY-5430 I would agree; then we could, at
> the very worst, binary search for the change in 10.8 that
> caused the issue. I've tried this but failed, and I see very inconsistent
> results using nstest. On exactly the same codeline/machine/environment it
> will pop after 1 hour and then not after days. I have also reviewed all
> the changes in 10.8 since the previous release and cannot come up with
> anything that looks likely to cause this kind of problem.
>>
>> However, I do not have any confidence in NsTest as a release barrier. This
>> test suffers from a number of defects which severely cripple its usefulness:
>>
>> 1) No-one seems to understand this test.
>>
>> 2) The test is not being run in its preferred configuration. The "Ns" in
>> NsTest means "Network Server" I think, but as far as I can see the test is
>> only being run embedded.
>
> I was around when this test was being developed. Originally I believe we
> were looking for a network specific test to add to embedded stress tests we
> had. But when we looked at what resulted there was nothing
> network specific about it, and in fact it was found to be more stressful
> run in embedded mode. I agree if we had the resources we should run it
> in both modes (and maybe even alter its various parameters to change
> what it stresses). For instance I think it currently also only runs
> on encrypted databases and thus does not stress other more "normal" paths.
>>
>> 3) The test produces reams of errors. I don't think we know how to strain
>> signal out of this noise. The sheer volume of errors suggests that the test
>> is badly written and that it does not model a sensible workload.
>
> I go back and forth on this. As a developer I believe if I wrote this
> test I would not have it act this way. But one original objective of the
> stress test was to stress unexpected paths not being tested by others.
>>
>> 4) The person who runs this test (Myrna) has lost confidence in its
>> ability to disclose regressions, as evidenced by the downgrading of the
>> urgency of DERBY-5430.
>>
>> I do not think that we should use NsTest as a release barrier again until
>> we address its defects.
>
> I think release managers should look at the result of this test and make
> their own determination. If many ASSERTS or other system errors (like
> DERBY-5422) or server crashes start coming from this test then it is giving
> good feedback. We would not have seen DERBY-5423 without this test, and I
> believe that would have been a severe problem for existing
> user applications.
>
> So I agree that nstest failing should not necessarily mean a release should
> be blocked. Unfortunately its results need to be interpreted and
> a decision made by the community/release manager on whether it should block or
> not. It has shown up real bugs in the past that all other
> tests have missed, so I don't want to throw it out. It is too bad that its
> signal-to-noise ratio is so low.
>>
>> Thanks,
>> -Rick

I'm voting +1 to release 10.8.2.2.

I confirm that I did see the deadlock of DERBY-5430 with 10.8.2.2, so even after Rick backed out the fix for DERBY-4377. I thought Rick had also seen this in a build off the branch after the backout? Perhaps I misread the comments in DERBY-5430.

I decided to lower the priority of DERBY-5430 for 2 reasons:
- nstest is not a very consistent test for finding this issue. I can only state that I've *not* seen DERBY-5430 in release cycles before 10.8.2.2 (at least not with 10.8.2.1 or 10.7.1.1), which doesn't mean it didn't exist. (As an aside, note that I also did not see DERBY-5454 (deadlock on select max) again with 10.8.2.2, which I had expected to see...)
- a number of people have looked through all the changes and stated that none of them looks like an obvious cause of this issue.

After finishing up the release work, I will take some time to do a binary search and see if I can find a check-in which caused nstest with embedded to see this (or to see this more easily, assuming it existed before). But this will be a very slow process; it might take a month or more.

Re nstest with Embedded vs. Network Server: for the past 4 releases or so I've run nstest both ways, the embedded configuration on Windows and the network server configuration on a Linux machine. I've consistently logged the results on the platform testing page. The Network Server test didn't show deadlocks, which is clearly stated in DERBY-5430.

Myrna
