The first thing I'd like to say is that the use of couchjs was just a stop-gap measure to get the test suite out of the browser. We used to have to deal with so many browser issues that it was a terrible mess. The issue with couchjs, much as you've seen, is that it's not a very full environment for writing tests. So just to be clear, the only real thing tying us to it as a test platform is that we have a large amount of JS written already. Either we need to make couchjs better, use node, or translate the tests to something that has a more useful environment.
I've been noodling over whether we might be better off just translating everything to Python or something similar. I've seen suggestions for Erlang, but I personally think Erlang is a terrible language for writing tests like this (specifically, the code-to-test ratio is ungood). If we had something like Python to hack on, I was also thinking of writing a library function that would start CouchDB as a child process. That would remove the need for the _restart handler, because you could just kill -9 the subprocess and restart it, maybe with a wait for things to boot again.

I reviewed your feature branch the other day and I'm +1 for pushing that to master. Awesome work, Wendall.

On Fri, Apr 5, 2013 at 6:57 PM, Wendall Cada <[email protected]> wrote:
> I wanted to follow up on this.
>
> I've created a feature branch for this and a JIRA issue
> https://issues.apache.org/jira/browse/COUCHDB-1762
>
> Overall, I think the worst problem is that the tests really aren't
> debuggable in any sane way, and logging is essentially useless for most
> things. The only sure way to spot an error most of the time is if it's an
> actual CouchDB bug and shows up in the log. I'm not sure how this can ever
> be fixed with the current test suite. I'd opt for testing with jasmine, but
> that would require not using couchjs for the test runner, so for now, I just
> focused on getting random failures under control.
>
> Paul was kind enough to share some code that he wrote recently to deal with
> the rampant _restart issues.
> https://github.com/davisp/couchdb/commit/0cbf6a9cea01eea599524bcdb77dedb322c7ade4
> This is a very sound approach in using a token so you can see if it actually
> restarts. The current test suite can result in false positives very easily,
> which leads to test failures. I think this is probably the biggest reason
> for the random failures.
> In a previous IRC conversation with Bob (rnewson),
> Jan and I think Benoit (sorry if not the case) _restart was deemed something
> that should go away. I filed a ticket for its removal
> https://issues.apache.org/jira/browse/COUCHDB-1714, and as Bob points out in
> the comments, this is useful for the test suite. I'd argue it's only useful
> with Paul's patch adding a token. Otherwise, it's just not reliable at all.
>
> For the branch I created, instead of using _restart, I did some bash magic
> with a pipe and stop/start the process through the local run script. This
> has the same drawback of not knowing if CouchDB restarted, or we just got a
> false positive. To account for this, I put a small delay in the execution of
> the lookup, using a new method isRunning to give a little time to stop.
>
> I also changed the suite to run a new couchjs for each test file. I'm not
> certain at this point that this is even necessary at all, but I still think
> it's safer in case of a crash, since the rest of the suite can continue.
>
> Other changes I made were just timing related in running the test suite for
> spinning disks, and a couple bug fixes in individual tests.
>
> The lack of timers makes writing these tests very ugly. I really dislike
> this, but so long as the test suite needs couchjs, I don't see a way to
> avoid this without implementing our own setInterval method in C.
>
> One last item. I was getting a consistent failure in Centos 6. I tracked
> this down to a bug in libcurl. For some reason, after any xhr request that
> returns a 416, the very next send() will hang for a long time, then
> eventually crash couchjs. The specific version causing the issue is
> curl-7.19.7-35.el6 and libcurl-7.19.7-35.el6. I'm not certain if this is
> worth reporting in JIRA, but it will certainly cause a test suite failure
> consistently in attachment_ranges, but otherwise appears to be fairly
> harmless. Maybe this should be documented somewhere?
>
> Wendall
>
>
> On 03/27/2013 02:05 PM, Wendall Cada wrote:
>>
>> In 1.3.0, there is a new part of the test suite to run the javascript
>> tests from the command line. I'm running into various issues on different
>> hardware/OS configurations. Mostly, tests hanging or timing out and failing.
>> These are really hard to troubleshoot, as they all pass just fine if run
>> individually.
>>
>> What I'm experimenting with today is rewriting how the tests are
>> implemented to be run one at a time from a loop in bash, versus a loop in
>> javascript. I think the failures I'm running into are improper
>> setup/teardown. There may be an issue with rapid delete and adding a db, or
>> rapidly starting and stopping couchdb, but I think this is not what's
>> happening in my failures.
>>
>> The nature of spidermonkey doesn't allow for spawning threads, or
>> sandboxing, etc, so it's hard looking at the test suite to see how I can
>> improve running all tests. I think it's far better to have the setup spawn a
>> new interpreter for each test. Tear down will kill the interpreter.
>>
>> Wendall
>
>
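
To make the subprocess idea from the top of this reply a bit more concrete, here is a rough Python sketch of a helper that starts a server as a child process, hard-kills it, and waits for it to answer again before the tests carry on. Nothing here comes from the CouchDB tree: ManagedServer, http_ready, the launch command in the usage comment, and the timeout values are all made up for illustration, and the readiness check simply assumes the server answers HTTP 200 on its root URL.

```python
import http.client
import subprocess
import time
import urllib.error
import urllib.request


class ManagedServer:
    """Run a server as a child process that can be hard-killed and restarted.

    Hypothetical helper for illustration; not part of CouchDB.
    """

    def __init__(self, cmd, is_ready, boot_timeout=30.0):
        self.cmd = cmd                  # argv list used to launch the server
        self.is_ready = is_ready        # callable returning True once booted
        self.boot_timeout = boot_timeout
        self.proc = None

    def start(self):
        """Launch the child and poll until it reports ready or we time out."""
        self.proc = subprocess.Popen(self.cmd)
        deadline = time.monotonic() + self.boot_timeout
        while time.monotonic() < deadline:
            if self.proc.poll() is not None:
                raise RuntimeError("server exited during boot")
            if self.is_ready():
                return
            time.sleep(0.1)
        self.kill()
        raise RuntimeError("server did not become ready in time")

    def kill(self):
        """Hard-kill the child (SIGKILL on POSIX) and reap it."""
        if self.proc is not None and self.proc.poll() is None:
            self.proc.kill()
            self.proc.wait()

    def restart(self):
        """Replacement for a _restart call: kill the child, then boot again."""
        self.kill()
        self.start()


def http_ready(url):
    """Build a readiness check that passes once `url` answers with HTTP 200."""
    def check():
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                return resp.status == 200
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or a half-started server.
            return False
    return check


# Usage sketch (the launch command is an assumption; substitute whatever
# starts your local CouchDB, which listens on port 5984 by default):
#
#   server = ManagedServer(["couchdb"], http_ready("http://127.0.0.1:5984/"))
#   server.start()
#   ...run a test file...
#   server.restart()
```

Because restart() always spawns a fresh child and only returns after the readiness check passes, the "did it actually restart?" false positive goes away without needing a token: the old process is known to be dead once kill() has reaped it.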
