You might recall my building a tool for online testing with Iliad and GNU Smalltalk. Today was "the" day, and here is what happened.
I had gst running on one of my webservers; students accessed it through an Apache proxy set up to serve static files and forward the remaining requests to the app.

As testers we had 5 classes with about 25 students each. The test they took had 13 exercises with a total of 95 questions. The clients were about 25 PCs in a private subnet, going through a NAT-ing DSL router with somewhere between 200 kbit/s and 500 kbit/s upstream and between 2 Mbit/s and 6 Mbit/s downstream (IIRC), shared with a few other users across the rest of the school grounds.

Response time was snappy: iliad's "ajax loader" mostly just flashed in the corner, without a chance of showing off its animation. From the users' point of view it was a nice change, and they quite liked it once we got over the initial hurdle (below).

I watched the app through atop and paid attention to RSIZE, i.e. the amount of physical RAM the gst-remote process was using:

1) after startup, before any access: 19 MB
2) after displaying the registration page with a single widget for 25 clients: 33 MB
3) after everybody started working, i.e. with 25 full-blown test widgets: 65 MB
4) after finishing the test: between 100 MB and 110 MB

I have saved the image of every run with

    gst-remote --eval="ObjectMemory snapshot"

and will try to find out what has caused this growth.

CPU load on the server went up to 90% during the initial phase (more below) and up to 30% during test execution. The numbers are not really reliable, as the machine hosts a few other apps which might have contributed. The machine is a "small" single-core 64-bit AMD64 3700+ with 1 GB RAM.

Now for the interesting problem, which managed to make me a bit nervous... and would have been a total showstopper had it not happened in this "experimental testing session".

The students of class 6a logged into their Windows domain accounts, started Firefox and entered the URL for the test (stage 1 above).
Then they entered their names into the registration page (stage 2) and clicked on the button to access the test. Shortly afterwards, server CPU load went to 100%, with the following error message being repeated as fast as the remote terminal could cope with:

"Socket accept error: Error while trying to accept a socket connection"

On the client side, a one-line 500 error message was reported. Time for pkill gst-remote ...

I rebuilt the image and started the server again. This time we staged the 25 "almost simultaneous" login attempts into four batches of roughly 6 each, and things worked fine from that point on. After finishing the test, the students logged off and the next class, 6b ... had the exact same experience ... and 6c and 6d, too.

For the final group I tried a different approach: they logged on, opened the URL, and sat on their hands. I killed gst-remote, rebuilt the image, restarted gst-remote and told them to reload the page. They then entered their names and started clicking on the answers, and the Socket Error of Doom appeared again. Kill, rebuild, restart. Everybody loaded the registration page (not staged, just 25 students clicking when they were ready), entered their name and worked on the test as they should. No hiccup.

I am very open to any suggestions as to what could have caused this misbehavior. I don't think iliad is involved (besides generating ajax requests), so swazoo and the gst socket implementation are my next suspects. My vague-feeling-in-the-gut-proved-by-handwaving hypothesis is that it might be the combination of building a fairly large widget tree *and* creating a bunch of new socket connections at the same time that's causing the trouble.

I'll try to build a test bed to reproduce the disaster in a controlled setting, but it will be a few days before I can really get to this.

Any ideas?

s.

_______________________________________________
help-smalltalk mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
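
A possible starting point for such a test bed, as a rough sketch only: a small Python script that opens 25 connections at (nearly) the same instant and tallies what comes back. The function name burst, the default of 25 clients and the placeholder URL are all assumptions made for illustration, not part of the actual setup. The script sends plain GET requests without cookies or ajax calls, so it exercises the "many new sockets at once" half of the hypothesis only; whether a given URL also triggers building the widget tree depends on the app.

```python
import threading
import urllib.error
import urllib.request
from collections import Counter


def burst(url, clients=25, timeout=10.0):
    """Send `clients` GET requests to `url` as simultaneously as threads
    allow and tally the outcomes (HTTP status code or exception name)."""
    barrier = threading.Barrier(clients)  # releases all threads together
    outcomes = Counter()
    lock = threading.Lock()
    # Ignore http_proxy environment settings so requests go straight to url.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def one_client():
        barrier.wait()  # block until every client thread is ready
        try:
            with opener.open(url, timeout=timeout) as resp:
                resp.read()
                result = resp.status
        except urllib.error.HTTPError as err:
            result = err.code  # e.g. the 500 seen on the client side
        except Exception as err:  # refused, reset, timed out, ...
            result = type(err).__name__
        with lock:
            outcomes[result] += 1

    threads = [threading.Thread(target=one_client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


# Example with a placeholder URL (adjust to a test instance):
# print(burst("http://localhost:8080/", clients=25))
```

Running it once against the server directly and once through the Apache proxy might help tell whether the proxy plays any part; varying clients would approximate the staged batches.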
