On Fri, Feb 03, 2006 at 11:53:01AM -0500, Nima wrote:

> What basically happens is that once I have more than 150 users
> logged in the response time for a page takes a minute or more which
> is very frustrating for users. Today we received an email saying
> that the system is simply "shit". I don't know what to do next. All
> I know is that from semester to semester the load is getting higher
> but the frustration as well.

Nima, from your description this did NOT happen recently all at once;
it has been gradually getting worse as you add more users, right?

It sounds like you're fairly lost and that you have an urgent
performance problem affecting your users. I don't know just what
dotLRN installation this is, but if you really have 150 "concurrent"
users (whatever that means precisely in this case), then it's probably
one of the big dotLRN installations at a large university somewhere
(ah, from the config file below, "uni-mannheim.de"). I hope that means
you also have some sort of support contract with one of the OpenACS /
dotLRN gurus.

If it's not something you can immediately (like, today) identify and
fix, my advice is to get them involved ASAP. It might be just a simple
misconfiguration somewhere, or it might be something deeper and
trickier to fix. Either way, having all your users actively angry at
you makes it plenty urgent enough to call in the big guns...

> We have three linux boxes. One for an aolserver with database connection,
> one for a static aolserver and one for the database.
>
> The database box never goes above 5-10%. The static server is also not

That's only CPU load. You also want to check its I/O activity.
(Solaris top also shows "I/O wait" percentages but Linux unfortunately
does not.) On older Linux boxes "iostat 5" was the way to do that.
Newer Linux systems may have different/better ways to do that.

> very busy but the dynamic server can go up to 99% and a load of 10 and
> more.

Well, that VERY strongly suggests that the bottleneck is simply
executing all that Tcl code in your AOLserver. If so, add more
AOLserver boxes, and set up Pound or the like as a front-end server to
split the load between them. And/or upgrade to a much faster server.

In addition, try to find out what pages are eating up most of the
processing time, and speed them up. A lot of that processing may be
redundant and/or inefficient.
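
To make "find out what pages are eating up most of the processing
time" concrete, here is a minimal sketch in Python. It assumes you can
already get (url, elapsed_seconds) pairs out of your access log
somehow; the function name, the sample URLs, and the timings are all
made up for illustration and are not taken from your site.

```python
from collections import defaultdict

def slowest_pages(requests, top=5):
    """Rank URLs by the total time spent serving them.

    `requests` is an iterable of (url, elapsed_seconds) pairs. How those
    pairs are extracted from the server's log is left open (assumption).
    """
    totals = defaultdict(lambda: [0.0, 0])  # url -> [total seconds, hits]
    for url, elapsed in requests:
        entry = totals[url]
        entry[0] += elapsed
        entry[1] += 1
    # Sort by total seconds, largest first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(url, total, hits) for url, (total, hits) in ranked[:top]]

# Hypothetical sample data, for illustration only.
sample = [
    ("/dotlrn/index", 2.0),
    ("/dotlrn/calendar", 0.5),
    ("/dotlrn/index", 3.0),
    ("/register/", 0.25),
]

for url, total, hits in slowest_pages(sample):
    print(f"{url:20s} total={total:6.2f}s hits={hits}")
```

Ranking by total time rather than by average is deliberate: a
moderately slow page that is hit constantly can cost more CPU than a
very slow page that almost nobody requests.
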
Some judicious caching and/or code tuning could make a huge
difference.

Oh, and a silly question: What version of Tcl are you using, and did
you compile it with optimization? You definitely want to be using the
latest Tcl 8.4.x version compiled with either "-g -O2" (my preference)
or "-O2". I don't know how much slower Tcl is if you leave compiler
optimization turned off, but it's probably enough to be very
noticeable in your case. (Make sure AOLserver was also compiled with
optimization of course; it re-uses the Tcl build flags.)

Finally, this is more of a research project, but your site is large
and busy enough to benefit from figuring out just what the current
status is of this patch:

  Cache compiled Tcl page bytecode
  http://sourceforge.net/tracker/?func=detail&aid=689515&group_id=3152&atid=353152

> Currently:
> %MEM %CPU SHR PID USER PR NI VIRT RES S TIME+ COMMAND
> 41.1 0.0 7448 27147 unima2 25 0 1770m 1.6g S
> 0:36.66 /opt/aolserver4/bin/nsd -u unima2 -t /www/unima2/etc/config.tcl
> with 44 users logged into the system.

That says that your AOLserver is using 1.7 GB total memory, almost all
of it resident. That is huge for most people, but probably quite
reasonable for you, since you have 4 GB in that box. It at least
probably means that the box isn't thrashing between RAM and disk,
good.

> dotlrn (dynamic server)
>
> AOLServer 4.0.10 (connected to the database)
> Pound 1.8.2 (as reverse proxy for ssl and load balancing)
> Apache 2.0.53 (only redirect from 80 to 443 where pound is)

Oh, you're already using Pound as the front-end. So, shouldn't it be
easy to stick in additional AOLservers behind it for dynamic content?

The CATCH is, is your site and all its code, both stock and custom,
already set up to work nicely with multiple AOLservers? Or does it
rashly ASSUME only 1 AOLserver process in some places, such that you
are going to see bugs or inconsistencies when using multiple
AOLservers? I dunno.
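
As an illustration of the kind of inconsistency I mean, here is a toy
Python sketch, NOT actual OpenACS or dotLRN code. It assumes each
server process keeps its own private in-memory cache in front of the
shared database; the class names, the key, and the values are all
hypothetical.

```python
class Database:
    """Stand-in for the shared RDBMS that every server talks to."""
    def __init__(self):
        self.rows = {"course_title": "Statistics I"}

class Server:
    """Stand-in for one server process with a private in-memory cache."""
    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        # Serve from the local cache, filling it from the database on a miss.
        if key not in self.cache:
            self.cache[key] = self.db.rows[key]
        return self.cache[key]

    def write(self, key, value):
        self.db.rows[key] = value
        self.cache[key] = value  # only THIS process's cache is refreshed

db = Database()
a, b = Server(db), Server(db)

b.read("course_title")                    # b now caches the old value
a.write("course_title", "Statistics II")  # update arrives via server a

print(a.read("course_title"))  # a sees the new value
print(b.read("course_title"))  # b still serves its stale cached copy
```

With a single process this code is perfectly correct. With two
processes behind a load balancer, what a user sees depends on which
server happens to answer, unless something also flushes the other
server's cache.
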
For that, I definitely recommend talking to the other folks running
multiple AOLservers with OpenACS and dotLRN.

It sounds like you're running Pound on the same box as AOLserver.
You'll definitely need to change that in order to add dynamic content
servers. (I don't understand why you're using Apache to redirect
client browsers from port 80 to 443 either; that seems odd.)

> SuSE 9.2
> Linux Linux version 2.6.8-24.18-smp (gcc version 3.3.4 (pre 3.3.5
> 20040809)) #1 SMP Fri Aug 19 11:56:28 UTC 2005
> 4 CPU Intel(R) Xeon(TM) CPU 3.06GHz , L2 cache: 512K
> 4 GByte RAM - Memory: 4070968k/4111296k available (2339k kernel code,
> 39528k reserved, 824k data, 252k init, 3193792k highmem)
> 2 Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet Cards

Hm, do you REALLY have 4 Xeon CPUs in that box, or is that the Intel
hyper-threading feature turned on? I suspect you have 2 Xeon
single-core sockets, with hyper-threading turned on. Back with Linux
2.4.x the rule of thumb was to always turn hyper-threading OFF, as
Linux didn't know how to use it properly and hyper-threading would
slow things down, not speed them up. I don't know whether that has
changed with the newer 2.6.x kernels.

I was going to suggest that if your single AOLserver box is a few
years old, then immediately replacing it with a (much faster) brand
spanking new one may be the easiest and most cost-effective way to
alleviate the problem. Then you can take more time to get multiple
boxes set up, make sure that your code works correctly in that
configuration, etc.

However, from the specs above you're already using a fairly high-end
machine, so that might not make sense. Keeping your existing box and
adding more is probably the way to go.

For comparison though, for about $5900 US, right now you could order a
PowerEdge 1850 1U box from Dell with 2 sockets each with a dual-core
2.8 GHz Xeon (2x2 MB L2 cache), 8 GB of RAM (expandable to 16 GB),
RAID-1 w/ 2 15k rpm SCSI drives.
That should give you roughly 2x the performance of your current box.
Or the same Dell box with 2 single-core 3.8 GHz Xeons for about $5600.

I wouldn't necessarily pick either that machine or Dell, but that's a
useful price point for comparison. The Dell box is mildly gold-plated
anyway. Since this is effectively a compute box, running only
AOLserver with no RDBMS or anything else disk intensive, I would
probably go with hardware RAID-1 but with SATA or even plain old IDE
drives; no need to pay extra for SCSI. If I had many of these
identical boxes and was set up to easily automatically install them, I
might even skip the RAID card.

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/