On 22 May 2004, at 09:13, Ugo Cei wrote:
On 21 May 2004, at 18:11, Pier Fumagalli wrote:
Since 8:32 AM (BST) this morning, http://www.vnunet.com/ has been running off a standard 2.1.5 (head) distribution of Cocoon, Apache 2.0.49 w/ mod_cache, Jetty 4.2.19, and a hint of my take on the Cocoon kernel powering the backend XML data repository...
Congratulations! Can you give us some more details? How many pages are you serving daily and on which hardware, for instance? I think success stories like yours are important to demonstrate that Cocoon is able to serve lots of content with good performance.
Well, let's say that Cocoon is most definitely NOT the "performant" component on the site...
The pages are generated going through something like 2 megs of aggregated XML documents, and given the structure of the site (and the fact that we're still not 100% confident) we're using non-caching pipelines...
In other words, it takes us roughly between 1 and 2 seconds to generate one single HTML page (whoa, Bessie)...
But it's all cached on the front end by Apache's mod_disk_cache, so, in terms of performance, we don't seem to hit major problems.
And seriously, we don't care much about "how long" it takes to create a page... We're a news site, so the set of URLs requested in a day doesn't vary much (currently my cache is filled with something like 2000 documents, even though almost 100k articles are accessible on the site).
And the architecture (with caching up front powered directly by Apache) allows us to withstand "slashdot-like" attacks very easily (the first request to come in generates the page, and all the remaining freaks get the copy cached on disk)...
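The front-end caching described above behaves like a read-through cache: the first request for a URL pays the 1-2 second generation cost, and every later request for that URL is served from the stored copy. Below is a minimal sketch of that idea in Python, not a description of how mod_disk_cache is actually implemented. ReadThroughCache and slow_cocoon_pipeline are hypothetical names, and the sketch deliberately ignores expiry, HTTP cache headers and concurrent misses (several simultaneous first requests could all reach the backend).

```python
import time
from typing import Callable, Dict


class ReadThroughCache:
    """Hypothetical front-end cache: serve stored copies, generate only on a miss."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate          # slow backend (e.g. a Cocoon pipeline)
        self._store: Dict[str, str] = {}   # URL -> rendered page
        self.backend_calls = 0             # how many times the backend was hit

    def get(self, url: str) -> str:
        if url not in self._store:
            # Cache miss: the first request for this URL pays the generation cost.
            self.backend_calls += 1
            self._store[url] = self._generate(url)
        # Stored copy is returned; no backend work on a hit.
        return self._store[url]


def slow_cocoon_pipeline(url: str) -> str:
    """Hypothetical stand-in for a non-caching pipeline (1-2 s per page on the real site)."""
    time.sleep(0.01)  # shortened delay so the demo runs quickly
    return f"<html>{url}</html>"


cache = ReadThroughCache(slow_cocoon_pipeline)
for _ in range(1000):  # a "slashdot-like" burst on a single article
    cache.get("/news/article-1")
print(cache.backend_calls)  # 1
```

With 1000 requests for the same article the backend is called only once, which is why a slow, non-caching pipeline can sit behind the cache without being overwhelmed.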
It was a weird change from JSPs because those were never cached, and we had to put a lot of effort into actually making the JSP engine and code "fast"... With Cocoon, well, we knew we wouldn't be able to, so we thought of other ways to deal with it, and (more importantly) it forced us to think of a better and more scalable architecture...
One example above all: advertisement tags... Before, a lot of the advertisement code was generated on the server on a PER REQUEST basis... Now we can't do this anymore because of the load it would put on our server, so we had to re-engineer how to serve ads, relying (for instance) more on the client JavaScript engine... But knowing that _we_can_not_ pass every single request through to Cocoon helped us, in the sense that it made us aware of all those problems that (for instance) prevented us from deploying the same application on several different machines at the same time (so, no fault tolerance, no load balancing, no nothing)...
Now, the AMAZING thing was the SPEED at which the site was developed... Three weeks for the whole shebang...
Do that with JSP, yeah, right! :-)
The severe and "restrictive" contracts that Cocoon imposes on the users of its services might seem harsh at first (the "how do I do this? Cocoon doesn't do that!" syndrome was felt quite strongly at the beginning), but on the other hand, they forced us to _THINK_... To think about what we wanted our website to do, and how each single aspect of it related to the rest of the site. Yes, we wrote some small hacks, or shortcuts, but amazingly enough, after the first week and a half spent by Jerm getting all the information sorted out (with nothing moving forward and my manager freaking out), the rest of the functionality came out in the remaining two... And we have a TON of pages up there...
It proved (to me, to my managers, and to the rest of the team) that limitations in contracts, and clearly defined rules and boundaries you cannot step outside of, even if they MIGHT seem counterproductive at first, are clearly an advantage in developing and managing complex projects...
As for your question about performance and so on, I still don't have many figures beyond what I mentioned above... I know for sure that there's a HELL-OF-A-LOT that we can (and will) improve; for now, we decided that, no matter what, we had enough hardware to throw at the baby to match any possible requirement...
We started off thinking about 4 machines (HP/380s running Linux w/ 2 Gigs o' RAM and two 3.2 GHz processors each; in other words, big stuff)... We've already scaled down to only two of those (and we kept two not for performance, but for failover)...
In the future I think we're going to use all four of them (once all the sites we host have been moved to Cocoon), but maybe separate the hardware into classes of functionality (two for serving/caching, two for generating content); we'll see how the baby adapts and how it behaves over the next few weeks...
For now I'm happy that it works, that it works better than expected, and that the concepts behind the machinery are stronger than any performance hack you can possibly think of: if you need speed, even if Cocoon is not _THAT_ fast, you can get it to serve the heck out of it anyhow. You only need to _THINK_ about your problems and not rely on some magic software to run your badly-designed web application fast enough! :-P
Pier