Hello people, I'm currently at Giacomo's place and we spent a rainy afternoon profiling the latest Cocoon to see if there is something we could fix/improve/blah-blah.
WARNING: this is *by no means* a scientific report, but we have tried to be as informative as possible for developers.

We were running Tomcat 4.1.10 + Cocoon HEAD on Sun JDK 1.4.1-b21 on Linux, instrumented with Borland OptimizeIt 4.2.

Here is what we discovered:

1) Regarding memory leaks, Cocoon seems absolutely clean (by Cocoon, we mean the org.apache.cocoon.* classes). Avalon seems to be clean as well. Good job everyone.

2) We noticed an incredible use of org.apache.avalon.excalibur.collections.BucketMap$Node. It is *by far* the most used class in the heap, more than Strings, byte[], char[] and int[]: some 140000 instances of that class. The number of BucketMap nodes grows linearly with the number of different pages accessed (as they are fed into the cache), but even a cached resource creates some 44 new nodes, which are later garbage collected. 44 is nothing compared to 140000, but it is still something to investigate.

So, discovery #1: BucketMaps are used *a lot*. Be aware of this.

3) Catalina seems to be spending 10% of the pipeline time. Having extensively profiled and carefully optimized a servlet engine (JServ), I can tell you that this is *WAY* too much. Catalina doesn't seem like the best choice to run a loaded servlet-based site (contact [EMAIL PROTECTED] if you want to do something about it: he's working on Jerry, a super-light servlet engine based on native APR and targeted especially at Apache 2.0).

4) Java IO takes from 20% to 35% of the entire request time (reading from and writing to the socket). This could well be a problem with the instrumented JVM, since I don't think JDK 1.4 is that slow on IO (especially using the new NIO facilities internally).

5) Most of the time is spent on:

 a) XSLT processing (and we knew that)
 b) DTD parsing (and that was a surprise for me!)

Yeah, DTD parsing. No, not for validation, but for entity resolution. It seems that even if the parser is non-validating, the DTD is fully parsed anyway just to do entity evaluation.

So, discovery #2: Be careful about DTDs even if the parser is not validating.

Of course, when the cache kicks in and the cached document is read directly from the compiled SAX events, we have an incredible speed improvement (also because entities are already resolved and hardwired).

6) Xalan incremental seems to be a little slower than regular Xalan, but on multiprocessor machines this might not be the case [Xalan uses two threads for incremental processing].

NOTE: Xalan doesn't pool threads when it does that! So, while perceived performance is better for Xalan in incremental mode, the overall load on the machine is lower if Xalan is used normally.

7) XSLTC *IS* blazingly fast compared to Xalan and is much less resource intensive.

Discovery #3: use XSLTC as much as possible!

NOTE: our current root sitemap.xmap indicates that XSLTC is the default XSLT engine for Cocoon 2.1, but the fact is that the XSLTC factory is commented out, which results in running Xalan. We should either correct what the sitemap says or uncomment the XSLTC factory. I vote for making XSLTC the default even if this generates a few bug reports.

8) Cocoon's hotspot is.... drum roll.... URI matching.

TreeProcessor is complex and adds lots of complexity to the call stacks, but it seems to be very lightweight. It's URI matching that needs more work performance-wise.

Don't get me wrong, my numbers indicate that URI matching takes from 3% to 8% of response time. Compared to the rest it is nothing, but since this is the only thing we are in total control of, this is where we should concentrate profiling efforts.

Ok, that's it. Enough for a rainy Swiss afternoon.

Anyway, Cocoon is pretty optimized from what we could see. So let's be happy about it.

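For anyone who wants to check point 5 on their own setup, here is a minimal sketch, not what we ran under the profiler. It assumes the JAXP parser bundled with the JDK. The class name DtdProbe, the document, the DTD text and the example.invalid URL are all made up for illustration. It parses a document with a non-validating SAX parser, counts how often the parser asks for the external DTD, and collects the expanded text.

```java
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DtdProbe {
    // Hypothetical document and DTD, used only for illustration.
    static final String XML =
        "<!DOCTYPE page SYSTEM \"http://example.invalid/page.dtd\">"
        + "<page>&greeting;</page>";
    static final String DTD = "<!ENTITY greeting \"hello\">";

    /** Parses XML without validation; returns { dtdRequests, collectedText }. */
    static Object[] probe() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // explicit: no validation requested
        XMLReader reader = factory.newSAXParser().getXMLReader();

        AtomicInteger dtdRequests = new AtomicInteger();
        StringBuilder text = new StringBuilder();

        // Serve the DTD from memory so nothing touches the network,
        // and count every time the parser asks for an external entity.
        reader.setEntityResolver((publicId, systemId) -> {
            dtdRequests.incrementAndGet();
            return new InputSource(new StringReader(DTD));
        });
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });

        reader.parse(new InputSource(new StringReader(XML)));
        return new Object[] { dtdRequests.get(), text.toString() };
    }

    public static void main(String[] args) throws Exception {
        Object[] result = probe();
        System.out.println("DTD requests: " + result[0] + ", text: " + result[1]);
    }
}
```

If the resolver is called and the entity comes back expanded, the parser read the DTD even though validation was off. Xerces-based parsers also have the feature http://apache.org/xml/features/nonvalidating/load-external-dtd for skipping the external DTD, but entities declared there will then not be resolved, so treat that as an experiment rather than a fix.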
-- Stefano Mazzocchi <[EMAIL PROTECTED]>