Also, the XML document is simply one persistent form of the data, and we need to be open to other persistent forms.
-dain
On Tuesday, September 9, 2003, at 01:53 PM, Alex Blewitt wrote:
On Tuesday, Sep 9, 2003, at 17:44 Europe/London, Jeremy Boynes wrote:
However, it doesn't necessarily mean that it can't generate the XML,
rather than a binary-compatible format that Jeremy was suggesting. An
XML document will always be more portable between versions than a
generated bunch of code, because when bugs are fixed in the latter you
have to regenerate, whereas with an XML file you don't.
Please do not think I am thinking binary is the only way to go - that notion
was discarded back in EJB1.0 days. What I want is to have it as an option.
Can I make a few observations here:
o Assumption: large XML files take a long time to parse, therefore the server will be slow to start up
o Assumption: the way to solve that is with the deploy tool, and possibly a combined XML+binary format.
I think there are other solutions to the problem than just these. Whilst it is true that parsing the XML file can take some time, it's not actually likely to be where most of the server's startup time goes. If we had metrics to prove it, I'd shut up, but we don't.
I'd postulate that we would be able to fire up the server faster if we used different optimisations; for example, a multi-threaded startup (like that provided by Avalon) instead of a single-threaded model; an on-the-fly (streaming) parse of the XML file instead of building a DOM/POJO tree; ditching the JMX layer and using plain Java method calls; and so on.
But we don't *know* that this is where the bottleneck is. It may be, and we can run tests to show that in a simple scenario, option A is faster than option B, but that doesn't mean that that's where the bottleneck will be in the server.
But if it takes (say) 10 or 100 times as long to dynamically create the bean, we are solving the wrong problem. Don't get me wrong, I don't know how much time it takes to create a bean -- but we don't seem to have any profiling to compare the various options. It could even be the case that a more optimised XML parser would solve the problem, or a different way of creating the POJOs.
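To make the "measure first" point concrete, a trivial timer along these lines would at least put numbers on the two candidate costs. This is only a sketch: the inline descriptor and the reflective ArrayList instantiation are stand-ins for a real ejb-jar.xml and a real bean, not anything in Geronimo.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;

public class ParseTimer {
    // Hypothetical minimal descriptor; a real ejb-jar.xml is much larger.
    static final String XML =
        "<ejb-jar><enterprise-beans><session>"
        + "<ejb-name>Demo</ejb-name></session></enterprise-beans></ejb-jar>";

    /** Returns {parseNanos, createNanos} so the two costs can be compared. */
    public static long[] time() throws Exception {
        long t0 = System.nanoTime();
        DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes("UTF-8")));
        long parseNanos = System.nanoTime() - t0;

        // Stand-in for "creating the bean": reflective instantiation of a POJO.
        t0 = System.nanoTime();
        Class.forName("java.util.ArrayList").newInstance();
        long createNanos = System.nanoTime() - t0;

        return new long[] { parseNanos, createNanos };
    }

    public static void main(String[] args) throws Exception {
        long[] t = time();
        System.out.println("parse: " + t[0] + " ns, create: " + t[1] + " ns");
    }
}
```

Numbers from something this crude prove nothing about the server, of course -- they only tell you whether the two operations are even in the same order of magnitude.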
I'd also like to disagree that this optimisation should be done by the deployer. Why not have it done by the server when the code is deployed? Sure, you wouldn't want it to happen every time the server starts (like compiling JSPs) -- so dump out a binary representation at the server side, and drop that cache when the application gets redeployed. That way, you still get the fast startup (2nd time onwards) whilst maintaining portability and without having to sacrifice any issues with the developer.
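A minimal sketch of that server-side cache idea, assuming the parsed form is Serializable and using a file-timestamp check for staleness; the cache layout and the Parser interface are hypothetical, not anything in Geronimo:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DescriptorCache {
    // Hypothetical cache location under the server's work directory.
    private final File cacheFile;

    public DescriptorCache(File serverWorkDir, String moduleName) {
        this.cacheFile = new File(serverWorkDir, moduleName + ".ser");
    }

    /** Load the cached form if it is newer than the XML; else reparse and recache. */
    public Serializable load(File xml, Parser parser) throws Exception {
        if (cacheFile.exists() && cacheFile.lastModified() >= xml.lastModified()) {
            ObjectInputStream in = new ObjectInputStream(new FileInputStream(cacheFile));
            try { return (Serializable) in.readObject(); } finally { in.close(); }
        }
        Serializable parsed = parser.parse(xml);   // slow path: full XML parse
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(cacheFile));
        try { out.writeObject(parsed); } finally { out.close(); }
        return parsed;
    }

    /** Called on redeploy so the next startup reparses from the XML. */
    public void invalidate() { cacheFile.delete(); }

    /** Stand-in for whatever actually parses the deployment descriptor. */
    public interface Parser { Serializable parse(File xml) throws Exception; }
}
```

The key property is that the binary form never leaves the server, so portability of the deployed artifact is untouched: deleting the cache always falls back to the XML.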
For example, parsing the XML with
full schema validation is a dog - on my machine even a simple file takes a
couple of seconds and a couple of MB of memory, and I am concerned about a) large
applications with hundreds of modules taking forever to start, and b)
applications trying to run with constrained resources. And yes, we do need
to consider these things :-)
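For what it's worth, the validation cost can be switched off independently of everything else -- a non-validating, namespace-aware parser factory is a one-liner with JAXP. A sketch, not a recommendation either way:

```java
import javax.xml.parsers.SAXParserFactory;

public class ParserConfig {
    // Validation is where the cost is; a non-validating parse of the
    // same descriptor is far cheaper.
    public static SAXParserFactory nonValidating() {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setValidating(false);       // skip DTD/schema validation entirely
        f.setNamespaceAware(true);    // still resolve namespaces correctly
        return f;
    }
}
```

One could imagine validating at deploy time and parsing without validation at every startup, which would remove the per-start cost without losing the checking.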
But if you had that large an application, how long would you expect it to take to start up? Realistically, what is the largest size of app you've had to deal with? Most web-apps have just a single servlet these days (a la Struts), so the only issue is with EJBs, and even with 1000 EJBs you're still only looking at 1k of data per EJB to make a 1MB file. That's a hell of a lot. And do we know how long it takes to deploy 1000 EJBs once the XML file has loaded? Are we seriously saying that we expect that part of the process to take dramatically less than 2s? If not, then the bottleneck isn't going to be at the XML parsing stage.
We have also had proposals for storing configuration information in LDAP
repositories and relational databases, neither of which would allow
vi-style access to the XML. A binary format may well be a better option for
them.
IMHO I don't think that 'vi'-style access is the sole reason to use XML. I am personally more a fan of storing the configuration in LDAP, which will be slower still than having it in XML files. But I wanted to raise a big 'no' to a binary file format, including any serialized form of MBeans, which would then be very difficult to interpret if we ever managed to break away from JMX. No, I don't think that will happen soon, but I can hope :-) See Elliotte's comments on XML and binary at http://www.cafeconleche.org/books/effectivexml/chapters/50.html (or the cached version at http://216.239.41.104/search?q=cache:oxknzyhXE9MJ:www.cafeconleche.org/books/effectivexml/chapters/50.html+%22Compress+if+space+is+a+problem%22&hl=en&ie=UTF-8 since I couldn't see it on the former)
Think of it like JSP: some people want to pre-compile, and this is *very*
common in production environments.
I don't see the two as that comparable. A site may have many hundreds of JSPs with several k of data in each, and they take (relatively speaking) a long time to parse, translate, and then compile. I don't see parsing an EJB-JAR.xml file as being in the same order of magnitude.
I don't disagree that we can cache an internal form to optimise speedup; I just don't think it should be anything the deployment tool should use. Same with JSPs; we can upload them into Geronimo, and then a background process can pre-compile them when resources are available. I don't think we should force the developer to decide between the two. [What other JSP engines get wrong is that it's necessary to precompile all JSPs before deployment. It's not; they just need to be compiled before the user sees them. The process should be Deploy -> run app -> precompile all possible next JSPs that you can move to.]
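That "Deploy -> run app -> precompile in the background" process could be sketched roughly like this. The class and its methods are hypothetical, and the actual JSP compiler invocation is omitted; the point is only that compilation is idempotent, so the background worker and the request path race harmlessly:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class JspPrecompiler {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Set<String> compiled =
        Collections.synchronizedSet(new HashSet<String>());

    /** Queue a page for background compilation when resources allow. */
    public void queue(final String jspPath) {
        worker.submit(new Runnable() {
            public void run() { compileNow(jspPath); }
        });
    }

    /** Idempotent: the request path calls this too, so whoever gets there first wins. */
    public void compileNow(String jspPath) {
        if (compiled.add(jspPath)) {
            // ... invoke the real JSP translator/compiler here (omitted) ...
        }
    }

    public boolean isCompiled(String jspPath) { return compiled.contains(jspPath); }

    /** Drain the background queue (e.g. on shutdown). */
    public void awaitQuiescence() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A smarter version could prioritise the queue by which pages are reachable from the ones already served, as suggested above, but the single-threaded worker already gives the "never block the user on a compile they didn't ask for" property.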
Premature optimisation is the root of all evil.
Alex.
/*************************
 * Dain Sundstrom
 * Partner
 * Core Developers Network
 *************************/
