Berin Lautenbach wrote:

Raul,

I've added one or two comments below, but at the end of the day - what interests you most? There are plenty of things that need to be done (you've mentioned some below). Anything you do will be highly valued, so go with what interests you most :>.

And if you can keep improving the performance, you will make lots of people very happy :>.

OK, I will try to keep my focus on the performance and memory issues, and I will do my best. But we are approaching the limit in some cases (engineSign is the floor, and it now accounts for 60% of the signing time, up from the 20% it was at the beginning). Perhaps we can improve things a little in the memory handling ;)

1. Refactor org.apache.xml.security.Init: Right now it is very slow in what it does. The use of XPathAPI makes it very slow; I can rewrite it to use plain DOM, and use internal structure objects to reduce the memory footprint. It would also be good to have another method of initialization (like Init.initGreedy()) that does the expensive steps up front, so they don't need to happen at execution time. It can also be corrected so it compiles safely with the current CVS version of Xalan (and gets rid of the "integration failed" messages on the mailing list).
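Something like this is what I have in mind for the greedy entry point (a rough sketch only; the registry shape below is hypothetical and not how Init is actually written, though Canonicalizer20010315 is a real implementation class):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a lazy vs. greedy initialization split:
// init() only records which implementation class serves each URI,
// initGreedy() additionally loads those classes up front.
public final class Init {

    // URI -> implementation class name, resolved lazily by default
    private static final Map<String, String> REGISTRY = new HashMap<>();
    private static final Map<String, Class<?>> LOADED = new HashMap<>();

    public static synchronized void init() {
        // cheap: record the mapping only, no class loading yet
        REGISTRY.put(
            "http://www.w3.org/TR/2001/REC-xml-c14n-20010315",
            "org.apache.xml.security.c14n.implementations.Canonicalizer20010315");
    }

    public static synchronized void initGreedy() throws ClassNotFoundException {
        init();
        // expensive: resolve every registered class now, so nothing
        // heavy happens later at signing time
        for (Map.Entry<String, String> e : REGISTRY.entrySet()) {
            LOADED.put(e.getKey(), Class.forName(e.getValue()));
        }
    }
}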


I'd like to get rid of this blasted message - the problem is it will make the detection of the original 1.4 JDK (with the "bad" version of Xalan) fail. Erwin and I discussed it on the list some time back, and my feeling was "do it", but I'm not sure how comfortable Erwin was.

I sent a patch to the mailing list a few days ago that fixes the issue with all Xalan versions. It uses reflection to access the private field, so it always compiles; if the field is private at runtime, the access fails silently and the other initializations are kept.
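The trick looks roughly like this (the class and field names below are placeholders, since I am not naming the exact Xalan member here):

import java.lang.reflect.Field;

// Placeholder names: "SomeXalanClass" / "someField" stand in for whatever
// Xalan member used to be public and became private in later versions.
public final class ReflectiveInit {

    static void initXalanField() {
        try {
            Class<?> cls = Class.forName("org.apache.xalan.SomeXalanClass");
            // getField() only sees public fields, so this code compiles
            // against every Xalan version; on versions where the field
            // became private it throws at runtime instead of failing the build
            Field field = cls.getField("someField");
            Object value = field.get(null); // read the static field
            // ... use value for the initialization step ...
        } catch (Exception ex) {
            // field is private or absent in this Xalan build: fail
            // silently and keep the other initializations running
        }
    }
}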

2. Reduce the memory footprint of the library: It seems that big XML files are not well handled by the current version. During my refactoring of c14n I saw that the c14n bytes are copied back and forth (for example, when digesting, a byte array is copied three times before the final digest). In big documents this can be expensive in terms of memory. A refactoring that avoids these copies could be done easily.


Yup. In fact, it should be possible to do some pipelining. The C++ library uses "transformers" which are connected to each other in series. Each reads from the previous as it goes, so canonicalisation is done in steps (generally 2K) rather than all at once. The problem is you still need the DOM document in memory - but beggars can't be choosers :>.
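In Java you could imagine the chain as pull-based streams, something like this (DigestInputStream is standard JDK; exposing the canonicaliser as a stream stage is the hypothetical part):

import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Each stage pulls from the previous one in 2K chunks, so only one small
// buffer is live at a time rather than the whole canonicalised byte array.
public final class PipelinedDigest {

    static byte[] digest(InputStream c14nOutput)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        InputStream chain = new DigestInputStream(c14nOutput, md);

        byte[] chunk = new byte[2048];
        // draining the last stage drives the whole chain; the digest is
        // updated as a side effect while the bytes flow through
        while (chain.read(chunk) != -1) {
            // nothing to do here
        }
        return md.digest();
    }
}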

That's a great idea. Perhaps we can create a DigestWriter or SignWriter that has an internal buffer: when the buffer is full, it invokes the update method of the algorithm with the contents so far, then resets the buffer and keeps working. That way the memory consumed is always constant and big documents can be signed. I think it is a refactoring that doesn't impact many things (the whole c14n implementation only needs about 10 lines changed).
Perhaps this could be my first step.
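A minimal sketch of that DigestWriter, assuming the canonicalizer can be pointed at an OutputStream (the class name and buffer size are just illustrative):

import java.io.OutputStream;
import java.security.MessageDigest;

// Buffers written bytes and feeds them to MessageDigest.update() whenever
// the buffer fills, so memory use stays constant regardless of input size.
public class DigestWriter extends OutputStream {

    private final MessageDigest digest;
    private final byte[] buffer;
    private int count;

    public DigestWriter(MessageDigest digest, int bufferSize) {
        this.digest = digest;
        this.buffer = new byte[bufferSize];
    }

    @Override
    public void write(int b) {
        if (count == buffer.length) {
            flushToDigest();
        }
        buffer[count++] = (byte) b;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        // large writes go straight to the digest once the buffer is drained
        flushToDigest();
        digest.update(b, off, len);
    }

    // push the buffered bytes into the digest and reset the buffer
    private void flushToDigest() {
        if (count > 0) {
            digest.update(buffer, 0, count);
            count = 0;
        }
    }

    // finish: drain whatever is left and return the digest value
    public byte[] digestValue() {
        flushToDigest();
        return digest.digest();
    }
}

With the c14n code writing into this instead of building a byte array, the signing memory is bounded by the buffer size.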



Regards,

Raul Benito
