Hi Francois,

Thanks for these interesting observations! Detailed responses inline.
  - Dennis

Dennis M. Sosnoski
SOA and Web Services in Java
Training and Consulting
http://www.sosnoski.com - http://www.sosnoski.co.nz
Seattle, WA +1-425-939-0576 - Wellington, NZ +64-4-298-6117

Francois Valdy wrote:
> Hi,
>
> Performance of MarshallingContext and its unmarshalling counterpart is
> really poor compared to the effort put into the rest of JiBX. It's not
> noticeable for large objects, but for small ones between 50% and 75%
> of the marshal/unmarshal time is taken by those classes.

I think it's better to optimize for large objects/documents rather than small ones, but I agree that ideally we'd have great performance for both.

> Marshalling:
> The loadClass result should be cached in an array inside the factory
> (a shared cache between MarshallingContext instances).

This would involve using synchronization, which can be a real issue in multithreaded systems (especially multiprocessor ones). Still, there'd also have to be synchronization at some level within the classloader when it checks for already-loaded classes, so it'd be interesting to measure the actual performance tradeoffs here.

Alternatively, the factory could create the array of classes during its own initialization. That way no synchronization would be needed for accessing the array - but it does mean there'd be considerable startup overhead to load *all* the marshaller/unmarshaller classes defined in the binding, even though most of them may never be used for the documents actually being processed. I suppose a binding definition flag saying you want to preload the classes would let users control the behavior.

The worst case for the current code is when the first attempt to load a class fails. This throws an exception, due to an idiotic holdover from when there was only one classloader, and the exception is a severe performance hit. Do you know if this is what's happening in your tests?
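To sketch the kind of shared cache being discussed - lazy loading with safe publication between threads, and no lock on the common cache-hit path. All names here are illustrative, not actual JiBX code:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: marshaller class names indexed by position in the
// binding, with loaded Class objects cached lazily in the factory.
// AtomicReferenceArray publishes entries safely across threads without
// locking on reads.
public class ClassCacheSketch {
    private final String[] marshallerNames;
    private final AtomicReferenceArray<Class<?>> cache;

    public ClassCacheSketch(String[] names) {
        marshallerNames = names;
        cache = new AtomicReferenceArray<Class<?>>(names.length);
    }

    public Class<?> marshallerClass(int index) {
        Class<?> clas = cache.get(index);           // no lock on a cache hit
        if (clas == null) {
            try {
                clas = Class.forName(marshallerNames[index]);
            } catch (ClassNotFoundException e) {
                throw new RuntimeException("Unable to load class", e);
            }
            // benign race: two threads may both load, but it's the same Class
            cache.compareAndSet(index, null, clas);
        }
        return clas;
    }
}
```

The compareAndSet avoids any lock entirely, at the cost of possibly loading the same class twice under contention - which is harmless, since the classloader returns the same Class instance.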
There is a potential way around the exception, I think: use getResource() first to check for the class file, and only load the class when the class file is found. I'm not sure this would really provide any benefit, though.

> I've updated the binding generation to add this array of nulls to the
> factory, passed to the MarshallingContext constructor (null is
> supported for backward compatibility). The Class object is cached in
> the factory only if it was loaded from the factory classloader. The
> result is a 50% performance increase for small objects.

Are you using synchronization for access? And if so, have you tried it with multithreading/multiprocessors?

> Unmarshalling (the improvement from marshalling above applies too):
> for small objects unmarshalled from big factories, the time taken to
> build the cache map is really BIG (and useless).

If you know the type of object to be unmarshalled, you should be able to avoid this overhead completely by instead casting an instance of the object to IUnmarshallable and calling its unmarshal() method. But this approach is certainly not encouraged by the code samples and such, and probably wouldn't occur to users.

There's one obvious optimization that could help with very large mappings: passing the size to the HashMap constructor. I've made that change in my code, but you may want to try it out to see if it makes any significant difference for your case.

> If you assume that all node names are intern'ed Strings (all literal
> Strings are, and programmatically added ones can be intern'ed), then
> it's much more efficient to search the array directly with an ==
> comparison (rather than .equals()) after calling intern() on the
> given string (which is probably already intern'ed by xpp). In fact,
> even for large objects in large factories this is still more
> efficient (we're talking about a hashmap of arraylists/integers built
> every time I unmarshal a single object here).
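For concreteness, the interned-name lookup Francois describes would look something like this sketch (names illustrative, not actual JiBX code) - the table is built from interned strings, so after interning the incoming name an identity comparison suffices:

```java
// Hypothetical name table using interned strings, so lookup can use ==
// (reference equality) instead of paying the per-character cost of equals().
public class NameTableSketch {
    private final String[] names;   // every entry interned

    public NameTableSketch(String[] rawNames) {
        names = new String[rawNames.length];
        for (int i = 0; i < rawNames.length; i++) {
            names[i] = rawNames[i].intern();
        }
    }

    public int indexOf(String name) {
        String interned = name.intern();  // often a no-op for parser output
        for (int i = 0; i < names.length; i++) {
            if (names[i] == interned) {   // identity check only
                return i;
            }
        }
        return -1;
    }
}
```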
> Let me know your thoughts about those, and I'll gladly share them with
> the community.

I don't think intern'ing can be assumed, since not all parsers (let alone other possible sources of document data) do it. It would probably make sense to just always create the map during initialization of the binding factory, though, and pass it to the unmarshalling context - that way it's only a one-time overhead. However, JiBX currently allows nested <mapping> definitions, where one mapping is only defined in the context of another, and that means the map is dependent on unmarshalling state. For 2.0 I plan to eliminate the nested mapping feature completely, since it's never really been of much use.

> My next move will be to cache the (un)marshallers themselves (maybe
> not resetting them on context reset). I know that's dangerous for
> external (custom) (un)marshallers (even if I don't use any that can't
> be re-used), so I'll try to find a solution for that too (maybe a
> two-layer cache: one for the re-usable JiBX-generated ones, and one
> (lazily built and reset) for the others).

Sounds interesting - please let us know what you find.

> That's my 2 cents on JiBX, which is already a great framework.
>
> Thanks for reading.
>
> _______________________________________________
> jibx-users mailing list
> jibx-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/jibx-users
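For reference, the build-the-map-once-at-factory-initialization approach discussed above might look something like this sketch (names hypothetical, not actual JiBX APIs) - it also shows presizing the HashMap so large bindings avoid rehashing:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the element-name-to-index map is built once when the
// binding factory initializes, then shared read-only by every unmarshalling
// context, making map construction a one-time cost.
public class FactoryMapSketch {
    private final Map<String, Integer> indexByName;

    public FactoryMapSketch(String[] elementNames) {
        // capacity chosen so the default 0.75 load factor avoids a resize
        indexByName = new HashMap<String, Integer>(elementNames.length * 4 / 3 + 1);
        for (int i = 0; i < elementNames.length; i++) {
            indexByName.put(elementNames[i], Integer.valueOf(i));
        }
    }

    // shared by all contexts; reads on an unmodified HashMap are thread-safe
    public int indexOf(String name) {
        Integer index = indexByName.get(name);
        return index == null ? -1 : index.intValue();
    }
}
```

Since the map is never modified after construction, no synchronization is needed on lookups - which is what makes the one-time-overhead tradeoff attractive.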