Hi Francois,

Thanks for these interesting observations! Detailed responses inline.
  - Dennis

Dennis M. Sosnoski
SOA and Web Services in Java
Training and Consulting
http://www.sosnoski.com - http://www.sosnoski.co.nz
Seattle, WA +1-425-939-0576 - Wellington, NZ +64-4-298-6117

Francois Valdy wrote:
> Hi,
>
> Performance of MarshallingContext and its unmarshalling counterpart is
> really poor compared to the effort put into the rest of JiBX. It's not
> noticeable for large objects, but for small ones between 50% and 75%
> of the marshal/unmarshal time is taken by those classes.

I think it's better to optimize for large objects/documents rather than small ones, but I agree that ideally we'd have great performance for both.

> Marshalling:
> The loadClass result should be cached in an array inside the factory
> (a shared cache between MarshallingContext instances).

This would involve using synchronization, which can be a real issue in multithreaded systems (especially multiprocessor ones). Still, there'd also have to be synchronization at some level within the classloader when it checks for already-loaded classes, so it'd be interesting to measure the actual performance tradeoffs here.

Alternatively, the factory could create the array of classes during its own initialization. That way no synchronization would be needed for accessing the array - but it does mean there'd be considerable startup overhead to load *all* the marshaller/unmarshaller classes defined in the binding, even though most of them may never be used for the documents actually being processed. I suppose a binding definition flag saying you want to preload the classes would let users control the behavior.

The worst case for the current code is when the first attempt to load a class fails. This throws an exception, due to an idiotic holdover from when there was only one classloader, and the exception is a severe performance hit. Do you know if this is what's happening in your tests?
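To sketch the kind of shared cache being discussed - lazy loading with safe publication between threads, and no lock on the common cache-hit path. All names here are illustrative, not actual JiBX code:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: marshaller class names indexed by position in the
// binding, with loaded Class objects cached lazily in the factory.
// AtomicReferenceArray publishes entries safely across threads without
// locking on reads.
public class ClassCacheSketch {
    private final String[] marshallerNames;
    private final AtomicReferenceArray<Class<?>> cache;

    public ClassCacheSketch(String[] names) {
        marshallerNames = names;
        cache = new AtomicReferenceArray<Class<?>>(names.length);
    }

    public Class<?> marshallerClass(int index) {
        Class<?> clas = cache.get(index);           // no lock on a cache hit
        if (clas == null) {
            try {
                clas = Class.forName(marshallerNames[index]);
            } catch (ClassNotFoundException e) {
                throw new RuntimeException("Unable to load class", e);
            }
            // benign race: two threads may both load, but it's the same Class
            cache.compareAndSet(index, null, clas);
        }
        return clas;
    }
}
```

The compareAndSet avoids any lock entirely, at the cost of possibly loading the same class twice under contention - which is harmless, since the classloader returns the same Class instance.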
There is a potential way around the exception, I think: use getResource() first to check for the class file, and only load the class when the class file is found. I'm not sure this would really provide any benefit, though.

> I've updated the binding generation to add this array of nulls to the
> factory, passed to the MarshallingContext constructor (null is
> supported for backward compatibility). The Class object is cached in
> the factory only if it was loaded from the factory classloader. The
> result is a 50% performance increase for small objects.

Are you using synchronization for access? And if so, have you tried it with multithreading/multiprocessors?

> Unmarshalling (the improvement from marshalling above applies too):
> for small objects unmarshalled from big factories, the time taken to
> build the cache map is really BIG (and useless).

If you know the type of object to be unmarshalled, you should be able to avoid this overhead completely by instead casting an instance of the object to IUnmarshallable and calling its unmarshal() method. But this approach is certainly not encouraged by the code samples and such, and probably wouldn't occur to users.

There's one obvious optimization that could help with very large mappings: passing the size to the HashMap constructor. I've made that change in my code, but you may want to try it out to see if it makes any significant difference for your case.

> If you assume that all node names are intern'ed Strings (all literal
> Strings are, and programmatically added ones can be intern'ed), then
> it's much more efficient to search the array directly with an ==
> comparison (rather than .equals()) after calling intern() on the
> given string (which is probably already intern'ed by xpp). In fact,
> even for large objects in large factories this is still more
> efficient (we're talking about a hashmap of arraylists/integers built
> every time I unmarshal a single object here).
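For concreteness, the interned-name lookup Francois describes would look something like this sketch (names illustrative, not actual JiBX code) - the table is built from interned strings, so after interning the incoming name an identity comparison suffices:

```java
// Hypothetical name table using interned strings, so lookup can use ==
// (reference equality) instead of paying the per-character cost of equals().
public class NameTableSketch {
    private final String[] names;   // every entry interned

    public NameTableSketch(String[] rawNames) {
        names = new String[rawNames.length];
        for (int i = 0; i < rawNames.length; i++) {
            names[i] = rawNames[i].intern();
        }
    }

    public int indexOf(String name) {
        String interned = name.intern();  // often a no-op for parser output
        for (int i = 0; i < names.length; i++) {
            if (names[i] == interned) {   // identity check only
                return i;
            }
        }
        return -1;
    }
}
```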
> Let me know your thoughts about those, and I'll gladly share them with
> the community.

I don't think intern'ing can be assumed, since not all parsers (let alone other possible sources of document data) do it. It would probably make sense to just always create the map during initialization of the binding factory, though, and pass it to the unmarshalling context - that way it's only a one-time overhead. However, JiBX currently allows nested <mapping> definitions, where one mapping is only defined in the context of another, and that means the map is dependent on unmarshalling state. For 2.0 I plan to eliminate the nested mapping feature completely, since it's never really been of much use.

> My next move will be to cache the (un)marshallers themselves (maybe
> not resetting them on context reset). I know that's dangerous for
> external (custom) (un)marshallers (even if I don't use any that can't
> be re-used), so I'll try to find a solution for that too (maybe a
> two-layer cache: one for the re-usable JiBX-generated ones, and one
> (lazily built and reset) for the others).

Sounds interesting - please let us know what you find.

> That's my 2 cents on JiBX, which is already a great framework.
>
> Thanks for reading.
>
> _______________________________________________
> jibx-users mailing list
> jibx-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/jibx-users
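For reference, the build-the-map-once-at-factory-initialization approach discussed above might look something like this sketch (names hypothetical, not actual JiBX APIs) - it also shows presizing the HashMap so large bindings avoid rehashing:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the element-name-to-index map is built once when the
// binding factory initializes, then shared read-only by every unmarshalling
// context, making map construction a one-time cost.
public class FactoryMapSketch {
    private final Map<String, Integer> indexByName;

    public FactoryMapSketch(String[] elementNames) {
        // capacity chosen so the default 0.75 load factor avoids a resize
        indexByName = new HashMap<String, Integer>(elementNames.length * 4 / 3 + 1);
        for (int i = 0; i < elementNames.length; i++) {
            indexByName.put(elementNames[i], Integer.valueOf(i));
        }
    }

    // shared by all contexts; reads on an unmodified HashMap are thread-safe
    public int indexOf(String name) {
        Integer index = indexByName.get(name);
        return index == null ? -1 : index.intValue();
    }
}
```

Since the map is never modified after construction, no synchronization is needed on lookups - which is what makes the one-time-overhead tradeoff attractive.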