Hi Dennis,

Same here, feedback inlined ;-)
Thanks for your answer.

Francois.

On 8/29/07, Dennis Sosnoski <[EMAIL PROTECTED]> wrote:
> Hi Francois,
>
> Thanks for these interesting observations! Detailed responses inline.
>
>  - Dennis
>
> Dennis M. Sosnoski
> SOA and Web Services in Java
> Training and Consulting
> http://www.sosnoski.com - http://www.sosnoski.co.nz
> Seattle, WA +1-425-939-0576 - Wellington, NZ +64-4-298-6117
>
>
>
> Francois Valdy wrote:
> > Hi,
> >
> > The performance of MarshallingContext and its unmarshalling counterpart is
> > really poor compared to the effort put into the rest of JiBX.
> > It's not noticeable for large objects, but for small ones, between 50%
> > and 75% of the marshalling/unmarshalling time is spent in those classes.
> >
>
> I think it's better to optimize for large objects/documents rather than
> small ones, but agree that ideally we'll have great performance for both.
>

I do want both: I have small (20-character) and large (1 MB) XML documents
in the same application, so rest assured I checked the tradeoffs of my
suggestions (except for namespaces, which I don't use).

> > Marshalling:
> > loadClass result should be cached in an array inside the factory
> > (shared cache between MarshallingContext).
> >
>
> This would involve using synchronization, which can be a real issue in
> multithreading systems (especially multiprocessor ones). Still, there'd
> also have to be synchronization at some level within the classloader
> checking loaded classes. It'd be interesting to check the actual
> performance tradeoffs on this.
>

This Class object array doesn't need any synchronization: if the Class is
in the slot, use it; if it isn't, look it up. Two threads simultaneously
setting the same slot in the cache is not an issue either, so there's no
added synchronization here (and you're right that the classloader already
synchronizes internally; in fact this array cache is what lets me avoid
hitting the classloader synchronization on every call).
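
To make it concrete, here's roughly what the lookup looks like in my patch
(the field and method names are only illustrative, not the exact JiBX code):

// Sketch of the patched lookup in MarshallingContext; classCache is the
// Class[] shared with the factory, all slots initially null.
private Class getMarshallerClass(int index) throws ClassNotFoundException {
    Class mclas = classCache[index];
    if (mclas == null) {
        // empty slot: fall back to the normal loadClass() path
        mclas = factoryLoader.loadClass(marshallerNames[index]);
        if (mclas.getClassLoader() == factoryLoader) {
            // only cache classes defined by the factory classloader; the benign
            // race just means two threads may store the same Class object
            classCache[index] = mclas;
        }
    }
    return mclas;
}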

> Alternatively, it'd be possible for the factory to create the array of
> classes in its initialization. That way synchronization would not be
> needed for accessing the array - but it does mean there'd be a
> considerable startup overhead to load *all* the marshaller/unmarshaller
> classes defined in the binding, even though most of them may never be
> used for the actual documents in use. I suppose it would be possible to
> use a binding definition flag to say you want to preload the classes, so
> that users could control the behavior.
>
> The worst case for the current code is going to be when the first
> attempt to load the class fails. This throws an exception, due to the
> idiotic hold-over from when there was only one classloader, and the
> exception will be a severe performance hit. Do you know if this is
> what's happening in your tests?
>

The factory does create the array, but filled only with nulls, and that's
sufficient. Moreover, you don't know in advance whether a class can be
loaded with the factory classloader (or with either of the other two).
The cache is only used in that first (factory classloader) case, and it
doesn't prevent backward-compatible usages; they just don't benefit from it.
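
For illustration, the factory side of the change boils down to this (a
hypothetical, simplified class, not the actual generated factory code):

// The cache is just a null-filled array created once and shared by every
// MarshallingContext the factory hands out.
public class ExampleBindingFactory {
    private final String[] marshallerNames;   // marshaller class names from the binding
    private final Class[] classCache;         // one slot per mapped class, all null

    public ExampleBindingFactory(String[] marshallerNames) {
        this.marshallerNames = marshallerNames;
        // allocating nulls is the only startup cost; nothing is preloaded
        this.classCache = new Class[marshallerNames.length];
    }

    public Class[] getClassCache() {
        // handed to each context; a null array keeps the old (uncached) behavior
        return classCache;
    }
}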

> There is a potential way around the exception, I think, which is to use
> getResource() first to try to find the class file, and only load the
> class when the class file is found. I'm not sure that this would really
> provide any benefit, though.
>
>
> > I've updated the binding generation to add this array of nulls to the
> > factory, passed to the MarshallingContext constructor (null is supported
> > for backward compatibility).
> > The Class object is cached in the factory only if it was loaded from the
> > factory classloader.
> > The result is a 50% performance increase for small objects.
> >
>
> Are you using synchronization for access? And if so, have you tried it
> with multithreading/multiprocessors?
>

My application is multithreaded, and I do run it in multi-CPU environments.
As I said above, no synchronization is required for access to the array. In
the worst case I'll end up with two threads finding null, loading the same
class, and setting the same index in the array; no big deal no matter
which one gets there first.

> > Unmarshalling (improvement from marshalling above applies too):
> > for small objects unmarshalled from big factories, the time taken to
> > build the cache map is really BIG (and useless).
> >
>
> If you know the type of object to be unmarshalled you should be able to
> avoid this overhead completely by instead using an instance of the
> object cast to IUnmarshallable, calling the unmarshal() method. But this
> approach is certainly not encouraged by the code samples and such, and
> probably wouldn't occur to users.
>
> There's one obvious optimization that could help with very large
> mappings, which is passing the size to the HashMap constructor. I've
> made that change in my code, but you may want to try it out to see if it
> makes any significant difference for your case.
>

In most cases I don't know the type of my objects beforehand, but I'll
give the "cast and call unmarshal()" approach a go; I just never thought
of it. (I'm not sure how much I'll gain from it, though, because the map
will still have to be built for the nodes nested inside the first one, or
am I wrong?)
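
For the record, here's the fragment I understand the suggestion to be
(Order and order.xml are made up, and I'm assuming the binding compiler
makes the mapped class implement IUnmarshallable with an
unmarshal(IUnmarshallingContext) method):

// Unmarshal into a pre-created instance of a known type, skipping the
// name-to-mapping lookup map entirely.
Order readOrder() throws Exception {
    IBindingFactory factory = BindingDirectory.getFactory(Order.class);
    UnmarshallingContext ctx =
            (UnmarshallingContext) factory.createUnmarshallingContext();
    ctx.setDocument(new FileInputStream("order.xml"), null);

    Order order = new Order();                 // known target type, created up front
    ((IUnmarshallable) order).unmarshal(ctx);  // no lookup map needed for the root
    return order;
}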

>
> > If you assume that all node names are intern'ed Strings (all literal
> > Strings are, and programmatically added ones can be intern'ed), then
> > it's much more efficient to search the array directly with an ==
> > comparison (rather than .equals()) after calling intern() on the given
> > string (which is probably already intern'ed by XPP).
> > In fact even for large objects in large factories it's still more
> > efficient (we're talking about a HashMap of ArrayLists/Integers being
> > built every time I unmarshal a single object here).
> >
> > Let me know your thoughts about those, and I'll gladly share them with
> > the community.
> >
>
> I don't think intern'ing can be assumed, since not all parsers (let
> alone other possible sources of document data) do this. It would
> probably make sense to just always create the map during the
> initialization of the binding factory, though, and pass it to the
> unmarshalling context. That way it's just a one-time overhead. However,
> JiBX currently allows nested <mapping> definitions, where one is only
> defined in the context of another - and that means the map is
> unmarshalling state dependent. For 2.0 I plan to eliminate this nested
> mapping feature completely, since it's never really been of much use.
>

Yes, the map can be modified in the context itself without applying the
modification to the factory, so creating it in the factory would require a
clone, which is almost as expensive as building a new one.
Note that I do NOT assume the node name returned by the parser is
intern'ed (I call intern() on it to make sure it is), but I do assume the
names inside the factory array are all interned (which is true for
generated bindings, and an intern() call can be added to the context
modification functions).
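
Concretely, the lookup I have in mind looks like this (a sketch with
illustrative names; elementNames is the factory's array of mapped element
names, which I assume are all interned):

// Identity-based lookup: valid only because both sides are interned.
private int findMapping(String parserName) {
    String name = parserName.intern();      // nearly free if the parser already interned it
    for (int i = 0; i < elementNames.length; i++) {
        if (elementNames[i] == name) {      // == instead of .equals()
            return i;                       // index of the matching mapping
        }
    }
    return -1;                              // no mapping for this element name
}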

>
> > My next move will be to cache the (un)marshallers themselves (maybe by
> > not resetting them in the context reset).
> > I know that's dangerous for external (custom) (un)marshallers (even if
> > I don't use any that can't be reused), so I'll try to find a solution
> > for that too (maybe a two-layer cache: one for the JiBX-generated ones,
> > which are reusable, and one, lazily built and reset, for the others).
> >
>
> Sounds interesting, and please let us know what you find.
>
> > That's my 2 cents on JiBX, which is already a great framework.
> >
> > Thanks for reading.
> >
