Re: CMIS Implementation Experiences

Florent Guillaume Wed, 16 Dec 2009 09:35:15 -0800

Hi Florian, David,

On Tue, Dec 15, 2009 at 5:24 PM, David Nuescheler
<[email protected]> wrote:
> On Tue, Dec 15, 2009 at 4:38 PM, Florian Müller <[email protected]> wrote:
>> In order to explain the rationale behind the OpenCMIS design I would like to
>> talk about some of the experiences that we made with CMIS client and server
>> implementations.
>> We also started with Abdera on the server side. It turned out to be more
>> pain than joy.
>
> i think in the absence of Dominique (who is on vacation) it is fair to
> say that in our initial implementations we also found that there were
> some extension points that we had to use that made us feel like abdera
> was not exactly designed to inject cmis in a surgical operation.
> that may have to do with cmis and its use of atompub, but i think i
> agree with the general sentiment...


Agreed, we totally need extension points for each method in the SPI.

>> With a pure JAXB design we ran into compatibility issues. A good tradeoff
>> between efficiency, correctness and maintainability seems to be StAX with
>> JAXB. OpenCMIS handles all AtomPub related tags with StAX and all CMIS
>> related data with JAXB. The JAXB objects are not exposed to the
>> application. They are just interim objects.
>> The same StAX/JAXB design should work on the server side as well. The
>> effort to implement AtomPub is manageable. I've done this in my CMIS
>> FileShare project.
>
> sounds like a reasonable proposal to me. especially given your experience,
> advice would be very welcome.

Yes, that seems like a good way to do it. There's the overhead of
instantiating JAXB objects just to serialize them later to XML, when
you could just generate the XML using StAX, but that's an acceptable
tradeoff. The codebase we have today uses pure StAX for historical
reasons: we had a high-performance need for a customer, and reducing
the number of generated objects was paramount. But I'm pretty open to
refactoring this.
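
To illustrate the pure StAX style mentioned above, here is a minimal
sketch that streams an Atom entry straight to XML without building any
intermediate objects. The class name, the CMIS namespace URI and the
element layout inside cmis:object are placeholders chosen for
illustration, not what Chemistry or OpenCMIS actually emit.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes one Atom entry with a CMIS payload using StAX only.
final class AtomEntrySketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    // Placeholder namespace (assumption); the real one depends on the CMIS spec version.
    static final String CMIS_NS = "urn:example:cmis";

    static String writeEntry(String id, String title, String objectTypeId) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("atom", "entry", ATOM_NS);
            w.writeNamespace("atom", ATOM_NS);
            w.writeNamespace("cmis", CMIS_NS);

            // AtomPub-level elements.
            writeText(w, "atom", "id", ATOM_NS, id);
            writeText(w, "atom", "title", ATOM_NS, title);

            // CMIS-level data, simplified to a single illustrative element.
            w.writeStartElement("cmis", "object", CMIS_NS);
            writeText(w, "cmis", "objectTypeId", CMIS_NS, objectTypeId);
            w.writeEndElement(); // cmis:object

            w.writeEndElement(); // atom:entry
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not serialize entry", e);
        }
        return out.toString();
    }

    // Writes <prefix:name>text</prefix:name>; writeCharacters escapes markup characters.
    private static void writeText(XMLStreamWriter w, String prefix, String name,
                                  String ns, String text) throws XMLStreamException {
        w.writeStartElement(prefix, name, ns);
        w.writeCharacters(text);
        w.writeEndElement();
    }
}
```

In a StAX/JAXB hybrid, the cmis:object part would instead be handed to a
JAXB marshaller, at the cost of the interim objects discussed above.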

>> Another detail we learned is that implementing both bindings in parallel
>> saves you a lot of refactoring later. Both CMIS bindings are really
>> different. If you align your classes and flows to just one binding you
>> might have to refactor a lot later to make the other binding work smoothly.
>
> agreed. i think the chemistry focus on the atompub parts of the spec
> was just a way to get started, rather than a long-term plan.

Yes, I want to set up the SOAP client and server bindings soon,
hopefully before the end of the year. I have experience with a basic
SOAP server for Nuxeo bindings, and the client part shouldn't be hard
to start.

>> We introduced type (and repository info) caching based on our
>> experiences with applications using a CMIS library. Applications need
>> type information all over the place and it is expensive to fetch them over
>> and over again.
>
> absolutely. we ran into the same situation with jcr remoting through
> our spi layer in jackrabbit. luckily, jcr already anticipates such caching
> layer and exposes explicit "refresh()" methods.

We already have caching of the types in APPRepository (see
loadTypes()), and the repository info is read only once as well.

>> From a library perspective one can argue that caching should be done a
>> level above the library.
>> From a practical standpoint it would be nice if it is done once and right.
>
> i would even argue that depending on the application you may cache in
> the application in addition to the caching in the pure transport layer.
> i think there is nothing wrong with a cache as long as the application has
> a means to refresh/invalidate the cache... ideally this would be possible
> for parts of the cache, per folder/document or similar...

Yep. Eventually I want to put intelligent caching in the layer that
implements the high level API, but I've held off on this for now
("optimize later").

>> So we decided to put it into OpenCMIS. If an application doesn't want it,
>> it can switch it off. The caching works implicitly.
>
> i would say a refresh much like in a browser could give the application the
> option to flush parts of the cache or even expose that to the user.
> in many cases the user "knows" that something changed and having
> something like a "refresh"-button in the browser can help.
> in my experience it really saves you a lot of first support calls, since
> if the user does not see what he wants to see, he just hits the refresh
> button, but of course that's a concern of the application and not
> of the cmis client.

I'm not too fond of explicit refresh actions, unless it's unavoidable...

>> Whenever a type definition runs through the library the data is
>> cached or refreshed. CMIS provides no mechanism to detect
>> type changes.

That's a good way to do it. For now Chemistry (for AtomPub) reads all
the types on the first connection and caches them, but this could be
done lazily as you describe.
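
A minimal sketch of the lazy variant, assuming a loader function that
fetches one type definition from the repository. The class and method
names are hypothetical and do not reflect the Chemistry or OpenCMIS
APIs; the put() method stands for "refresh whenever a definition runs
through the library".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical lazy type cache; T stands for whatever type-definition class is used.
final class TypeCacheSketch<T> {
    private final Map<String, T> types = new ConcurrentHashMap<>();
    private final Function<String, T> loader; // assumed to hit the repository

    TypeCacheSketch(Function<String, T> loader) {
        this.loader = loader;
    }

    // Fetches the definition on first use only, then serves it from memory.
    T get(String typeId) {
        return types.computeIfAbsent(typeId, loader);
    }

    // Stores a definition that was seen in some other response.
    void put(String typeId, T definition) {
        types.put(typeId, definition);
    }

    // Drops one entry so that the next get() reloads it.
    void invalidate(String typeId) {
        types.remove(typeId);
    }

    // Drops everything, for example on an explicit refresh.
    void clear() {
        types.clear();
    }
}
```

The invalidate() and clear() methods are there only to show where an
application-level refresh could hook in.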

> i think type changes happen infrequently enough, that it is not
> an issue in the majority of the cases, especially if we
> expose an explicit "refresh" of the cache delegated to
> the app or the user.
>
>> So there is a slight chance that the type cache holds outdated
>> data. In an enterprise scenario (and that's what OpenCMIS is
>> aiming at) type changes shouldn't happen often. They are
>> usually interconnected with an update or re-deployment of the
>> application. A paranoid application developer can switch off the
>> cache (and accept the performance penalty) or clear the cache
>> regularly (every hour or every five minutes or every 30 seconds...)
>> or create a new session once in a while.
> ...or let the user of the app decide. especially
> webapp users are used to refresh buttons ;)
>
>> Since sessions are bound to logins there is a regular exchange
>> of sessions and therewith caches, anyway.
> sounds good.

Yep.
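
For the "clear the cache regularly" option, one possible sketch is a
cache whose entries expire after a fixed time-to-live. Everything here
is an assumption made for illustration: the class name, the injected
clock (which only exists to keep the example testable) and the
convention that a TTL of 0 effectively switches caching off.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache whose entries are reloaded once they are older than ttlMillis.
final class ExpiringCacheSketch<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringCacheSketch(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value, reloading it when missing or expired.
    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAt >= ttlMillis) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

With a TTL of one hour this gives the "every hour" behaviour; with a TTL
of 0 every call goes back to the repository.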

>> Another aspect that we think is important is extensions. CMIS
>> defines a lot of extension points and repositories will make use of
>> them sooner or later. Applications should be able to access and set
>> extension data. Sure, it is against the idea of a standard but it will
>> happen and the library should be prepared for that. The difficult
>> part here is to make the binding invisible to the application since
>> some extension points are very binding specific. Using JAXB in both
>> bindings covers a lot but not everything. OpenCMIS has the
>> infrastructure in place but is not perfect in this regard, yet.
>
> i think extension points are very desirable particularly in something
> that should be a framework for various implementations / users.
> having said that, superfluous extension points always become
> a maintenance and backwards compatibility issue in the future, when
> we want to refactor things again, and are not sure if we break someone's
> extensions... so i think we should choose extension points based
> on real-life scenarios, rather than on wild ideas ;)
> i think we are a group that is broad enough here that we have enough
> real-life use cases to come up with a good set of extension points
> to start with.

I agree with you that extensions are absolutely needed; they're in the
spec for a reason. However, at the same time, they shouldn't make the
APIs too burdensome to use...

Florent

-- 
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87
