Re: Proposal to clarify CmisObject caching behaviour

Florian Müller Wed, 13 Oct 2010 02:42:56 -0700

Hi all,

I think there are a few more things we should take into consideration.


- Properties are just one part of the story. What about ACLs, relationships and 
policies? If we want truly transient and consistent behavior we would have to 
cache them as well and wait until save() is called before we send them to the 
repository. 
  I doubt that we can create a consistent object on the client side that way. 
We don't know the repository rules for ACLs, relationships and policies.

- What happens if save() fails? Or even worse, if one part of save() fails? For 
example, we could update the properties but not the ACL. The state of the 
object would be ambiguous.

- CmisObjects are potentially shared between threads. While one thread changes 
properties (or ACLs or relationships or policies), another thread would see 
inconsistent data.
  We could only avoid that by either creating a cache per thread (which defeats 
the purpose of the cache) or creating a copy of the cached object every time we 
hand one out. In the latter case all threads would have their own copies and 
would not see any updates other threads make. That, in turn, could end up in 
more refreshes and more calls to the repository.
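
To make the copy-per-hand-out option concrete, here is a minimal sketch. 
CachedObject and CopyingCache are made-up stand-ins for illustration, not 
OpenCMIS classes. Every get() returns a private copy, so threads cannot see 
each other's half-finished changes, but they also never see each other's 
finished changes without going back to the repository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a cached CmisObject; not the OpenCMIS type.
final class CachedObject {
    private final String id;
    private final Map<String, Object> properties;

    CachedObject(String id, Map<String, Object> properties) {
        this.id = id;
        // Defensive copy: the caller's map is never shared with this object.
        this.properties = new HashMap<>(properties);
    }

    String getId() { return id; }

    Object getProperty(String name) { return properties.get(name); }

    void setProperty(String name, Object value) { properties.put(name, value); }

    // Returns an independent copy with its own property map.
    CachedObject copy() { return new CachedObject(id, properties); }
}

// Hypothetical cache that hands out a private copy on every lookup.
final class CopyingCache {
    private final Map<String, CachedObject> entries = new ConcurrentHashMap<>();

    // Stores a copy, so later changes to the caller's instance stay local.
    void put(CachedObject object) { entries.put(object.getId(), object.copy()); }

    // Returns a fresh copy, or null if the id is not cached.
    CachedObject get(String id) {
        CachedObject cached = entries.get(id);
        return cached == null ? null : cached.copy();
    }
}
```

A change made on one copy stays invisible to every other copy and to the 
cache itself, which is exactly why this approach tends to cause more 
refreshes.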


My conclusion is that we can't build truly transient objects and we really 
should avoid half-transient objects by, for example, just looking at the 
properties.


To rephrase David's proposal a bit:

- All write operations provided by CmisObject should automatically do an object 
refresh after the update. That guarantees that the object is always consistent. 
The cost for this consistency is an additional call to the repository.

- In some cases you don't need or want this additional cost. Let's say you just 
want to update a bunch of objects but you don't work with them afterwards. 
That's what the operations provided by Session are good for. They just do that 
and nothing else.
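
The difference in cost between the two paths can be sketched as follows. 
This is only an illustration under assumptions: FakeRepository, 
RefreshingObject and SimpleSession are invented names, and the repository is 
an in-memory map that counts round trips instead of a real CMIS binding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory repository that counts round trips.
final class FakeRepository {
    private final Map<String, Map<String, Object>> store = new HashMap<>();
    int calls = 0;

    void update(String id, Map<String, ?> changes) {
        calls++;
        store.computeIfAbsent(id, k -> new HashMap<>()).putAll(changes);
    }

    Map<String, Object> load(String id) {
        calls++;
        return new HashMap<>(store.getOrDefault(id, new HashMap<>()));
    }
}

// Object-level write: update, then refresh, so local state stays consistent.
final class RefreshingObject {
    private final FakeRepository repository;
    private final String id;
    private Map<String, Object> properties;

    RefreshingObject(FakeRepository repository, String id) {
        this.repository = repository;
        this.id = id;
        this.properties = repository.load(id);
    }

    Object getProperty(String name) { return properties.get(name); }

    void updateProperties(Map<String, ?> changes) {
        repository.update(id, changes);   // first call: the write itself
        properties = repository.load(id); // second call: the refresh
    }
}

// Session-level write: sends the update and does nothing else.
final class SimpleSession {
    private final FakeRepository repository;

    SimpleSession(FakeRepository repository) { this.repository = repository; }

    void updateProperties(String id, Map<String, ?> changes) {
        repository.update(id, changes);   // single call, no refresh
    }
}
```

In this sketch the object-level update costs two repository calls and the 
session-level update costs one.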



I understand that the transient behavior is quite useful in some scenarios. But 
it simply doesn't work in multi-threaded environments. What we can do is to 
provide transient wrapper classes. Objects of these classes are not cached and 
owned by _one_ thread. At their core they reference and redirect calls to a 
consistent and shared CmisObject. But they can intercept calls and add 
transient behavior.
getProperty, for example, could look like this:


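@SuppressWarnings("unchecked")
public <T> Property<T> getProperty(String id) {
    // A transient (not yet saved) value set through this wrapper wins.
    // transientProperties is assumed to be a Map<String, Property<?>>.
    if (transientProperties.containsKey(id)) {
        return (Property<T>) transientProperties.get(id);
    }

    // Otherwise delegate to the shared, consistent CmisObject.
    return (Property<T>) sharedObject.getProperty(id);
}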


A thread using such a wrapper object knows that it might be inconsistent and it 
could throw it away if save() fails and start over. 
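
A slightly fuller sketch of such a wrapper, including save() and a way to 
throw the changes away, might look like this. All names here (SharedObject, 
InMemorySharedObject, TransientWrapper, discard) are assumptions made for 
illustration and are not part of the OpenCMIS API. Properties are plain 
Objects to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the shared, consistent CmisObject.
interface SharedObject {
    Object getProperty(String id);

    // May throw a RuntimeException if the repository rejects the update.
    void updateProperties(Map<String, ?> changes);
}

// Trivial in-memory implementation, only for trying the wrapper out.
final class InMemorySharedObject implements SharedObject {
    private final Map<String, Object> properties = new HashMap<>();

    public Object getProperty(String id) { return properties.get(id); }

    public void updateProperties(Map<String, ?> changes) {
        properties.putAll(changes);
    }
}

// Owned by one thread and never cached; deliberately not thread-safe.
final class TransientWrapper {
    private final SharedObject sharedObject;
    private final Map<String, Object> transientProperties = new HashMap<>();

    TransientWrapper(SharedObject sharedObject) {
        this.sharedObject = sharedObject;
    }

    // Changes are only recorded locally until save() is called.
    void setProperty(String id, Object value) {
        transientProperties.put(id, value);
    }

    // Transient values win; everything else comes from the shared object.
    Object getProperty(String id) {
        if (transientProperties.containsKey(id)) {
            return transientProperties.get(id);
        }
        return sharedObject.getProperty(id);
    }

    // Pushes all pending changes in one update.
    void save() {
        sharedObject.updateProperties(transientProperties);
        // Only reached if the update did not throw.
        transientProperties.clear();
    }

    // Drops pending changes, for example after a failed save().
    void discard() {
        transientProperties.clear();
    }
}
```

If save() throws, the pending changes are still in the wrapper, so the owning 
thread can decide to retry, call discard(), or drop the wrapper altogether.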


- Florian



On 13/10/2010 08:44, Klevenz, Stephan wrote:
> Hi David,
> 
> Thanks for starting this discussion. You are right: the current cache 
> behavior of the implementation, and how it relates to the usage of the API, 
> is not easy to understand.
> 
> The basic idea behind the cache behavior can be described in 3 steps:
> 
> 1) query for an object and specify filters and put it in the cache
> 2) read from the cached object and write to the cached object while using 
> convenient micro operations (set and get property).
> 3) write back changes of the cached object to the persistence layer (save)
> 
> For 2) only that information is available that was specified by the filter in 
> 1). Additional information would require another query.
> 
> I would like to keep this behavior because it makes it easier to use the API 
> within another framework where someone has to implement a callback not having 
> the context for a full service call or cannot influence how many times the 
> callback is executed just to return the same information.
> 
> What is confusing is that some method calls at the CmisObject level bypass 
> the cache, updateProperties() for instance.
> 
> What do you think about this rule set:
> 
> 1) All methods at the CmisObject level always operate on cached data. The 
> programming paradigm would be query - modify - save.
> 2) All modification operations on the Session level are not cached and will 
> be directly routed to the backend.
> 
> Here is some pseudo code of a cached scenario:
> 
> Session s = ...;
> Document doc = s.getObject( ... with filter ... );
> doc.setProperty( ... );
> doc.updateProperties( ... );
> if (ok) {
>       doc.save();
> }
> 
> If doc is not saved then the corresponding object in the backend is not 
> modified.
> 
> A non cached pseudo code looks like this:
> 
> Session s = ...;
> s.updateProperties( objectId, .... );
> 
> If we could agree on such a rule, we would have to rework CmisObject to remove 
> methods that cannot be cached and add those methods to the Session class. Some 
> methods could exist twice, one on CmisObject (cached) and one on Session 
> (not cached).
> 
> WDYT?
> 
> Regards,
> Stephan
> 
> 
> 
> 
> -----Original Message-----
> From: David Caruana [mailto:[email protected]]
> Sent: Dienstag, 12. Oktober 2010 18:19
> To: [email protected]
> Subject: Re: Proposal to clarify CmisObject caching behaviour
> 
> On 12 Oct 2010, at 16:22, Florent Guillaume wrote:
> 
>> Hi David,
>>
>> I'm a bit confused by some of the vocabulary I must confess. Could we
>> make sure we don't confuse the term "cache" and the term "transient
>> space", as they have different uses.
> 
> Of course.
> 
>>
>> So do you want to remove completely any transient space and only have
>> methods that write to the remote side on each method call? So, do you
>> want to remove the fact that we can do:
>>   object.setProperty("foo1", "bar1");
>>   object.setProperty("foo2", "bar2");
>>   ...
>>   object.updateProperties();
>> i.e. today there's a mini transient space tied to the properties of
>> the object, flushed on updateProperties. I think this is quite useful
>> as it avoids having the user building a map by hand and passing it to
>> updateProperties, which is ugly for a high-level interface.
> 
> Yes, the proposal is to remove the mini transient space. The reason is to 
> simplify behaviour if we were to introduce a CmisObject implementation that 
> always refreshed its cache on updates. For example, with a transient space, 
> what is the behaviour when setProperty has been called and an "update" method 
> is then called prior to updateProperties. Are the transient changes 
> discarded, flushed, or just left as is?
> 
>>
>> I'm ok with most methods that can do remote writes returning a CmisObject.
>>
>> But I don't like removing convenience methods on CmisObject like
>> getRelationships, getAcl, etc. as they provide convenient access to
>> the users.
> 
> There would still be the methods getRelationships() and getAcl(), but they 
> would only return what has already been cached (as they do today). Currently, 
> there's also getRelationships(boolean, RelationshipDirection, ObjectType, 
> OperationContext) and getAcls(boolean) which go direct to the repository. 
> It's these that I propose are moved to Session.
> 
>>
>> OTOH I'm completely for clarifying the caching aspects, for instance
>> it's not clear today when caches are invalidated or updated, or what
>> happens when you ask for a property that's not in the initial property
>> filter that led to the creation of the object (in a number of my unit
>> tests in Nuxeo I have to refresh() explicitly but this shouldn't be
>> needed when all the writes were mine).
> 
> Indeed, this is the confusion I'm trying to remove with this proposal. 
> Admittedly, there is much to discuss, and the proposal is only a starting 
> point, but I wanted to open up the discussion early to determine if this 
> problem is severe enough to tackle.
> 
> Regards,
> Dave
> 
>>
>> Florent
>>
>>
>>
>> On Tue, Oct 12, 2010 at 12:49 PM, David Caruana
>> <[email protected]>  wrote:
>>> Currently, we only have one implementation of Session - 
>>> PersistentSessionImpl. However, the OpenCMIS client API suggests transient 
>>> behaviour both in its API definition and behaviour, which I believe can be 
>>> confusing for users. In particular, when does OpenCMIS read information 
>>> from the cache vs from the repository, and when does OpenCMIS implicitly 
>>> update an item in its cache.
>>>
>>> I'd like to present the following changes in an attempt to clarify the 
>>> above and trigger discussion on how to improve things in this area...
>>>
>>> 1) All methods on CmisObject operate against the cache. This means that 
>>> reading a value from an item reads the value from the item in the cache, 
>>> and updates to the item change the cache i.e. refresh the item in the 
>>> cache. Methods that read directly from the repository move to Session. So, 
>>> the proposal is:
>>>
>>> Remove the following from CmisObject...
>>> CmisObject.setName(String)
>>> CmisObject.setProperty(String, Object)
>>> CmisObject.updateProperties()
>>>
>>> Change CmisObject.updateProperties...
>>> CmisObject CmisObject.updateProperties(Map<String, ?>)    (Note: return 
>>> CmisObject instead of ObjectId)
>>>
>>> Note: If the repository creates a new version, a new CmisObject is 
>>> returned, otherwise the existing CmisObject is refreshed in the cache and 
>>> returned.
>>>
>>> The following 'update' methods are also modified to refresh the item in the 
>>> cache after update:
>>>
>>> void CmisObject.applyPolicy(ObjectId...)    (Note: accept vararg of policy 
>>> ids)
>>> void CmisObject.removePolicy(ObjectId...)   (Note: accept vararg of policy 
>>> ids)
>>> Acl CmisObject.applyAcl(List<Ace>, List<Ace>, AclPropagation)
>>> Acl CmisObject.addAcl(List<Ace>, AclPropagation)   (Note: also return Acl 
>>> to be consistent with applyAcl)
>>> Acl CmisObject.removeAcl(List<Ace>, AclPropagation)   (Note: also return 
>>> Acl to be consistent with applyAcl)
>>>
>>> Note: applyPolicy and removePolicy are also changed to accept a vararg of 
>>> ObjectIds.
>>>
>>> For use cases where the refresh of an item after update is not necessary, 
>>> the Session interface is used instead. So, the proposal is...
>>>
>>> Move following methods from CmisObject to Session...
>>>
>>> ItemIterable<Relationship>  Session.getRelationships(ObjectId, boolean, 
>>> RelationshipDirection, ObjectType, OperationContext)
>>> Acl Session.getAcl(ObjectId, boolean)
>>>
>>> Add...
>>>
>>> ObjectId Session.updateProperties(ObjectId, Map<String, ?>)
>>> void applyPolicy(ObjectId, ObjectId...)
>>> void removePolicy(ObjectId, ObjectId...)
>>> Acl applyAcl(ObjectId, List<Ace>  addAces, List<Ace>  removeAces, 
>>> AclPropagation aclPropagation)
>>> Acl getAcl(ObjectId, boolean onlyBasicPermissions)
>>>
>>> 2) Following the pattern in 1), setting content on a Document alters 
>>> slightly too.
>>>
>>> Document setContentStream(ContentStream contentStream, boolean overwrite)
>>> Document deleteContentStream()
>>>
>>> Instead of returning ObjectId, the proposal is to return Document. 
>>> Depending on the repository, the document may be the same item as before 
>>> (updated in the cache), or a new Document, to represent a new version.
>>>
>>> Also add to Session...
>>>
>>> ObjectId setContentStream(ObjectId, ContentStream contentStream, boolean 
>>> overwrite)
>>> ObjectId deleteContentStream(ObjectId)
>>>
>>> 3) Remove transient methods from Session.
>>>
>>> Until we've thought through transient session behaviour I propose we remove 
>>> the following methods:
>>>
>>> Session.save()
>>> Session.cancel()
>>>
>>> Regards,
>>> Dave
>>
>>
>>
>> -- 
>> Florent Guillaume, Director of R&D, Nuxeo
>> Open Source, Java EE based, Enterprise Content Management (ECM)
>> http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87
>
