Craig L Russell wrote:
> First, let me say that I admire your passion. I wish that all expert
> group members were thus.
I figure that if it's worth mentioning in the first place, then it's
worth pursuing until it's clear to me that it's a flawed idea, or clear
to others that it's a sound one. It is, admittedly, costing me much
more time and effort to get to either state than I would like. But as
the British say, 'in for a penny, in for a pound'.
> I do have a quibble with your counter example below. Your code ignores
> the return boolean value from this.lines.add(line). What value would
> you return if the collection were not loaded?
OK, interesting point. The JDO impl would at least have to do a single
SELECT to verify if the Collection.contains() the added item. It still
doesn't *have* to fault in the entire collection. If there were going
to be *repeated* inserts to the collection in this manner (say for a
dozen line items being attached to an invoice), then it might be more
efficient to fault in at least the PKs of the collection. This to my
mind is just one more piece of information to be added to fault
groups/fetch plans.
So it seems that whenever Set.add() or Map.put() is invoked (regardless
of how 15.3 reads), the price of an immediate datastore access is
incurred, because the contract of these methods promises to tell if the
collection was substantially modified EACH time.
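To make the cost concrete, here is a minimal sketch of why the boolean
return value forces a datastore round trip on an unloaded collection.
UnloadedSet and its existsInDatastore probe are hypothetical stand-ins
invented for illustration (the probe simulates the single SELECT); they
are not JDO API and not how any particular implementation works.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

public class LazyAddSketch {

    /** Hypothetical stand-in for a persistent set that has not been faulted in. */
    static final class UnloadedSet<E> {
        // Simulates a single-row "does this element already exist?" SELECT.
        private final Predicate<E> existsInDatastore;
        // Elements added in this transaction but not yet flushed.
        private final Set<E> pendingAdds = new HashSet<>();
        // Counts simulated datastore round trips.
        int probes = 0;

        UnloadedSet(Predicate<E> existsInDatastore) {
            this.existsInDatastore = existsInDatastore;
        }

        /** Honours the Set.add() contract: true only if the set actually changed. */
        boolean add(E element) {
            if (pendingAdds.contains(element)) {
                return false; // already added in memory, no probe needed
            }
            probes++; // the unavoidable membership check against the datastore
            if (existsInDatastore.test(element)) {
                return false; // already stored, so the set did not change
            }
            pendingAdds.add(element);
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> stored = Set.of("line-1", "line-2");
        UnloadedSet<String> lines = new UnloadedSet<>(stored::contains);

        System.out.println(lines.add("line-2")); // false: already in the datastore
        System.out.println(lines.add("line-3")); // true: genuinely new
        System.out.println(lines.add("line-3")); // false: answered from pendingAdds
        System.out.println(lines.probes);        // 2
    }
}
```

Each add() of an element not seen in this transaction costs one probe,
which is the price being discussed; faulting in the PKs once would
replace N probes with one bulk read.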
Thus I concede there is some inherent performance advantage to be gained
by avoiding Collection.add() in user code, when the collection is in
fact transparently persisted. (A point I hadn't appreciated until now
.. thanks for asking a good question).
I can also appreciate that an RDBMS presents an opportunity, whereby a
value flushed to the backing column of a <mapped-by/> field
effectively updates both sides anyway, so why not let the user have it
as soon as practicable? The timing you propose - when DetachAllOnCommit
occurs - is even laudable given that the JDO impl apparently lately
can't be relied on to intercept mutators to bring it about immediately.
In view of the performance savings attained by avoiding unnecessary
calls involving Collection.contains() (and only for such savings), this
seems a desirable hack. (Of course, when we finally get a JSR for
managed relationships, the hack won't be required any further).
I guess my main beef is that while this (performance motivated)
optimization benefits me performance-wise when I care to use it, and
doesn't cost me performance-wise when I don't care to use it, it comes
at the price of a cognitive burden - whether I happen to need it or not.
That burden is that I have to "watch my step", and not use the object at
the as-yet-unsynchronized end of the relationship until after the point
at which 15.3 guarantees it will be synchronized. I dislike this prospect (even
assuming it is always possible to keep track of the necessary state,
which is by no means obvious to me), because I already have my hands
full with programming obligations. Despite repeated attempts, I was not
successful in getting my fellow user Bin to un-captiously remark "I love
this burden - this burden is everything I dreamed of", so I will assume
for now I am not the only person in the world to dislike it.
So far I have argued that the burden of keeping track of which objects I
can and can't use is always unnecessary. But for the sake of outsmoking
EJB3, I am willing to admit that it sometimes might be worth bearing.
So let's concentrate on making the burden habitable.
There are 3 strategies for dealing with this burden:
#1 The SyncRelationshipsAfterCommit behaviour happens or not, but I
declare to the PM that I am studiously not relying on it, and wish to be
notified by a runtime exception if my code (or 3rd party code) fails to
update both sides of a relationship by the time commit occurs. There is
no cognitive burden. I forego the performance benefits of avoiding
Collection.add(). My code works fine with non-managed objects in
different contexts.
#2 The SyncRelationshipsAfterCommit behaviour always happens, but I
choose as a matter of policy not to rely on it, and I always manually
update both sides of the relationship. I forego the performance
benefits of avoiding Collection.add(). There is a small chance that my
code won't work with non-managed objects in different contexts, because
JDO doesn't tell me if I unintentionally violate my own policy.
(Although it allows me to selectively and intentionally violate it,
which might sometimes be beneficial). So some cognitive burden remains.
#3 The SyncRelationshipsAfterCommit behaviour always happens, and I
choose to exploit it by judicious and minimal use of the model before
commit. I live with the burden. I attain the performance benefits, I
win the Petstore 'benchmark'. I never update both sides of the
relationship unless I can't help it. Sometimes I will have to, because
the model objects are used by 3rd-party code I have no control over, and
it will expect the relationship to be completely mutual even before
commit. When trying to use my code in contexts where the objects are
non-managed, I may have to rewrite my minimal pre-commit code, since the
absence of the synchronization in the non-managed environment will mean
that my post-commit code won't be receiving the model in the expected
consistent state. In the best case, because I partitioned my code
according to the principles of OO, and not according to the time when it
gets executed, I'll just have to touch the internals of every second
setter involved in setting up a bi-directional relationship. If I've
relied so heavily on the synch behaviour that I omitted accessors like
'Set getChildren()', mutators like 'void add(DomainEntity)' or factory
methods like 'DomainEntity newChild()' on some of my interfaces, it'll
break existing clients of my code.
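For reference, the "update both sides manually" policy of #1 and #2
might look like the sketch below. Invoice, LineItem, addLine() and
setInvoice() are hypothetical names chosen for illustration (following
the invoice and line item example earlier in the thread); nothing here
touches JDO, which is exactly why it behaves the same with non-managed
objects.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class BothSidesSketch {

    /** Hypothetical owning side of a bidirectional one-to-many relationship. */
    static final class Invoice {
        private final Set<LineItem> lines = new HashSet<>();

        Set<LineItem> getLines() {
            return Collections.unmodifiableSet(lines);
        }

        /** Updates both ends; returns true if this invoice's set changed. */
        boolean addLine(LineItem line) {
            Invoice previous = line.invoice;
            if (previous == this) {
                return false; // already attached here, nothing to do
            }
            if (previous != null) {
                previous.lines.remove(line); // detach from the old invoice
            }
            line.invoice = this;
            return lines.add(line);
        }
    }

    /** Hypothetical inverse side; its setter delegates so both ends stay mutual. */
    static final class LineItem {
        private Invoice invoice;

        Invoice getInvoice() {
            return invoice;
        }

        void setInvoice(Invoice newInvoice) {
            if (newInvoice != null) {
                newInvoice.addLine(this);
            } else if (invoice != null) {
                invoice.lines.remove(this);
                invoice = null;
            }
        }
    }

    public static void main(String[] args) {
        Invoice first = new Invoice();
        Invoice second = new Invoice();
        LineItem line = new LineItem();

        first.addLine(line);
        line.setInvoice(second); // moves the line; both invoices are updated

        System.out.println(first.getLines().contains(line));  // false
        System.out.println(second.getLines().contains(line)); // true
        System.out.println(line.getInvoice() == second);      // true
    }
}
```

Note that this code calls Set.add() on every attach, so it is the
version that pays the performance cost discussed above; strategy #3
would omit one of the two ends and lean on the synchronization at
commit.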
I hope I have established that #3 might not be every developer's cup of
tea, and that they might prefer to accept some performance hit to avoid
both the cognitive burden and the potential implications for their
code. Certainly, they should be given the choice. They sort of have
the choice with #2, but it is a bit hit-and-miss and they are not
receiving a lot of help from JDO in enforcing their policy. This would
be adequately, cheaply, and neatly addressed by #1.
I contend that providing for #1, i.e. letting the user request that
partial updates to relationships at commit be regarded as an
'inconsistent update', is no more of a burden on vendors than
synchronizing the memory model - they have to perform the detection in
any case. So there is no good reason why you shouldn't allow this
strategy (in addition to the others) in the JDO 2.0 spec.
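To illustrate what I mean by the detection, here is a rough sketch of a
commit-time check that reports one-sided updates instead of silently
repairing them. Everything in it (Parent, Child, findOneSidedUpdates,
checkAtCommit, InconsistentUpdateException) is a hypothetical name made
up for this example; it is not proposed spec API and makes no claim
about how a vendor would actually implement it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConsistencyCheckSketch {

    static final class Parent {
        final String name;
        final Set<Child> children = new HashSet<>();
        Parent(String name) { this.name = name; }
    }

    static final class Child {
        final String name;
        Parent parent;
        Child(String name) { this.name = name; }
    }

    /** Hypothetical exception signalling an 'inconsistent update' at commit. */
    static final class InconsistentUpdateException extends RuntimeException {
        InconsistentUpdateException(String message) { super(message); }
    }

    /** Lists every relationship where only one end was updated. */
    static List<String> findOneSidedUpdates(Set<Parent> parents, Set<Child> children) {
        List<String> problems = new ArrayList<>();
        for (Child c : children) {
            // Child points at a parent whose collection does not contain it.
            if (c.parent != null && !c.parent.children.contains(c)) {
                problems.add(c.name + " -> " + c.parent.name + " (collection side missing)");
            }
        }
        for (Parent p : parents) {
            for (Child c : p.children) {
                // Parent's collection holds a child that points elsewhere.
                if (c.parent != p) {
                    problems.add(p.name + " -> " + c.name + " (reference side missing)");
                }
            }
        }
        return problems;
    }

    /** The same detection, but failing the commit rather than fixing the model. */
    static void checkAtCommit(Set<Parent> parents, Set<Child> children) {
        List<String> problems = findOneSidedUpdates(parents, children);
        if (!problems.isEmpty()) {
            throw new InconsistentUpdateException(String.join("; ", problems));
        }
    }

    public static void main(String[] args) {
        Parent invoice = new Parent("invoice");
        Child line = new Child("line");
        line.parent = invoice; // only one side updated

        try {
            checkAtCommit(Set.of(invoice), Set.of(line));
        } catch (InconsistentUpdateException e) {
            System.out.println("commit rejected: " + e.getMessage());
        }
    }
}
```

The loop that finds the missing end is the same work needed to
synchronize it; the only difference under #1 is that the result is an
exception rather than a repair.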
in conclusion,
David.