Dain Sundstrom wrote:
On Apr 7, 2008, at 7:06 AM, Rick McGuire wrote:
Dain Sundstrom wrote:
I've been sucked into another project and haven't been paying much
attention to the lists...
The problem is we flush before returning the created object to the
caller. The reason we do this is because database generated fields
are not filled in until the flush statement which means the primary
key is not guaranteed to be available until flush. The current code
requires the primary key to create the cmp proxy we return to the
caller. The code will have to be changed to allow for late primary
key resolution either when the code calls getPrimaryKey or at the
end of the transaction.
I don't have the time to look at this, but I can help you if you
want to work on it.
I've started poking around in the code trying to understand what
needs to change. Is the JpaCmpEngine.createBean() method where the
flushing takes place? At that point the primaryKey appears to be
used for 1) creating the ThreadContext instance, 2) storing the
bean in the transaction cache, and 3) creating the ProxyInfo
instance. Am I looking in the correct location for this?
Yes.
The ThreadContext primary key bit looks easily changed to a lazy
resolution, and probably the ProxyInfo as well, but the transaction
cache does not appear to be as easily changed, since the primary key
is the main lookup method for the transaction cache. I guess the
transaction cache step could be bypassed until the primary key is
actually generated, but I'm concerned that this could result in some
resolution failures where an object would be expected to be located
in the cache.
The transaction cache was introduced as a workaround for the
new-delete-new bug in OpenJPA (see JpaTestObject.newDeleteNew()). If
you create, remove and recreate a bean with the same pk, OpenJPA
internally leaves the pk marked as "deleted", so calls to
find(Class,Object) return null. We work around this by using a
private cache to track the objects created during the transaction.
To implement delayed flush, you will have to add another way to track
the JPA instance object (since we won't have the pk to "find" the
object in the entity manager). When the pk is not available, you use
the new, alternate, method to find the object, and when the pk is
finally resolved, you would add it to the transaction cache.
I've not come up with any clever way of implementing the cache so far,
other than just keeping a list of objects whose primary keys have not
been calculated, and then, if all other lookups fail, start resolving
the primary keys looking for the given target. Not elegant, but I think
this will work.
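A rough sketch of that fallback, to make the idea concrete. Everything
here is hypothetical: LazyPkCache is not an existing OpenEJB class, and
the pkResolver function stands in for whatever would actually flush and
read the generated key from the JPA instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache: pk lookups first, unresolved entities as a last resort. */
final class LazyPkCache {
    private final Map<Object, Object> byPk = new HashMap<Object, Object>();
    private final List<Object> pending = new ArrayList<Object>();
    // Assumed hook: returns the entity's pk, or null if it still cannot be resolved.
    private final Function<Object, Object> pkResolver;

    LazyPkCache(Function<Object, Object> pkResolver) {
        this.pkResolver = pkResolver;
    }

    /** Track an entity whose pk is already known. */
    void add(Object pk, Object entity) {
        byPk.put(pk, entity);
    }

    /** Track an entity whose pk has not been calculated yet. */
    void addUnresolved(Object entity) {
        pending.add(entity);
    }

    /** Normal lookup; only if it misses do we start resolving pending pks. */
    Object find(Object pk) {
        Object hit = byPk.get(pk);
        if (hit != null) {
            return hit;
        }
        Iterator<Object> it = pending.iterator();
        while (it.hasNext()) {
            Object entity = it.next();
            Object resolved = pkResolver.apply(entity);
            if (resolved == null) {
                continue; // still unresolved, leave it in the pending list
            }
            it.remove();
            byPk.put(resolved, entity); // promote to the regular cache
            if (resolved.equals(pk)) {
                return entity; // stop early, later entries stay pending
            }
        }
        return null;
    }
}
```

The early return means a lookup only forces resolution of as many
pending entities as it takes to find the target, so the cost is paid
only on a cache miss.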
I do wonder if another approach might work better. If I understand the
reasoning behind the flush, it is necessary because it's possible that
some of the information needed to calculate the primary key only becomes
available after the JPA flush()/merge() sequence. I suspect for many
objects, this is not needed because a simple primary key is used. Would
it be feasible to detect the situation where a flush is needed to
"crystallize" the object to calculate the primary key? This way, simple
object instances where the primary key is provided in the create()
operation would not experience the performance hit.
Rick
Off the top of my head, it may be possible to use a stand-in pk object
which wraps the JPA object itself (using identity based hashcode and
equals) until the real pk is resolved. This pk object would then serve
as the key into the alternate tx cache.
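Roughly what I have in mind, as an untested sketch. PendingKey and
TxCache are made-up names, not existing OpenEJB classes; the only point
is that the stand-in compares by reference identity, so two unresolved
entities never collide and the entry can be re-keyed once the real pk
is known.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in pk: wraps the JPA instance and compares by identity. */
final class PendingKey {
    private final Object entity;

    PendingKey(Object entity) {
        if (entity == null) {
            throw new IllegalArgumentException("entity is required");
        }
        this.entity = entity;
    }

    Object getEntity() {
        return entity;
    }

    @Override
    public boolean equals(Object other) {
        // Same wrapped instance, not equal state
        return other instanceof PendingKey && ((PendingKey) other).entity == entity;
    }

    @Override
    public int hashCode() {
        return System.identityHashCode(entity);
    }
}

/** Hypothetical tx cache keyed by either a real pk or a PendingKey. */
final class TxCache {
    private final Map<Object, Object> entries = new HashMap<Object, Object>();

    /** Cache an entity whose pk is not yet available. */
    PendingKey addPending(Object entity) {
        PendingKey key = new PendingKey(entity);
        entries.put(key, entity);
        return key;
    }

    /** Swap the stand-in for the real pk once it has been resolved. */
    void resolve(PendingKey pending, Object realPk) {
        Object entity = entries.remove(pending);
        if (entity != null) {
            entries.put(realPk, entity);
        }
    }

    Object get(Object key) {
        return entries.get(key);
    }
}
```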
Any pointers on where the end of transaction processing would need to
be performed?
CmpContainer.ejbLoad(EntityBean) uses
TransactionSynchronizationRegistry.registerInterposedSynchronization
to store entities at the end of the transaction. You'll want to
expand that logic to handle pk resolution in addition to ejbStore
callbacks. The registerInterposedSynchronization doesn't really
handle ordering well so I suggest you use a single Synchronization
object to handle processing of the pks and the ejb store callbacks.
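Something along these lines, as a sketch only. EndOfTxWork is a made-up
class; in the real code it would implement
javax.transaction.Synchronization (the method shapes below mirror that
interface) and be registered once per transaction through
registerInterposedSynchronization. Running pk resolution ahead of the
store callbacks is my assumption, so that an ejbStore which asks for
the primary key sees a real one; swap the loops if the other order
turns out to be right.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single end-of-transaction handler with an explicit order. */
final class EndOfTxWork {
    private final List<Runnable> pkResolutions = new ArrayList<Runnable>();
    private final List<Runnable> storeCallbacks = new ArrayList<Runnable>();

    /** Queue work that resolves the pk of one late-bound entity. */
    void addPkResolution(Runnable resolution) {
        pkResolutions.add(resolution);
    }

    /** Queue the ejbStore callback for one entity. */
    void addStoreCallback(Runnable callback) {
        storeCallbacks.add(callback);
    }

    /** Same shape as Synchronization.beforeCompletion(). */
    public void beforeCompletion() {
        // Assumed order: keys first, then store callbacks
        for (Runnable resolution : pkResolutions) {
            resolution.run();
        }
        for (Runnable callback : storeCallbacks) {
            callback.run();
        }
    }

    /** Same shape as Synchronization.afterCompletion(int). */
    public void afterCompletion(int status) {
        // Nothing survives the transaction, whatever its outcome
        pkResolutions.clear();
        storeCallbacks.clear();
    }
}
```

Because both kinds of work live in one object, the order no longer
depends on the order in which synchronizations were registered.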
One other thing to keep in mind is that before a CMP is passed to a
remote vm, you'll need to make sure the pk has been resolved.
-dain