[ 
https://issues.apache.org/jira/browse/OPENJPA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838841#comment-16838841
 ] 

Will Dazey commented on OPENJPA-2767:
-------------------------------------

A further issue I noticed was that for large ORMs, a race condition occurs in 
MetaDataRepository.processRegisteredClasses. This race condition causes other 
threads to initialize other caches too early. 

To explain this race condition, take a look at 
MetaDataRepository.processRegisteredClasses():


{code:java}
Class<?>[] processRegisteredClasses(ClassLoader envLoader) {
    if (_registered.isEmpty())
        return EMPTY_CLASSES;
 
         // copy into new collection to avoid concurrent mod errors on reentrant
         // registrations
    Class<?>[] reg;
    synchronized (_registered) {
        reg = _registered.toArray(new Class[_registered.size()]);
        _registered.clear();
    }
 
    //Code then proceeds to process the 'reg' copy
    ...
}
{code}

The race condition revolves around the fact that `_registered` is cleared, but 
there is still more processing to do before `MetaDataRepository._subs` actually 
becomes populated. This creates a very small window in time for a second thread 
to call `MetaDataRepository.processRegisteredClasses()`, see that 
`_registered.isEmpty()` is true, and therefore assume it's safe to try to get 
subclasses from `MetaDataRepository._subs`. We can observe this sort of call 
sequence in ClassMetaData.getPCSubclasses (which is used to initialize 
ValueMapDiscriminator's cache):


{code:java}
public Class<?>[] getPCSubclasses() {
    if (_owner != null)
        return MetaDataRepository.EMPTY_CLASSES;

    _repos.processRegisteredClasses(_loader);
    if (_subs == null) {
        Collection<Class<?>> subs = _repos.getPCSubclasses(_type);
        _subs = (Class[]) subs.toArray(new Class[subs.size()]);
    }
    return _subs;
}
{code}
*notice that `ClassMetaData._subs` is populated from 
`MetaDataRepository.getPCSubclasses()` (which uses the 
'MetaDataRepository._subs' cache I mentioned above). 

Now, imagine the scenario where one thread (T1) comes into this method and 
kicks off `MetaDataRepository.processRegisteredClasses()`, which will clear 
`MetaDataRepository._registered`. In the mean time, another thread (T2) also 
comes through this method, but since `MetaDataRepository._registered` is empty, 
it returns immediately. For T2, `ClassMetaData._subs` will be null and T1 is 
still in the middle of processing registered classes. This is the window where 
issues can start appearing and dirty reads are made. 
T2 will possibly initialize `ClassMetaData._subs` as an empty List since T1 is 
still busy populating `MetaDataRepository._subs`. ValueMapDiscriminator will 
then get an empty result in its cache, even though `MetaDataRepository._subs` 
will soon populate with the correct value.

Now, `MetaDataRepository.processRegisteredClasses()` will clear 
`ClassMetaData._subs` so that it can be reinitialized later. But how many times 
does ValueMapDiscriminator need to call getSubClasses until the value is 
finally populated? The most straightforward answer to this is to synchronize on 
`MetaDataRepository.processRegisteredClasses()` so that other threads wont make 
dirty reads.

> Incomplete ValueMapDiscriminatorStrategy cache 
> -----------------------------------------------
>
>                 Key: OPENJPA-2767
>                 URL: https://issues.apache.org/jira/browse/OPENJPA-2767
>             Project: OpenJPA
>          Issue Type: Bug
>            Reporter: Will Dazey
>            Assignee: Will Dazey
>            Priority: Minor
>             Fix For: 2.2.3
>
>         Attachments: OJ2767_2.2.x_v4.patch
>
>
> There exists the possibility that the discriminator class cache in 
> ValueMapDiscriminatorStrategy was originally created before the server fully 
> loaded all classes. If this happens, the cache is incomplete and needs to be 
> rebuilt. A simple enough strategy would be to just try rebuilding the cache, 
> before giving up and throwing a CNFE, and check again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to