[
https://issues.apache.org/jira/browse/OPENJPA-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579938#comment-16579938
]
ASF subversion and git services commented on OPENJPA-2646:
----------------------------------------------------------
Commit 1838026 from [email protected] in branch 'openjpa/branches/2.4.x'
[ https://svn.apache.org/r1838026 ]
OPENJPA-2646: Performance Regression Fix
> Sporadic NullPointerException occurs under heavy load when QuerySQLCache is
> enabled.
> ------------------------------------------------------------------------------------
>
> Key: OPENJPA-2646
> URL: https://issues.apache.org/jira/browse/OPENJPA-2646
> Project: OpenJPA
> Issue Type: Bug
> Components: query
> Affects Versions: 2.2.3, 2.4.1
> Reporter: Heath Thomann
> Assignee: Will Dazey
> Priority: Major
> Fix For: 2.2.3, 2.4.3, 3.0.0
>
> Attachments: MyEntity.java, OPENJPA-2646-2.2.x.patch,
> Perf_Regression.patch, PreparedQueryCacheTest.java, PreparedQueryTest.debug
>
>
> Hello! This is a follow up/continuation of JIRA OPENJPA-2609. I will assume
> the reader slogged through that one as some of the info in that JIRA is very
> relevant here. The fix for that JIRA was necessary to plug a hole in
> PreparedQuery (PQ) cache. The fix did reduce my customer's occurrence of the
> exception by around 90%. However, they still saw the reported exception and
> in addition they see following exception:
> Caused by: java.lang.NullPointerException
> at org.apache.openjpa.persistence.QueryImpl.preExecute(QueryImpl.java:557)
> at org.apache.openjpa.persistence.QueryImpl.execute(QueryImpl.java:285)
> at org.apache.openjpa.persistence.QueryImpl.getResultList(QueryImpl.java:302)
> at com.ibm.ws.persistence.QueryImpl.getResultList(QueryImpl.java:118)
> .....
>
> As I mentioned in OJ2609, I was not able to exactly recreate the issue. I
> described some of the strategic break points I had to use, and the
> assumptions I had to make. Alas, I was able to recreate this issue with a
> multi threaded test, and specific query string (among other things)! This
> JIRA will take the reader on another journey through the laborious process I
> went through to recreate the issue, and finally track down the very, I mean
> very, narrow window where the CCEx or NPE can occur. So, settle in and lets
> start our fun adventure!
> First, let me give an overview of the issue and look at the code to
> illustrate how the exceptions can occur. After that I'll dive into the
> recreate.
> The exception can occur when an item (a query string mapped to a PQ in my
> case) in the CacheMap is moved from the "soft" cache to the main cache.
> Please view the CacheMap code to see what I mean by the main cache (see
> variable 'cacheMap') and what I mean by the soft cache (see variable
> 'softMap'). When an item exists in the soft cache, and a thread requests
> that item, the item is moved out of the soft cache and put in the main cache.
> If the main cache is full, another item is kicked out of the main cache.
> The item which is removed from the main cache is put into a soft cache. As
> this move from soft cache to main cache occurs, there is a narrow window
> where the item appears to not be in the caches at all. In other words, if a
> (reader) thread asks (CacheMap.get) for item X, where another (writer) thread
> is moving item X from the soft cache to the main cache, the reader thread
> will not see item X and think the item is not cached.
> To see this better, lets look at CacheMap.get:
> public Object get(Object key) {
> readLock();
> try {
> // Check the main map first
> Object val = cacheMap.get(key);
> if (val == null) {
> // if we find the key in the soft map, move it back into
> // the primary map
> val = softMap.get(key);
> if (val != null){
> put(key, val);
> }else{
> val = pinnedMap.get(key);
> }
> }
>
> return val;
> } finally {
> readUnlock();
> }
> }
> As you can see, we first look in the main cache for a given key (in my
> scenario the key is a query string). If it is not found, we look in the soft
> map. If it is in the soft map, we call the 'put' method. The 'put' method
> is going to take the key out of the soft map and put it in the main cache
> map. Let me paste the relevant parts of the put method (I'll leave it to the
> reader to view the method in its entirety):
> public Object put(Object key, Object value) {
> writeLock();
> try {
> ........
> // otherwise, put the value into the map and clear it from the
> // soft map
> val = put(cacheMap, key, value);
> if (val == null) {
> val = remove(softMap, key);
> ........
> } finally {
> writeUnlock();
> }
> As you can see here, this 'put' method calls to another 'put' method which
> simply puts the key/value into the specified map. If the map didn't already
> contain a value for the given key (which I think should always be the case
> for the cacheMap, or at least for my scenario), null will be returned. When
> null is returned, remove is called to remove the key/value from the soft map.
>
> Given this, lets look closer at the 'get' method with concurrency in mind.
> Take the case where two threads enter the 'get' method at the same time.
> Next, assume both threads are moving lock step and they both call
> 'cacheMap.get' with the same key, and assume the key is not in the main
> cache. So at this time both threads will have a 'val' of null. Next assume
> one thread gets focus (so the other thread suspends for a moment) and it
> calls the soft map with the key. Assume the soft map contains the key. In
> this case the thread will move (via the call to 'put') the key/value from the
> soft cache to the main cache. This thread will then have a non-null 'val'
> and will return. Next assume the suspended thread gets focus and continues.
> Since its 'val' is null, it will ask the soft cache for the key. However, we
> know that the other thread just moved the key/value from the soft to main
> cache! As such, the thread will not find the key in the soft cache and will
> return a 'val' of null!!! This is at the heart of the issue! That is, one
> thread thinks there is nothing in the cache for their key!
> The observant reader may notice that the 'get' and 'put' have a 'readLock'
> and 'writeLock', respectively, which attempts to get a ReentrantLock. So one
> would think that there is proper read/write access of the caches to avoid
> such situations. However, one must look at the read/writeLock methods. If
> you look at the readLock method, you can see that it locks using a
> '_readLock' variable. This variable is null for the CacheMap instance
> created by the PreparedQueryCacheImpl. Therefore, the readLock method is a
> no-op. This means that a reader thread can do a read at any time without any
> locking concerns.
> Now that we know the heart of the issue (i.e. one thread received an
> incorrect cache miss), lets look at the implication of this incorrect cache
> miss. What I haven't stated is where the threads are at when they call 'get'
> on the cache. This is easier to see when looking at the attached test.
> Basically though one thread is in PreparedQueryCacheImpl.register,
> specifically this line:
> PreparedQuery cached = get(id);
> Here is the stack from my test which I will describe:
> PreparedQueryCacheImpl.get(String) line: 193
> PreparedQueryCacheImpl.register(String, Query, FetchConfiguration) line: 92
> QueryImpl<X>.preExecute(Map) line: 554
> If you look at the 'register' method, you can see that if the PreparedQuery
> is not found in the cache (i.e. 'get(id)' returns null), one will be created
> and an attempt made to put it into the cache. If you look at
> PreparedQueryCacheImpl.cache, you can see a request is made a second time to
> see if the value is in the cache. More than likely this time the value will
> be found in the cache, in which case true is returned. If not, false is
> returned. Either way, it doesn't matter if true or false is returned. To
> see why, look at org.apache.openjpa.persistence.QueryImpl.preExecute. This
> is the code which has called PreparedQueryCacheImpl.register (see above
> stack). Again, this might be more clear when looking at the provided test in
> the debugger. You can see in
> 'org.apache.openjpa.persistence.QueryImpl.preExecute' that if anything other
> than null is returned from 'register', the code that calls
> PreparedQuery.reparametrize is skipped! This is at the heart of the issue.
> To understand why, you need to go back to JIRA OPENJPA-2609 and understand
> the part about PreparedSQLStoreQuery$PreparedSQLExecutor, or understand the
> fact that a PreparedSQLStoreQuery$PreparedSQLExecutor is the executor. That
> is, the threads involved here (again, easier to see in the provided test) are
> using a PreparedSQLStoreQuery$PreparedSQLExecutor. Given this the parameter
> must be reparametrize, in other words the parameter needs to be converted to
> its parameter index (an int). Because this is skipped, we see the CCEx when
> trying to convert a string parameter to an expected Integer. That is, when
> you look at this line in org.apache.openjpa.kernel.QueryImpl.execute:
> StoreQuery.Executor ex = (isInMemory(operation))
> The next couple lines below this is where the CCEx occurs, because the
> parameters have not been converted. Note that when we are dealing with a
> 'ExpressionStoreQuery$DataStoreExecutor' (i.e. the very first time a query is
> executed - not cached - or caching is disabled), the reparamaterization
> doesn't need to occur because an ''ExpressionStoreQuery$DataStoreExecutor'
> expects a string parameters, rather than a paramater (int) index. If you run
> with the debug module I provide (see below) I add debug to print the
> 'executor' making is a little easier to see when each of these two executors
> are used.
> Now, that describes the CCEx, but I've said nothing about the NPE. Well, if
> you followed the above, as I'll explain, the NPE is very easy to see. Again,
> just like the CCEx stack, the NPE stack is in the
> org.apache.openjpa.persistence.QueryImpl.preExecute method. Just as was the
> case above for the CCEx, this method goes to PreparedQueryCacheImpl.register.
> This time though, the 'register' returns null, meaning the PQ is already in
> the cache. A few lines after the 'register' call, we see these lines:
> PreparedQuery pq = _em.getPreparedQuery(_id);
> if (pq.isInitialized()) {
> The code is going back to the cache for the actual PQ. Using the same "cache
> miss" scenario as described above, a null PQ will be returned.....obviously
> when we call pq.isInitialized() on a null, an NPE will occur!
> While this JIRA describes one way to cause the CCEx, I feel there are other
> holes that could cause the CCEx, or other exceptions. For example, if you
> look at 'org.apache.openjpa.persistence.QueryImpl.ignorePreparedQuery()', you
> can see that it goes to the cache and if an entry is not found, it returns
> without doing the recompile. I see a scenario where this is a problem. That
> is, taking a look once again at 'QueryImpl.preExecute'. Before the call to
> 'PreparedQueryCacheImpl.register' there is a test for null parameters. If
> there is a null parameter, the 'ignorePreparedQuery()' method is called. If
> you step into this method you can see that a call is made to the cache. If
> we go through the same "cache miss" scenario, we will see that an "ignore" is
> not performed (by "ignore" it appears the Query is recompiled and in so doing
> the 'PreparedSQLStoreQuery$PreparedSQLExecutor' would be removed and instead
> a 'ExpressionStoreQuery$DataStoreExecutor' used). When the "cache miss"
> occurs and the "ignore" not performed, we can see that yet again the call to
> PreparedQuery.reparametrize is skipped, thus leading to the same CCEx. I
> think we need to fix the case where a "cache miss" could occur in CacheMap,
> or look at all the cases where the cache is called and account for the
> possibility of a "cache miss". The former is likely the best options.
> Finally, let me describe the test I've created and attached. It is named
> PreparedQueryCacheTest.java and uses the provided entity (MyEntity.java).
> This test can be ran as a stand alone JSE JUnit test. To execute it simply
> point your classpath to the OpenJPA snapshot jar of your choice. You can use
> a simple persistence.xml file (no properties are necessary). To make it much
> easier for the test to recreate the issue, you will need to edit the
> PreparedQueryCacheImpl.java class. Therefore this does assume the user can
> edit and recompile the OpenJPA code. The user should change the CacheMap in
> PreparedQueryCacheImpl to:
> _delegate = new CacheMap(false,2);
> from:
> _delegate = new CacheMap();
> Again, doing this makes it far more likely the issue will occur and is a
> necessity when running the debug test. If you look at the test there is a
> test method to run when running in a debugger, and a test method to run to
> automatically recreate the exception. Each test method is commented with
> enough text to explain how to recreate the exception. Especially the debug
> test, the comments in this method detail exactly where to add break points
> and the necessary flow to recreate the issue in a debugger. With my test
> you'll see a CCEx like this:
> Thread (Thread-32): Caught the following exception:
> <openjpa-2.2.3-SNAPSHOT-r422266:1737410 nonfatal user error>
> org.apache.openjpa.persistence.ArgumentException: Failed to execute query
> "select o from MyEntity o where o.comments = :mrn and o.id = 1". Check the
> query syntax for correctness. See nested exception for details.
> With cause: java.lang.ClassCastException: java.lang.String cannot be cast
> to java.lang.Integer
> In some cases you may see the NPE reported in the beginning of the JIRA.
> However, the CCEx occurs more often than the NPE.
> In addition to the test, I've attached a debug module which has system
> println's to help demonstrate the issue. The debug module is named
> PreparedQueryTest.debug.
> Thanks,
> Heath Thomann
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)