[ https://issues.apache.org/jira/browse/OPENJPA-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579938#comment-16579938 ]
ASF subversion and git services commented on OPENJPA-2646: ---------------------------------------------------------- Commit 1838026 from dazey...@apache.org in branch 'openjpa/branches/2.4.x' [ https://svn.apache.org/r1838026 ] OPENJPA-2646: Performance Regression Fix > Sporadic NullPointerException occurs under heavy load when QuerySQLCache is > enabled. > ------------------------------------------------------------------------------------ > > Key: OPENJPA-2646 > URL: https://issues.apache.org/jira/browse/OPENJPA-2646 > Project: OpenJPA > Issue Type: Bug > Components: query > Affects Versions: 2.2.3, 2.4.1 > Reporter: Heath Thomann > Assignee: Will Dazey > Priority: Major > Fix For: 2.2.3, 2.4.3, 3.0.0 > > Attachments: MyEntity.java, OPENJPA-2646-2.2.x.patch, > Perf_Regression.patch, PreparedQueryCacheTest.java, PreparedQueryTest.debug > > > Hello! This is a follow up/continuation of JIRA OPENJPA-2609. I will assume > the reader slogged through that one as some of the info in that JIRA is very > relevant here. The fix for that JIRA was necessary to plug a hole in > PreparedQuery (PQ) cache. The fix did reduce my customer's occurrence of the > exception by around 90%. However, they still saw the reported exception and > in addition they see following exception: > Caused by: java.lang.NullPointerException > at org.apache.openjpa.persistence.QueryImpl.preExecute(QueryImpl.java:557) > at org.apache.openjpa.persistence.QueryImpl.execute(QueryImpl.java:285) > at org.apache.openjpa.persistence.QueryImpl.getResultList(QueryImpl.java:302) > at com.ibm.ws.persistence.QueryImpl.getResultList(QueryImpl.java:118) > ..... > > As I mentioned in OJ2609, I was not able to exactly recreate the issue. I > described some of the strategic break points I had to use, and the > assumptions I had to make. Alas, I was able to recreate this issue with a > multi threaded test, and specific query string (among other things)! This > JIRA will take the reader on another journey through the laborious process I > went through to recreate the issue, and finally track down the very, I mean > very, narrow window where the CCEx or NPE can occur. So, settle in and lets > start our fun adventure! > First, let me give an overview of the issue and look at the code to > illustrate how the exceptions can occur. After that I'll dive into the > recreate. > The exception can occur when an item (a query string mapped to a PQ in my > case) in the CacheMap is moved from the "soft" cache to the main cache. > Please view the CacheMap code to see what I mean by the main cache (see > variable 'cacheMap') and what I mean by the soft cache (see variable > 'softMap'). When an item exists in the soft cache, and a thread requests > that item, the item is moved out of the soft cache and put in the main cache. > If the main cache is full, another item is kicked out of the main cache. > The item which is removed from the main cache is put into a soft cache. As > this move from soft cache to main cache occurs, there is a narrow window > where the item appears to not be in the caches at all. In other words, if a > (reader) thread asks (CacheMap.get) for item X, where another (writer) thread > is moving item X from the soft cache to the main cache, the reader thread > will not see item X and think the item is not cached. > To see this better, lets look at CacheMap.get: > public Object get(Object key) { > readLock(); > try { > // Check the main map first > Object val = cacheMap.get(key); > if (val == null) { > // if we find the key in the soft map, move it back into > // the primary map > val = softMap.get(key); > if (val != null){ > put(key, val); > }else{ > val = pinnedMap.get(key); > } > } > > return val; > } finally { > readUnlock(); > } > } > As you can see, we first look in the main cache for a given key (in my > scenario the key is a query string). If it is not found, we look in the soft > map. If it is in the soft map, we call the 'put' method. The 'put' method > is going to take the key out of the soft map and put it in the main cache > map. Let me paste the relevant parts of the put method (I'll leave it to the > reader to view the method in its entirety): > public Object put(Object key, Object value) { > writeLock(); > try { > ........ > // otherwise, put the value into the map and clear it from the > // soft map > val = put(cacheMap, key, value); > if (val == null) { > val = remove(softMap, key); > ........ > } finally { > writeUnlock(); > } > As you can see here, this 'put' method calls to another 'put' method which > simply puts the key/value into the specified map. If the map didn't already > contain a value for the given key (which I think should always be the case > for the cacheMap, or at least for my scenario), null will be returned. When > null is returned, remove is called to remove the key/value from the soft map. > > Given this, lets look closer at the 'get' method with concurrency in mind. > Take the case where two threads enter the 'get' method at the same time. > Next, assume both threads are moving lock step and they both call > 'cacheMap.get' with the same key, and assume the key is not in the main > cache. So at this time both threads will have a 'val' of null. Next assume > one thread gets focus (so the other thread suspends for a moment) and it > calls the soft map with the key. Assume the soft map contains the key. In > this case the thread will move (via the call to 'put') the key/value from the > soft cache to the main cache. This thread will then have a non-null 'val' > and will return. Next assume the suspended thread gets focus and continues. > Since its 'val' is null, it will ask the soft cache for the key. However, we > know that the other thread just moved the key/value from the soft to main > cache! As such, the thread will not find the key in the soft cache and will > return a 'val' of null!!! This is at the heart of the issue! That is, one > thread thinks there is nothing in the cache for their key! > The observant reader may notice that the 'get' and 'put' have a 'readLock' > and 'writeLock', respectively, which attempts to get a ReentrantLock. So one > would think that there is proper read/write access of the caches to avoid > such situations. However, one must look at the read/writeLock methods. If > you look at the readLock method, you can see that it locks using a > '_readLock' variable. This variable is null for the CacheMap instance > created by the PreparedQueryCacheImpl. Therefore, the readLock method is a > no-op. This means that a reader thread can do a read at any time without any > locking concerns. > Now that we know the heart of the issue (i.e. one thread received an > incorrect cache miss), lets look at the implication of this incorrect cache > miss. What I haven't stated is where the threads are at when they call 'get' > on the cache. This is easier to see when looking at the attached test. > Basically though one thread is in PreparedQueryCacheImpl.register, > specifically this line: > PreparedQuery cached = get(id); > Here is the stack from my test which I will describe: > PreparedQueryCacheImpl.get(String) line: 193 > PreparedQueryCacheImpl.register(String, Query, FetchConfiguration) line: 92 > QueryImpl<X>.preExecute(Map) line: 554 > If you look at the 'register' method, you can see that if the PreparedQuery > is not found in the cache (i.e. 'get(id)' returns null), one will be created > and an attempt made to put it into the cache. If you look at > PreparedQueryCacheImpl.cache, you can see a request is made a second time to > see if the value is in the cache. More than likely this time the value will > be found in the cache, in which case true is returned. If not, false is > returned. Either way, it doesn't matter if true or false is returned. To > see why, look at org.apache.openjpa.persistence.QueryImpl.preExecute. This > is the code which has called PreparedQueryCacheImpl.register (see above > stack). Again, this might be more clear when looking at the provided test in > the debugger. You can see in > 'org.apache.openjpa.persistence.QueryImpl.preExecute' that if anything other > than null is returned from 'register', the code that calls > PreparedQuery.reparametrize is skipped! This is at the heart of the issue. > To understand why, you need to go back to JIRA OPENJPA-2609 and understand > the part about PreparedSQLStoreQuery$PreparedSQLExecutor, or understand the > fact that a PreparedSQLStoreQuery$PreparedSQLExecutor is the executor. That > is, the threads involved here (again, easier to see in the provided test) are > using a PreparedSQLStoreQuery$PreparedSQLExecutor. Given this the parameter > must be reparametrize, in other words the parameter needs to be converted to > its parameter index (an int). Because this is skipped, we see the CCEx when > trying to convert a string parameter to an expected Integer. That is, when > you look at this line in org.apache.openjpa.kernel.QueryImpl.execute: > StoreQuery.Executor ex = (isInMemory(operation)) > The next couple lines below this is where the CCEx occurs, because the > parameters have not been converted. Note that when we are dealing with a > 'ExpressionStoreQuery$DataStoreExecutor' (i.e. the very first time a query is > executed - not cached - or caching is disabled), the reparamaterization > doesn't need to occur because an ''ExpressionStoreQuery$DataStoreExecutor' > expects a string parameters, rather than a paramater (int) index. If you run > with the debug module I provide (see below) I add debug to print the > 'executor' making is a little easier to see when each of these two executors > are used. > Now, that describes the CCEx, but I've said nothing about the NPE. Well, if > you followed the above, as I'll explain, the NPE is very easy to see. Again, > just like the CCEx stack, the NPE stack is in the > org.apache.openjpa.persistence.QueryImpl.preExecute method. Just as was the > case above for the CCEx, this method goes to PreparedQueryCacheImpl.register. > This time though, the 'register' returns null, meaning the PQ is already in > the cache. A few lines after the 'register' call, we see these lines: > PreparedQuery pq = _em.getPreparedQuery(_id); > if (pq.isInitialized()) { > The code is going back to the cache for the actual PQ. Using the same "cache > miss" scenario as described above, a null PQ will be returned.....obviously > when we call pq.isInitialized() on a null, an NPE will occur! > While this JIRA describes one way to cause the CCEx, I feel there are other > holes that could cause the CCEx, or other exceptions. For example, if you > look at 'org.apache.openjpa.persistence.QueryImpl.ignorePreparedQuery()', you > can see that it goes to the cache and if an entry is not found, it returns > without doing the recompile. I see a scenario where this is a problem. That > is, taking a look once again at 'QueryImpl.preExecute'. Before the call to > 'PreparedQueryCacheImpl.register' there is a test for null parameters. If > there is a null parameter, the 'ignorePreparedQuery()' method is called. If > you step into this method you can see that a call is made to the cache. If > we go through the same "cache miss" scenario, we will see that an "ignore" is > not performed (by "ignore" it appears the Query is recompiled and in so doing > the 'PreparedSQLStoreQuery$PreparedSQLExecutor' would be removed and instead > a 'ExpressionStoreQuery$DataStoreExecutor' used). When the "cache miss" > occurs and the "ignore" not performed, we can see that yet again the call to > PreparedQuery.reparametrize is skipped, thus leading to the same CCEx. I > think we need to fix the case where a "cache miss" could occur in CacheMap, > or look at all the cases where the cache is called and account for the > possibility of a "cache miss". The former is likely the best options. > Finally, let me describe the test I've created and attached. It is named > PreparedQueryCacheTest.java and uses the provided entity (MyEntity.java). > This test can be ran as a stand alone JSE JUnit test. To execute it simply > point your classpath to the OpenJPA snapshot jar of your choice. You can use > a simple persistence.xml file (no properties are necessary). To make it much > easier for the test to recreate the issue, you will need to edit the > PreparedQueryCacheImpl.java class. Therefore this does assume the user can > edit and recompile the OpenJPA code. The user should change the CacheMap in > PreparedQueryCacheImpl to: > _delegate = new CacheMap(false,2); > from: > _delegate = new CacheMap(); > Again, doing this makes it far more likely the issue will occur and is a > necessity when running the debug test. If you look at the test there is a > test method to run when running in a debugger, and a test method to run to > automatically recreate the exception. Each test method is commented with > enough text to explain how to recreate the exception. Especially the debug > test, the comments in this method detail exactly where to add break points > and the necessary flow to recreate the issue in a debugger. With my test > you'll see a CCEx like this: > Thread (Thread-32): Caught the following exception: > <openjpa-2.2.3-SNAPSHOT-r422266:1737410 nonfatal user error> > org.apache.openjpa.persistence.ArgumentException: Failed to execute query > "select o from MyEntity o where o.comments = :mrn and o.id = 1". Check the > query syntax for correctness. See nested exception for details. > With cause: java.lang.ClassCastException: java.lang.String cannot be cast > to java.lang.Integer > In some cases you may see the NPE reported in the beginning of the JIRA. > However, the CCEx occurs more often than the NPE. > In addition to the test, I've attached a debug module which has system > println's to help demonstrate the issue. The debug module is named > PreparedQueryTest.debug. > Thanks, > Heath Thomann -- This message was sent by Atlassian JIRA (v7.6.3#76005)