[ 
https://issues.apache.org/jira/browse/OPENJPA-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579938#comment-16579938
 ] 

ASF subversion and git services commented on OPENJPA-2646:
----------------------------------------------------------

Commit 1838026 from dazey...@apache.org in branch 'openjpa/branches/2.4.x'
[ https://svn.apache.org/r1838026 ]

OPENJPA-2646: Performance Regression Fix

> Sporadic NullPointerException occurs under heavy load when QuerySQLCache is 
> enabled.
> ------------------------------------------------------------------------------------
>
>                 Key: OPENJPA-2646
>                 URL: https://issues.apache.org/jira/browse/OPENJPA-2646
>             Project: OpenJPA
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 2.2.3, 2.4.1
>            Reporter: Heath Thomann
>            Assignee: Will Dazey
>            Priority: Major
>             Fix For: 2.2.3, 2.4.3, 3.0.0
>
>         Attachments: MyEntity.java, OPENJPA-2646-2.2.x.patch, 
> Perf_Regression.patch, PreparedQueryCacheTest.java, PreparedQueryTest.debug
>
>
> Hello!  This is a follow up/continuation of JIRA OPENJPA-2609.  I will assume 
> the reader slogged through that one as some of the info in that JIRA is very 
> relevant here.  The fix for that JIRA was necessary to plug a hole in 
> PreparedQuery (PQ) cache.  The fix did reduce my customer's occurrence of the 
> exception by around 90%.  However, they still saw the reported exception and 
> in addition they see following exception:
> Caused by: java.lang.NullPointerException
> at org.apache.openjpa.persistence.QueryImpl.preExecute(QueryImpl.java:557)
> at org.apache.openjpa.persistence.QueryImpl.execute(QueryImpl.java:285)
> at org.apache.openjpa.persistence.QueryImpl.getResultList(QueryImpl.java:302)
> at com.ibm.ws.persistence.QueryImpl.getResultList(QueryImpl.java:118)
> .....
>  
> As I mentioned in OJ2609, I was not able to exactly recreate the issue.  I 
> described some of the strategic break points I had to use, and the 
> assumptions I had to make.  Alas, I was able to recreate this issue with a 
> multi threaded test, and specific query string (among other things)!  This 
> JIRA will take the reader on another journey through the laborious process I 
> went through to recreate the issue, and finally track down the very, I mean 
> very, narrow window where the CCEx or NPE can occur.  So, settle in and lets 
> start our fun adventure!  
> First, let me give an overview of the issue and look at the code to 
> illustrate how the exceptions can occur.  After that I'll dive into the 
> recreate.  
> The exception can occur when an item (a query string mapped to a PQ in my 
> case) in the CacheMap is moved from the  "soft" cache to the main cache.  
> Please view the CacheMap code to see what I mean by the main cache (see 
> variable 'cacheMap') and what I mean by the soft cache (see variable 
> 'softMap').  When an item exists in the soft cache, and a thread requests 
> that item, the item is moved out of the soft cache and put in the main cache. 
>  If the main cache is full, another item is kicked out of the main cache.  
> The item which is removed from the main cache is put into a soft cache.  As 
> this move from soft cache to main cache occurs, there is a narrow window 
> where the item appears to not be in the caches at all.  In other words, if a 
> (reader) thread asks (CacheMap.get) for item X, where another (writer) thread 
> is moving item X from the soft cache to the main cache, the reader thread 
> will not see item X and think the item is not cached.
> To see this better, lets look at CacheMap.get:    
>  public Object get(Object key) {
>         readLock();
>         try {
>             // Check the main map first
>             Object  val = cacheMap.get(key);
>             if (val == null) {
>                 // if we find the key in the soft map, move it back into
>                 // the primary map
>                 val = softMap.get(key);
>                 if (val != null){
>                     put(key, val);
>                 }else{
>                     val = pinnedMap.get(key);
>                 }
>             }
>            
>             return val;
>         } finally {
>             readUnlock();
>         }
>     }
> As you can see, we first look in the main cache for a given key (in my 
> scenario the key is a query string).  If it is not found, we look in the soft 
> map.  If it is in the soft map, we call the 'put' method.  The 'put' method 
> is going to take the key out of the soft map and put it in the main cache 
> map.  Let me paste the relevant parts of the put method (I'll leave it to the 
> reader to view the method in its entirety):
> public Object put(Object key, Object value) {
>         writeLock();
>         try {
> ........
>             // otherwise, put the value into the map and clear it from the
>             // soft map
>             val = put(cacheMap, key, value);
>             if (val == null) {
>                 val = remove(softMap, key);
> ........
>         } finally {
>             writeUnlock();
>         }
> As you can see here, this 'put' method calls to another 'put' method which 
> simply puts the key/value into the specified map.  If the map didn't already 
> contain a value for the given key (which I think should always be the case 
> for the cacheMap, or at least for my scenario), null will be returned.  When 
> null is returned, remove is called to remove the key/value from the soft map. 
>  
> Given this, lets look closer at the 'get' method with concurrency in mind.  
> Take the case where two threads enter the 'get' method at the same time.  
> Next, assume both threads are moving lock step and they both call 
> 'cacheMap.get' with the same key, and assume the key is not in the main 
> cache.  So at this time both threads will have a 'val' of null.  Next assume 
> one thread gets focus (so the other thread suspends for a moment) and it 
> calls the soft map with the key.  Assume the soft map contains the key.  In 
> this case the thread will move (via the call to 'put') the key/value from the 
> soft cache to the main cache.  This thread will then have a non-null 'val' 
> and will return.  Next assume the suspended thread gets focus and continues.  
> Since its 'val' is null, it will ask the soft cache for the key.  However, we 
> know that the other thread just moved the key/value from the soft to main 
> cache!  As such, the thread will not find the key in the soft cache and will 
> return a 'val' of null!!!  This is at the heart of the issue!  That is, one 
> thread thinks there is nothing in the cache for their key!  
> The observant reader may notice that the 'get' and 'put' have a 'readLock' 
> and 'writeLock', respectively, which attempts to get a ReentrantLock.  So one 
> would think that there is proper read/write access of the caches to avoid 
> such situations.  However, one must look at the read/writeLock methods.  If 
> you look at the readLock method, you can see that it locks using a 
> '_readLock' variable.  This variable is null for the CacheMap instance 
> created by the PreparedQueryCacheImpl.  Therefore, the readLock method is a 
> no-op.  This means that a reader thread can do a read at any time without any 
> locking concerns.  
> Now that we know the heart of the issue (i.e. one thread received an 
> incorrect cache miss), lets look at the implication of this incorrect cache 
> miss.  What I haven't stated is where the threads are at when they call 'get' 
> on the cache.  This is easier to see when looking at the attached test.  
> Basically though one thread is in PreparedQueryCacheImpl.register, 
> specifically this line:
> PreparedQuery cached = get(id);
> Here is the stack from my test which I will describe:
> PreparedQueryCacheImpl.get(String) line: 193  
> PreparedQueryCacheImpl.register(String, Query, FetchConfiguration) line: 92   
> QueryImpl<X>.preExecute(Map) line: 554        
> If you look at the 'register' method, you can see that if the PreparedQuery 
> is not found in the cache (i.e. 'get(id)' returns null), one will be created 
> and an attempt made to put it into the cache.  If you look at  
> PreparedQueryCacheImpl.cache, you can see a request is made a second time to 
> see if the value is in the cache.  More than likely this time the value will 
> be found in the cache, in which case true is returned.  If not, false is 
> returned.  Either way, it doesn't matter if true or false is returned.  To 
> see why, look at org.apache.openjpa.persistence.QueryImpl.preExecute.  This 
> is the code which has called PreparedQueryCacheImpl.register (see above 
> stack).  Again, this might be more clear when looking at the provided test in 
> the debugger.  You can see in 
> 'org.apache.openjpa.persistence.QueryImpl.preExecute' that if anything other 
> than null is returned from 'register', the code that calls 
> PreparedQuery.reparametrize is skipped!  This is at the heart of the issue.  
> To understand why, you need to go back to JIRA OPENJPA-2609 and understand 
> the part about PreparedSQLStoreQuery$PreparedSQLExecutor, or understand the 
> fact that a PreparedSQLStoreQuery$PreparedSQLExecutor is the executor. That 
> is, the threads involved here (again, easier to see in the provided test) are 
> using a PreparedSQLStoreQuery$PreparedSQLExecutor.  Given this the parameter 
> must be reparametrize, in other words the parameter needs to be converted to 
> its parameter index (an int).  Because this is skipped, we see the CCEx when 
> trying to convert a string parameter to an expected Integer.  That is, when 
> you look at this line in org.apache.openjpa.kernel.QueryImpl.execute:
> StoreQuery.Executor ex = (isInMemory(operation))
> The next couple lines below this is where the CCEx occurs, because the 
> parameters have not been converted.  Note that when we are dealing with a 
> 'ExpressionStoreQuery$DataStoreExecutor' (i.e. the very first time a query is 
> executed - not cached - or caching is disabled), the reparamaterization 
> doesn't need to occur because an ''ExpressionStoreQuery$DataStoreExecutor' 
> expects a string parameters, rather than a paramater (int) index.  If you run 
> with the debug module I provide (see below) I add debug to print the 
> 'executor' making is a little easier to see when each of these two executors 
> are used.
> Now, that describes the CCEx, but I've said nothing about the NPE.  Well, if 
> you followed the above, as I'll explain, the NPE is very easy to see.  Again, 
> just like the CCEx stack, the NPE stack is in the 
> org.apache.openjpa.persistence.QueryImpl.preExecute method.  Just as was the 
> case above for the CCEx, this method goes to PreparedQueryCacheImpl.register. 
>  This time though, the 'register' returns null, meaning the PQ is already in 
> the cache.  A few lines after the 'register' call, we see these lines:
> PreparedQuery pq = _em.getPreparedQuery(_id);
>     if (pq.isInitialized()) {
> The code is going back to the cache for the actual PQ.  Using the same "cache 
> miss" scenario as described above, a null PQ will be returned.....obviously 
> when we call pq.isInitialized() on a null, an NPE will occur!
> While this JIRA describes one way to cause the CCEx, I feel there are other 
> holes that could cause the CCEx, or other exceptions.  For example, if you 
> look at 'org.apache.openjpa.persistence.QueryImpl.ignorePreparedQuery()', you 
> can see that it goes to the cache and if an entry is not found, it returns 
> without doing the recompile.  I see a scenario where this is a problem.  That 
> is, taking a look once again at 'QueryImpl.preExecute'.  Before the call to 
> 'PreparedQueryCacheImpl.register' there is a test for null parameters.  If 
> there is a null parameter, the 'ignorePreparedQuery()' method is called.  If 
> you step into this method you can see that a call is made to the cache.  If 
> we go through the same "cache miss" scenario, we will see that an "ignore" is 
> not performed (by "ignore" it appears the Query is recompiled and in so doing 
> the 'PreparedSQLStoreQuery$PreparedSQLExecutor' would be removed and instead 
> a 'ExpressionStoreQuery$DataStoreExecutor' used).  When the "cache miss" 
> occurs and the "ignore" not performed, we can see that yet again the call to 
> PreparedQuery.reparametrize is skipped, thus leading to the same CCEx.  I 
> think we need to fix the case where a "cache miss" could occur in CacheMap, 
> or look at all the cases where the cache is called and account for the 
> possibility of a "cache miss".  The former is likely the best options.
> Finally, let me describe the test I've created and attached.  It is named 
> PreparedQueryCacheTest.java and uses the provided entity (MyEntity.java).  
> This test can be ran as a stand alone JSE JUnit test.  To execute it simply 
> point your classpath to the OpenJPA snapshot jar of your choice.  You can use 
> a simple persistence.xml file (no properties are necessary).  To make it much 
> easier for the test to recreate the issue, you will need to edit the 
> PreparedQueryCacheImpl.java class.  Therefore this does assume the user can 
> edit and recompile the OpenJPA code.  The user should change the CacheMap in 
> PreparedQueryCacheImpl to:
> _delegate = new CacheMap(false,2);
> from:
> _delegate = new CacheMap();
> Again, doing this makes it far more likely the issue will occur and is a 
> necessity when running the debug test.  If you look at the test there is a 
> test method to run when running in a debugger, and a test method to run to 
> automatically recreate the exception.  Each test method is commented with 
> enough text to explain how to recreate the exception.  Especially the debug 
> test, the comments in this method detail exactly where to add break points 
> and the necessary flow to recreate the issue in a debugger.  With my test 
> you'll see a CCEx like this:
> Thread (Thread-32): Caught the following exception: 
> <openjpa-2.2.3-SNAPSHOT-r422266:1737410 nonfatal user error> 
> org.apache.openjpa.persistence.ArgumentException: Failed to execute query 
> "select o from MyEntity o where o.comments = :mrn and o.id = 1". Check the 
> query syntax for correctness. See nested exception for details.
>   With cause: java.lang.ClassCastException: java.lang.String cannot be cast 
> to java.lang.Integer
> In some cases you may see the NPE reported in the beginning of the JIRA.  
> However, the CCEx occurs more often than the NPE.  
> In addition to the test, I've attached a debug module which has system 
> println's to help demonstrate the issue.  The debug module is named 
> PreparedQueryTest.debug. 
> Thanks,
> Heath Thomann



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to