[jira] Commented: (OPENJPA-703) Cache ResultObjectProvider data to improve query performance

Pinaki Poddar (JIRA) Sun, 31 Aug 2008 15:28:43 -0700

    [ 
https://issues.apache.org/jira/browse/OPENJPA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627354#action_12627354
 ]


Pinaki Poddar commented on OPENJPA-703:
---------------------------------------

> I just don't like seeing the SQL generation process taking over 30% of my 
> program's execution time when I simply execute the SAME QUERY OBJECT over and 
> over again. 
Agreed. The issue with the proposed solution, however, is that the query 
instances are bound to a persistence context (aka. EntityManager) -- unless one 
is reusing the same EntityManager - an unlikely scenario in JEE container 
environment -- associating generated SQL with persistence context is not going 
to be fruitful as the context will live within a single transaction and then 
disappear.

I have now introduced a mechanism similar to your proposed idea to cache the 
SQLs generated by OpenJPA. The cached SQL is  indexed by the original JPQL or 
NamedQuery moniker at EntityManagerFactory level. 

The advantages of the approach are manifold:
   1. this caching mechanism caches *any* query. Not only findBy() as is the 
case with an earlier attempt to SQL level query caching. 
   2. it provides a longer life span of the cache and hence same JPQL query Q 
when executed by different EntityManagers can be now reused efficiently. 
   3.  the key for the query is natural and indirectly specified by the user to 
be unique as the key is the JPQL itself or the NamedQuery moniker. So there is 
neither any overhead of computing the query key nor there is any code coupling 
as context of what makes a query unique grows.
   4. the mechanism is orthogonal and unobtrusive to OpenJPA kernel as well as 
the user. The JPQL queries during first execution gets translated to SQL and 
any subsequent execution of the query within the same persistent unit (not 
context) reuse the generated SQL directly, bypassing the entire query formation 
(that 30% you mentioned) logic of the kernel. 
   5. The critical (and slightly hairy) issue is the parameter binding. As JPQL 
allows positional as well as named parameter binding but SQL allows only 
positional binding, hence a bit of care must be exercised to ensure that 
parameter binding remains consistent as we translate a JPQL to SQL under the 
hood.

Initial performance result with this simple mechanics is promising and 
presented below:
  1. All measurements are taken on my laptop 
  2. Logging was turned off
  3. The database was empty -- so the query gets executed but no actual record 
is selected. This is done purposefully to accentuate the incremental benefit of 
query construction vs query caching. 
  4. All measurements are taken by 100 repeated executions of each query 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Query                                                                    
                                                                                
      Time taken  in ms                                          %Improvement
                                                                                
                                                                                
  without cache      with cache
                                                                                
                                                                                
              t1                  t2                                            
   PCT[(t1-t2)/t1]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1. select p from PObject p                                                      
                                                                          625   
           578                                              +7%
    SELECT t0.id, t0.version, t0.name FROM PObject t0 

2. select p from PObject p where p.name = 'PObject'                             
                                                   562              516         
                                      +8%
    SELECT t0.id, t0.version, t0.name FROM PObject t0 WHERE (t0.name = 
'PObject')

3. select p from PObject p where p.name = :param                                
                                                   531               500        
                                      +6%
    SELECT t0.id, t0.version, t0.name FROM PObject t0 WHERE (t0.name = ?)

4. select e from Employee e where e.name = :emp and e.department.name = :dept   
                      594             546                                       
        +8%
                and e.department.company.name = :company and e.address.zip = 
:zip
    SELECT t0.id, t3.id, t3.city, t3.state, t3.street, t3.zip, t1.id, t2.id, 
t2.name, t1.name, t0.name 
        FROM Employee t0 INNER JOIN Department t1 ON t0.DEPARTMENT_ID = t1.id 
        INNER JOIN Address t3 ON t0.ADDRESS_ID = t3.id 
        INNER JOIN Company t2 ON t1.COMPANY_ID = t2.id 
        WHERE (t0.name = ? AND t1.name = ? AND t2.name = ? AND t3.zip = ?)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The current limitation is:
  a) queries that result into more than one SQL statements (e.g. union or 
parallel queries) are not amenable to caching.
  b) no cache invalidation on fetch plan changes -- but the existing caching 
does not address that issue either. 



   An earlier attempt to caching SQL query (as configured by 
jdbc.QuerySQLCache) is based on associating a SQL string to the 
implementation-internal Select objects and caching the Select objects 
themselves. This current work does overlap with this effort and hence some 
explanation is called for on why I have considered an alternative 
implementation for SQL query caching while one already exists. My primary 
concern with existing implementation for query caching are as follows:
       a) it is non-orthogonal to other design constructs. For example, when a 
relation is being loaded (a very critical and common code logic) the code 
becomes aware whether query caching is in effect and, more critically, takes a 
distinctly different path.
       b) the code logic of JDBCStoreManager -- another critical component of 
OpenJPA -- also becomes aware of query caching and starts bifurcating code 
paths. This approach can lead to poor maintainability and consistency in future.
       c) it only works for findBy() and trying to generalize it for other 
queries will spread the code to many other mapping strategies -- accentuating 
the above mentioned design concerns.
       d) it does need to compute a Select key and also caches a Select -- a 
very heavy object -- causing a poor memory usage profile. 

   I did not find any test case that measure performance gain of this existing 
approach. Can anyone (whoever has cared to read up to this point:) please point 
me to any performance result obtained with existing caching implementation so 
that I can do a fair comparative performance evaluation of the existing 
approach and this current one?

 

> Cache ResultObjectProvider data to improve query performance
> ------------------------------------------------------------
>
>                 Key: OPENJPA-703
>                 URL: https://issues.apache.org/jira/browse/OPENJPA-703
>             Project: OpenJPA
>          Issue Type: Improvement
>          Components: kernel
>            Reporter: Ron Pressler
>
> Profiling indicated that JDBCStoreQuery.populateSelect consumes a significant 
> amount of CPU, and is executed every time a query is run. While, in fact, the 
> actual PreparedStatement is created and run only in QueryImpl.toResult. It 
> seems like the returned ResultObjectProvider from JDBCStoreQuery.executeQuery 
> can be at least partially cached, or even cached in its entirety (provided 
> care is taken with the context parameters). 
> It seems like such an improvement would significantly improve query 
> performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (OPENJPA-703) Cache ResultObjectProvider data to improve query performance

Reply via email to