[
https://issues.apache.org/jira/browse/OPENJPA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627354#action_12627354
]
Pinaki Poddar commented on OPENJPA-703:
---------------------------------------
> I just don't like seeing the SQL generation process taking over 30% of my
> program's execution time when I simply execute the SAME QUERY OBJECT over and
> over again.
Agreed. The issue with the proposed solution, however, is that the query
instances are bound to a persistence context (aka. EntityManager) -- unless one
is reusing the same EntityManager - an unlikely scenario in JEE container
environment -- associating generated SQL with persistence context is not going
to be fruitful as the context will live within a single transaction and then
disappear.
I have now introduced a mechanism similar to your proposed idea to cache the
SQLs generated by OpenJPA. The cached SQL is indexed by the original JPQL or
NamedQuery moniker at EntityManagerFactory level.
The advantages of the approach are manifold:
1. this caching mechanism caches *any* query. Not only findBy() as is the
case with an earlier attempt to SQL level query caching.
2. it provides a longer life span of the cache and hence same JPQL query Q
when executed by different EntityManagers can be now reused efficiently.
3. the key for the query is natural and indirectly specified by the user to
be unique as the key is the JPQL itself or the NamedQuery moniker. So there is
neither any overhead of computing the query key nor there is any code coupling
as context of what makes a query unique grows.
4. the mechanism is orthogonal and unobtrusive to OpenJPA kernel as well as
the user. The JPQL queries during first execution gets translated to SQL and
any subsequent execution of the query within the same persistent unit (not
context) reuse the generated SQL directly, bypassing the entire query formation
(that 30% you mentioned) logic of the kernel.
5. The critical (and slightly hairy) issue is the parameter binding. As JPQL
allows positional as well as named parameter binding but SQL allows only
positional binding, hence a bit of care must be exercised to ensure that
parameter binding remains consistent as we translate a JPQL to SQL under the
hood.
Initial performance result with this simple mechanics is promising and
presented below:
1. All measurements are taken on my laptop
2. Logging was turned off
3. The database was empty -- so the query gets executed but no actual record
is selected. This is done purposefully to accentuate the incremental benefit of
query construction vs query caching.
4. All measurements are taken by 100 repeated executions of each query
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Query
Time taken in ms %Improvement
without cache with cache
t1 t2
PCT[(t1-t2)/t1]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1. select p from PObject p
625
578 +7%
SELECT t0.id, t0.version, t0.name FROM PObject t0
2. select p from PObject p where p.name = 'PObject'
562 516
+8%
SELECT t0.id, t0.version, t0.name FROM PObject t0 WHERE (t0.name =
'PObject')
3. select p from PObject p where p.name = :param
531 500
+6%
SELECT t0.id, t0.version, t0.name FROM PObject t0 WHERE (t0.name = ?)
4. select e from Employee e where e.name = :emp and e.department.name = :dept
594 546
+8%
and e.department.company.name = :company and e.address.zip =
:zip
SELECT t0.id, t3.id, t3.city, t3.state, t3.street, t3.zip, t1.id, t2.id,
t2.name, t1.name, t0.name
FROM Employee t0 INNER JOIN Department t1 ON t0.DEPARTMENT_ID = t1.id
INNER JOIN Address t3 ON t0.ADDRESS_ID = t3.id
INNER JOIN Company t2 ON t1.COMPANY_ID = t2.id
WHERE (t0.name = ? AND t1.name = ? AND t2.name = ? AND t3.zip = ?)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The current limitation is:
a) queries that result into more than one SQL statements (e.g. union or
parallel queries) are not amenable to caching.
b) no cache invalidation on fetch plan changes -- but the existing caching
does not address that issue either.
An earlier attempt to caching SQL query (as configured by
jdbc.QuerySQLCache) is based on associating a SQL string to the
implementation-internal Select objects and caching the Select objects
themselves. This current work does overlap with this effort and hence some
explanation is called for on why I have considered an alternative
implementation for SQL query caching while one already exists. My primary
concern with existing implementation for query caching are as follows:
a) it is non-orthogonal to other design constructs. For example, when a
relation is being loaded (a very critical and common code logic) the code
becomes aware whether query caching is in effect and, more critically, takes a
distinctly different path.
b) the code logic of JDBCStoreManager -- another critical component of
OpenJPA -- also becomes aware of query caching and starts bifurcating code
paths. This approach can lead to poor maintainability and consistency in future.
c) it only works for findBy() and trying to generalize it for other
queries will spread the code to many other mapping strategies -- accentuating
the above mentioned design concerns.
d) it does need to compute a Select key and also caches a Select -- a
very heavy object -- causing a poor memory usage profile.
I did not find any test case that measure performance gain of this existing
approach. Can anyone (whoever has cared to read up to this point:) please point
me to any performance result obtained with existing caching implementation so
that I can do a fair comparative performance evaluation of the existing
approach and this current one?
> Cache ResultObjectProvider data to improve query performance
> ------------------------------------------------------------
>
> Key: OPENJPA-703
> URL: https://issues.apache.org/jira/browse/OPENJPA-703
> Project: OpenJPA
> Issue Type: Improvement
> Components: kernel
> Reporter: Ron Pressler
>
> Profiling indicated that JDBCStoreQuery.populateSelect consumes a significant
> amount of CPU, and is executed every time a query is run. While, in fact, the
> actual PreparedStatement is created and run only in QueryImpl.toResult. It
> seems like the returned ResultObjectProvider from JDBCStoreQuery.executeQuery
> can be at least partially cached, or even cached in its entirety (provided
> care is taken with the context parameters).
> It seems like such an improvement would significantly improve query
> performance.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.