https://issues.apache.org/bugzilla/show_bug.cgi?id=46962

Alexis Giotis <alex.gio...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27342|0                           |1
        is obsolete|                            |
  Attachment #27343|0                           |1
        is obsolete|                            |
  Attachment #27357|0                           |1
        is obsolete|                            |

--- Comment #7 from Alexis Giotis <alex.gio...@gmail.com> 2011-08-05 20:31:09 
UTC ---
Created attachment 27358
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27358
Patch with rewritten PropertyCache (trunk revision 1154337)

In short, the new PropertyCache implementation attached is:
- Up to 3 times faster (depending on the tests, the -Xmx setting, and whether
strong references to the cached entries are retained)
- A third of the lines of code
- Obviously thread-safe
- Written using JDK 5 generics
- Similar in memory requirements

Additionally, this patch fixes the broken hashCode() and equals() methods of
classes extending Property (including the patch in bug 51625). In tests, those
caused many hashCode() collisions.
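To illustrate the kind of fix involved, here is a hypothetical value class (not
one of the actual FOP Property subclasses) with a consistent
hashCode()/equals() pair; the point is that hashCode() must incorporate every
field that equals() compares, otherwise many distinct values collide:

```java
// Hypothetical example, not FOP code: a value class whose hashCode() and
// equals() agree. A broken hashCode() (e.g. one returning a constant, or one
// ignoring fields that equals() compares) funnels distinct values into the
// same cache bucket.
final class NumberProperty {
    private final int value;

    NumberProperty(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof NumberProperty)) {
            return false;
        }
        return this.value == ((NumberProperty) obj).value;
    }

    @Override
    public int hashCode() {
        // Incorporates every field that equals() compares.
        return 17 * 37 + value;
    }
}
```

Equal instances then hash identically, so a hash-keyed cache can find them.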



In more detail, I wrote 2 new implementations of PropertyCache (one in this
patch and the one in attachment 27357) and tested both against the original one
with the fix contained in attachment 27343.

When strong references to the cached entries are kept, the performance of all
implementations is similar. When they are not (the more common case), the ones
based on the concurrent hash map are up to 3 times faster. The tests were
allocating 1M (million) Property instances, of which 100K were equal (but
distinct instances).

The first implementation based on the concurrent map supports caching not-equal
objects that share the same hash code, but it is fairly complex. The one
attached in the final patch does not. After some experimentation, tests with
large (1000-page) documents showed that the hash code collisions were caused by
buggy hashCode() and equals() implementations. Handling this case carries a
performance penalty. In a test that caches 1M entries among which there are
only 100 distinct hashCode() values, the time to complete was:

- 52 seconds for the initial implementation
- 12 seconds for the concurrent map that can cache not-equal objects with the
same hash code
- 1 second for the concurrent map that keeps the more recent one

In other words, in this case (which is due to buggy hashCode()/equals()
implementations), the cost of creating a new instance and replacing the
previously cached one is far smaller than the cost of keeping the different
instances in memory.
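A minimal sketch of the "keep the more recent one" strategy, assuming a
ConcurrentHashMap keyed by hash code and holding WeakReference values; the
class and method names here are illustrative, not the exact ones in the patch:

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative canonicalizing cache (not the patch's actual class). On a
// miss, a cleared reference, or a hash collision with a not-equal object,
// it simply stores the new instance, replacing whatever was there, instead
// of maintaining all colliding instances in memory.
final class SimplePropertyCache<T> {
    private final ConcurrentHashMap<Integer, WeakReference<T>> map =
            new ConcurrentHashMap<Integer, WeakReference<T>>();

    T fetch(T obj) {
        Integer key = Integer.valueOf(obj.hashCode());
        WeakReference<T> ref = map.get(key);
        if (ref != null) {
            T cached = ref.get();
            if (cached != null && cached.equals(obj)) {
                return cached;  // reuse the canonical instance
            }
        }
        // First sighting, collision, or collected entry:
        // keep the more recent instance.
        map.put(key, new WeakReference<T>(obj));
        return obj;
    }
}
```

Two threads racing on the same key may each briefly keep their own instance;
one put wins and later fetches return that winner, which matches the relaxed
uniqueness guarantee described below.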

Note that this implementation does not guarantee the uniqueness of equal
instances under concurrent execution. Such a guarantee is not only complex to
code, it also requires additional locking. In practice, concurrent insertion of
duplicates is improbable, eventually only one instance survives, and the
situation should be tolerable in any case. After all, the caching can be
globally disabled with the same system property as before.
