Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Xmlgraphics-fop Wiki" 
for change notification.

The following page has been changed by AndreasDelmelle:
http://wiki.apache.org/xmlgraphics-fop/PropertyHandling/PropertyCache

------------------------------------------------------------------------------
  
  and the generated instances only have to be moved from the stack to the heap if they are really ''new'' instances to be added to the map, the involved overhead should be minimal in a language like Java, especially with modern VMs. The well-known warning about instantiation rates from the early days now applies more to circumstances where every instantiation would trigger a chain of constructors (e.g. if the second assignment were {{{this.referenceValue = new OtherClass();}}}).
  
- 
- == Structure ==
- 
- As the number of properties to which this approach was applied, began to 
grow, the decision was made to implement a dedicated class that could be used 
by any of the {{{Property}}} classes. The concern for multi-session 
environments was first taken into account by wrapping the {{{WeakHashMap}}} in 
a standard synchronized map. The {{{PropertyCache}}} itself would only have one 
or more {{{fetch(Object obj)}}} methods as public access points, that took care 
of the operations that were initially performed within the {{{Property}}} 
subclasses (see above), so the idiom had become simply: 
+ As the number of objects to which this approach was applied began to grow, the decision was made to implement a dedicated class that could be used by any of the {{{Property}}} classes. The concern for multi-session environments was first taken into account by wrapping the {{{WeakHashMap}}} in a standard synchronized map. The {{{PropertyCache}}} itself would only expose one or more {{{fetch(Object obj)}}} methods as public access points, which took care of the operations that were initially performed within the {{{Property}}} subclasses (see above), so the idiom simply became:
  
  {{{
  public static Property getInstance(int someInt, Object someObj) {
@@ -58, +55 @@

  
  In the meantime development went further on the trunk. After reading some more documentation on multi-threading in Java and studying the concepts of concurrency and contention, it became obvious that there was a possible performance bottleneck involved in using {{{Collections.synchronizedMap()}}}. Java 1.5 has its own {{{java.util.concurrent}}} package, but FOP still targets 1.4 environments, so those classes could not be used. Even then, the {{{util.concurrent}}} package only offers a {{{ConcurrentHashMap}}}, not a variant using {{{WeakReference}}}s, which would be needed to implement a cache that is shared by many threads. Besides that, a {{{Map}}} always contains ''mappings''. That is: every entry has a key ''and'' a value. But FOP needed only a weakly referenced key...
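+ 
+ To make that point concrete, here is a rough sketch (with invented names, ''not'' the actual FOP code) of a cache built on {{{Collections.synchronizedMap()}}} around a {{{WeakHashMap}}}: every access goes through a single lock, and since a map entry always needs a value, something like a {{{WeakReference}}} to the key has to be stored just to avoid pinning it:
+ 
+ {{{
+ import java.lang.ref.WeakReference;
+ import java.util.Collections;
+ import java.util.Map;
+ import java.util.WeakHashMap;
+ 
+ // Sketch only, not the actual FOP code. All access is serialized on a
+ // single lock, and every entry needs a value; storing the instance itself
+ // as the value would pin the weak key, so a WeakReference to it is stored.
+ public final class SynchronizedWeakCache {
+ 
+     private final Map map = Collections.synchronizedMap(new WeakHashMap());
+ 
+     public Object fetch(Object obj) {
+         WeakReference ref = (WeakReference) map.get(obj);
+         Object cached = (ref == null) ? null : ref.get();
+         if (cached != null) {
+             return cached;                   // reuse the canonical instance
+         }
+         map.put(obj, new WeakReference(obj));
+         return obj;
+     }
+ }
+ }}}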
  
+ == Structure ==
+ 
  The following article at IBM gave a starting point for building our very own 
cache: 
  http://www.ibm.com/developerworks/java/library/j-jtp08223/index.html
  
+ As pointed out at the start of the article, this is not for the faint of 
heart. Maybe we shouldn't have tried it at home, but we did anyway.
+ 
+ Thinking it over a bit more, it became apparent that the cache did not need to expose a public {{{remove()}}} operation at all, which already made it a bit simpler. It did not need to expose anything but the mentioned {{{fetch()}}}. Since the approach was based on {{{WeakReference}}}s, all we needed to do was make sure the map never grew unnecessarily large; the weak references themselves would be cleared automatically. The reference objects were removed from the cache upon each {{{put()}}}. Initially, rather audaciously, this was done in separate threads (as if implementing our own {{{ConcurrentWeakHashSet}}} was not yet enough... ;-)). Pretty soon it became obvious that using threads internally in the map as well was taking it one step too far. The cleanup cycle was moved, the threading abandoned, and a {{{ReferenceQueue}}} introduced.
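+ 
+ As a small illustration of that last step (a sketch; {{{CacheEntry}}} is an invented name, not the actual FOP class): the entries extend {{{WeakReference}}} and are registered with a {{{ReferenceQueue}}}, so no extra thread is needed; stale references are simply polled from the queue whenever {{{put()}}} decides a cleanup is due:
+ 
+ {{{
+ import java.lang.ref.ReferenceQueue;
+ import java.lang.ref.WeakReference;
+ 
+ // Sketch only: a weakly referenced cache entry registered with a queue.
+ // The hash code is remembered so that the entry's bucket can still be
+ // located after the referent itself has been garbage collected.
+ class CacheEntry extends WeakReference {
+ 
+     final int hash;
+ 
+     CacheEntry(Object referent, ReferenceQueue queue) {
+         super(referent, queue);
+         this.hash = referent.hashCode();
+     }
+ }
+ }}}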
+ 
+ The private {{{cleanSegment()}}} method is triggered upon each {{{put()}}}, 
but only if the map segment in question grows beyond twice the number of 
buckets in the map. Starting with a map of 8 buckets, the cleanup will be 
attempted every time a map segment grows beyond 16 elements.
+ 
+ A segment is simply an object on which one can synchronize to lock a portion of the buckets (see also the IBM article), and whose element count corresponds to the total for all those buckets. That element count is updated with each {{{put()}}}, and during the cleanup. At first, each segment corresponds to 1 bucket, since the map only contains 8, but once the number of buckets grows beyond 32, there will be multiple buckets per segment.
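+ 
+ In code, this could look roughly as follows (a sketch continuing the {{{CacheEntry}}} example above; all names are invented and the real class differs in detail). A segment here is little more than a lock object carrying an element count and the reference queue for its buckets, and {{{put()}}} checks the twice-the-number-of-buckets threshold:
+ 
+ {{{
+ import java.lang.ref.ReferenceQueue;
+ 
+ // Sketch only, continuing the CacheEntry example above (names invented).
+ public class SegmentedCacheSketch {
+ 
+     static class Segment {
+         int count;                          // elements in this segment's buckets
+         final ReferenceQueue staleEntries = new ReferenceQueue();
+     }
+ 
+     private CacheEntry[] table = new CacheEntry[8];      // the buckets
+     private final Segment[] segments = new Segment[8];   // 1 bucket per segment, at first
+ 
+     public SegmentedCacheSketch() {
+         for (int i = 0; i < segments.length; i++) {
+             segments[i] = new Segment();
+         }
+     }
+ 
+     void put(Object obj) {
+         int hash = obj.hashCode();
+         int bucketIndex = hash & (table.length - 1);
+         int segmentIndex = bucketIndex % segments.length;
+         Segment segment = segments[segmentIndex];
+         synchronized (segment) {
+             // ... link a new CacheEntry(obj, segment.staleEntries) into table[bucketIndex] ...
+             segment.count++;
+             if (segment.count > (2 * table.length)) {    // beyond 16 with 8 buckets
+                 cleanSegment(segmentIndex);
+             }
+         }
+     }
+ 
+     private void cleanSegment(int segmentIndex) {
+         // drain the segment's reference queue; the fuller version is sketched below
+     }
+ }
+ }}}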
+ 
+ The cleanup does no more than check the reference queue of the segment in question for stale entries, and if there are any, it removes the corresponding references from the cache.
+ Upon return of the cleanup, there is a check on whether it had any effect, i.e. whether the number of elements for the segment has decreased. If not, a boolean switch is set, indicating that the map segment in question votes for a redistribution of the elements. If 8 or more of the map segments agree, a {{{rehash()}}} is performed, which doubles the number of buckets to 16. Since each bucket is initially its own segment, the first rehash will only be triggered if all of the 8 buckets grow beyond size 16, so in that case the cache would contain more than 128 elements. The second rehash will then be triggered once 8 of the 16 buckets grow beyond size 32, or a total size of over 256 elements. The third when 8 of 32 grow over 64, so over 512 elements. The fourth when 8 of 32 grow over 128, or over 1024.
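+ 
+ Continuing the sketch (invented names again, and simplified in that the vote check lives inside {{{cleanSegment()}}} itself here):
+ 
+ {{{
+ // Sketch only: the fuller cleanSegment(), plus the vote bookkeeping.
+ private boolean[] votesForRehash = new boolean[segments.length];
+ private int votes;
+ 
+ private void cleanSegment(int segmentIndex) {
+     Segment segment = segments[segmentIndex];
+     int oldCount = segment.count;
+     // Drain the segment's reference queue and unlink the cleared entries.
+     CacheEntry stale;
+     while ((stale = (CacheEntry) segment.staleEntries.poll()) != null) {
+         removeEntry(stale);                  // also decrements segment.count
+     }
+     if (segment.count < oldCount) {
+         return;                              // the cleanup had an effect
+     }
+     // No effect: this segment votes for a redistribution of the elements.
+     if (!votesForRehash[segmentIndex]) {
+         votesForRehash[segmentIndex] = true;
+         votes++;
+     }
+     if (votes >= 8) {
+         // With 8 buckets and 8 segments this means all of them grew beyond
+         // 16 entries, i.e. more than 128 elements in total; see the text
+         // above for the thresholds of the later rehashes.
+         rehash(0);                           // locks all segments, doubles the buckets (below)
+     }
+ }
+ 
+ private void removeEntry(CacheEntry stale) {
+     // unlink 'stale' from its bucket (found via stale.hash) and
+     // decrement the owning segment's count
+ }
+ }}}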
+ 
+ Note that 1024 property instances corresponding to distinct values is, in most general use-cases, already more than sufficient to cover the distinct values for all those properties ''together''. The caches are divided over the applicable {{{Property}}} subclasses, so it will likely never come to a fifth rehash, which would mean 2K distinct values for one property type.
+ 
+ Since a rehash needs exclusive access to the map, this is done by recursively obtaining the locks on all of the segments. The initial entry of the {{{rehash()}}} method obtains the first lock, then calls itself, and keeps doing so until all locks are acquired. From that point on, no other thread can get access to any of the segments, and each thread that needs to put a new instance in the cache will have to wait until the rehash is completed. Calls to {{{get()}}} can still be served if a corresponding instance already exists, since the lookup first tries the usual way, without synchronization.
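+ 
+ The recursive locking can be sketched like this (invented names; {{{rehash(0)}}} would be the initial entry):
+ 
+ {{{
+ // Sketch only: acquire every segment lock by recursing; only when the
+ // innermost call holds all of them is the table actually rebuilt. Threads
+ // that merely look up an existing instance never block on these locks.
+ private void rehash(int segmentIndex) {
+     if (segmentIndex < segments.length) {
+         synchronized (segments[segmentIndex]) {
+             rehash(segmentIndex + 1);        // grab the next segment's lock
+         }
+         return;
+     }
+     // All segment locks are held at this point.
+     CacheEntry[] newTable = new CacheEntry[table.length * 2];
+     // ... redistribute the entries of 'table' over 'newTable' ...
+     table = newTable;
+     votes = 0;
+     votesForRehash = new boolean[segments.length];
+ }
+ }}}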
+ 
+ The {{{fetch()}}} method has been exposed in such a way that it only has public entry points for those types that are useful in this context, mostly for convenience. Calls to those are routed to the private generic method by casting the parameter instance to {{{Object}}}, which triggers a {{{get()}}} on the internal map. If there is already a weak reference to an equal instance, we get back a hard reference to that one.
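+ 
+ For example, the typed entry points can simply delegate to one generic method (a sketch: the overloads shown and the internal {{{get()}}}/{{{put()}}} helpers are merely stand-ins for the real ones):
+ 
+ {{{
+ // Sketch only: typed public entry points delegating to one generic method.
+ public StringProperty fetch(StringProperty prop) {
+     // the cast to Object selects the generic overload below
+     return (StringProperty) fetch((Object) prop);
+ }
+ 
+ public EnumProperty fetch(EnumProperty prop) {
+     return (EnumProperty) fetch((Object) prop);
+ }
+ 
+ private Object fetch(Object obj) {
+     Object cached = get(obj);     // unsynchronized lookup in the internal map
+     if (cached != null) {
+         return cached;            // hard reference to the cached equal instance
+     }
+     put(obj);                     // may trigger a cleanup or a rehash (simplified)
+     return obj;
+ }
+ }}}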
+ 
+ The approach has been applied to a number of {{{Property}}} subtypes, roughly 
corresponding to those types of property values that can be resolved at 
parse-time (absolute lengths, strings, numbers, enums...), and to 
property-bundles consisting solely of such properties ({{{CommonFont}}} and 
{{{CommonHyphenation}}}).
+ 
+ (to be continued)
+ 

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
