Simon, I've got some time reviewing your proposal. It looks very feasible. Probably you'd better read my blog entry on WeakRef and Finalizer processing of Harmony [1].
One question, will you implement weakref support for all the three concurrent GC algorithm? Btw, please go ahead to submit/update your proposal to the GSoC application web. [1] http://xiao-feng.blogspot.com/2007/05/weak-reference-processing-in-apache.html Thanks, xiaofeng On Tue, Mar 31, 2009 at 6:06 PM, Simon Zhou <[email protected]> wrote: > Hi All, > Here is my proposal, any suggestion is welcome! > > *Title/Summary:* GC-1: Implement WeakReference support in Harmony Concurent > GC > > *Student:* Simon.Zhou > > *Student e-mail:* [email protected] > > *Student Major:* Computer Science > > *Student Degree:* Master > > *Student Graduation:* 2009.7 > > *Organization:* Fudan University > > *Assigned Mentor:* > > *Abstract:* > > weak reference allows a program to maintain special reference to objects, > which is a useful API for the program interacting with the garbage > collector to some extent. Apache Harmony has a concurrent garbage collector > (code name: Tick) which can perform the garbage collection without > suspending the application threads except for root set enumeration. Tick do > not have weak reference supporting now, it treats all the references object > just as strong references, which may cause inefficiency for applications > using weak references. In this project, I will add the weak reference > support for Tick, which would be different from the implementation in > generational GC, because the consistency should be maintained for the heap > objects carefully. Read barrier of the get() method of reference object will > be used for this implementation, and the performance issue will be seriously > considered. before implement phantom reference, I will also implement the > finalizer feature for Tick, although it is not recommanded to be used except > it is necessary. Besides this, I am trying to reduce the floating garbage > for Tick’s concurrent algorithms. > > *Detailed Description:* > > Tick is a concurrent GC of Harmony DRLVM, it can do the garbage collection > without suspending application threads. For concurrent GC, the most > important issue that should be dealt with is the inconsistency problem, that > is if we trace the heap object graph while the mutator updating the same > graph, inconsistency problem occurs, some live objects may be mistaken > relaimed as garbage. In order to keep the consistency of objects, write > barriers is inserted when concurrent tracing starts, which intercept all the > write operations of object reference and record useful data for rescaning. > Now there are 3 different concurrent collection algorithms in Tick: DLG, > Sliding-View, Mostly-Concurrent; and 3 different write barriers are used > repectively. see more information [1]. > > Experiment result shows that Tick can significantly reduce or even eliminate > the pause time caused by garbage collection. However, one drawback of Tick > is that, it do not support weak reference mechanism, which is a important > API of Java programming language. > > Weak reference is an useful tool for whom are much care about the > application memory usage, e.g. it can help developers to cache some > temporary large objects for later reusing; also, to do some cleanup > operations just when a object is acturally relaimed. There are 3 kinds of > weak reference class in Java language (the weakness is decreasingly): soft > reference, weak reference, phantom reference. Soft reference is usually used > to maintain better cache for large object, such as pictures load from web; > weak reference can help to reduce the frequency of creating short-live > objects; phantom reference can usully help programmer to do cleanup > operations just when an object is physically relaimed. (from now on, I will > use ‘weakref’ to represent the set of all the three kinds of weak > references) All the three kinds of reference classes have a get() method, if > the object has not been actually relaimed, the get() method will return a > reference of the referent object (referent object is the object which > reference object point to) except phantom reference (phantom reference’s > get() method always return null, since it is just used for cleanup > operations when object is actually relaimed). see more information [2,3] > > Weakrefs are always processed in GC. For generational GC, weakref feature > can be implemented easily: when minor collection occurs, weak reference > objects are equeued and their referents are reclaimed. when major collection > occurs, soft reference objects are enqueued and their referents are > reclaimed. when finalizer are processed, phantom reference objects are > enqueued and their referents are reclaimed. > > Unlike generational GC, Tick just simply treats all the references in the > heap as strong references, which make memory pressure become bigger for the > applications using much data structure based on weakrefs, such as > WeakHashMap. In this project, my job is to implement this weakref feature > for Tick, and this work should be independent to the concurrent algorithms > used in Tick. > > For concurrent GC, weakref processing is different with generational GC, > because the tracing thread is running concurrently with application threads. > Application threads may invoke the get() method to get the reference of > referent object, it may: > 1, put it to the thread local stack as local var or calling parameter, such > as > Object localvar = weakref.get(); > functionA(weakref.get()); > 2, assign it to another object slot, such as > obj.field = weakref.get(); > these operations will change the reachability of referent object, which may > cause the inconsistency issues in concurrent GC. In order to deal with it, > what we should do is to intercept these operations while application thread > is running. Because all these operations use the get() method to get the > reference of referent object, inserting the read barrier in the get() is a > simple and effective approach. The main job of the read barrier is to add > the pointer of referent object to a remembered set, then the garbage > collection thread will pick it up and trace from it. > > I divided the work into 3 major steps: > > The first step is to implement the read barrier for get() method of > reference object. My approch is implementing the barrier using a VMHelper > method, which is a VMMagic based helper function writen in Java. That is, > using Java to write some hot funtions in JVM, then these Java methods can be > compilered and even inlined into the native code produced by JIT compiler > [4]. This is very useful approach to reduce the read barrier’s overhead. In > this step, I will write a Java version read barrier and add some code to the > JIT compiler (Jitrino and JET) to make it compile the barrier correctly and > effiently. > > The second step is to implement the weakref processing module in Tick. This > module will collect all the weakrefs while concurrent marking and put them > to different queues depending on its reference type. When concurrent marking > finishes, GC threads pick up the reference object in the queues to see if > the referent object is marked, if so, the referent object is live and should > not be processed, otherwise, process the referent object according the type > of the reference object respectively. moreover, if the reference object is > intercepted by the read barrier in the get() method, its referent object > should be marked while tracing, so the consistency is well maintained. > > After this step, I will test Tick with benchmark using weakrefs, or using > data structure based on the weakrefs (such as WeakHashMap). The experiment > result will be analyzed and compared to Tick without weakrefs supporting and > generational GC with weakrefs supporting. Then I will form the result data > to a document report. > > The last step is trying to reduce the floating garabge [5] for concurrent > GC. This work is not related to weakrefs feature, but its effect may be > similar to it, since the major goal of reducing floating garbage is to ease > the memory pressure while using concurrent GC. Floating garbage always exist > and may not be totally eliminated in concurrent GC, my job is trying to > optimize its impact as much as possible. > > *Additional Information:* > > Personal Information: > I am a master canidate of Computer Science in Parallel Processing > Institution, Fudan University. My interested area includes Compiler, > Computer Architecure, Managed Runtime etc. I started to follow Harmony with > interests 2 year ago, and I used to contribute to DRLVM and have submited > some patches to the community. As an intern, I improved Harmony concurrent > GC by using state-based phase design and implementing an efficient scheduler > in the summer of 2008, the patch of this work has been merged to the source > trunk [6]. > > Schedule: > > I can work at least 3 days per week for this work, my schedule this > project as follow: > April 10 ~ April 26 Communicate with the mentor and design the interface > of the implementation, such as, the native interface of get() method, weak > reference processing interfaces in GC module. Write down the design > document. > > April 27 ~ May 24 implementing read barrier for get() method of reference > object. > > May 25 ~ June 30 implementing the weak and soft reference processing > features in GC module, add finalizer feature to Tick > > July 1 ~ July 15 implementing phantom reference processing features. > > July 16 ~ Aug 1 test and analyze Tick with weak reference related > benchmark. write document for implementing and testing. > > Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage. > > *References:* > > [1] > http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html > [2] http://www.realjenius.com/node/377 > [3] http://www.pawlan.com/Monica/refobjs/ > [4] > http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%[email protected]%3e > [5] http://www.memorymanagement.org/glossary/f.html#floating.garbage > [6] http://issues.apache.org/jira/browse/HARMONY-5989 > > -- > From : simon.z...@ppi, Fudan University > -- http://people.apache.org/~xli
