Folks, 

Yes, I do have a newer version of the source file that produced the
GC-howto document currently on the site - see attached. 

I am not sure we actually need to sync up with the original
Asciidoc-input file. Will send a separate email with my thoughts soon.

Thank you, 
Nadya Morozova
 

-----Original Message-----
From: news [mailto:[EMAIL PROTECTED] On Behalf Of Salikh Zakirov
Sent: Tuesday, October 17, 2006 1:29 PM
To: harmony-dev@incubator.apache.org
Subject: Re: "Hot to Write GC" requires improvement

Svetlana,

I've looked through your changes.
Mostly they look okay, and they greatly improve the visual presentation.

Originally, GC-howto was created using the AsciiDoc[1] toolchain from the
source text file and source .cpp file. Modifying the .html file directly means
that we cannot update the document to keep it in sync with the source code.

I guess this is acceptable, since nobody is changing source code inlets
in GC-howto now, but be warned: if anyone is to introduce source changes,
it would be a tedious task to synchronize visual and content changes.

Have you tried to configure asciidoc to produce the content you want?

I will send you the version of gc-howto.txt and gc.cpp that I have,
but Nadya may have a later version, so please check with her.
Since I am not sure the attachment will make it to the list, I'll send it to
you directly. (* or to anyone else who might be interested, just ask *)

[1] http://www.methods.co.nz/asciidoc/

Konovalova, Svetlana wrote:
> Sorry about that! 
> I've created a new patch, hope it's the right one you need. Please let
> me know if you still have any problems. 
> 
> [JIRA 1881] http://issues.apache.org/jira/browse/HARMONY-1881
> 
> Cheers,
> Sveta Konovalova
> 
> -----Original Message-----
> From: Geir Magnusson Jr. [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, October 17, 2006 8:50 AM
> To: harmony-dev@incubator.apache.org
> Subject: Re: "Hot to Write GC" requires improvement
> 
> The problem with the patch is that it's to the rendered output
> 
>     site/xdoc/subcomponent/drlvm/gc-howto.html
> 
> when what we need is the patch to the source document
> 
>     site/xdoc/subcomponent/drlvm/gc-howto-content.html
> 
> Can you add a new patch with that please?
> 
> geir
> 
> Rana Dasgupta wrote:
>> This is a good document, thanks Svetlana. Even if a lot of custom GCs
>> don't get written, it helps in understanding the current collector
>> framework and how it plugs into DRLVM.
>>
>> Rana
>>
>>
>>
>>> On 10/16/06, Konovalova, Svetlana <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Folks,
>>>>
>>>> I took a close look at "Hot to Write GC" [1] and created a patch for
>>>> this doc [JIRA 1881]. I fixed formatting, brushed up the code, removed
>>>> out-dated tags etc.
>>>> It would be great if someone can find a chance to look at the patch.
>>>> Thanks in advance!
>>>>
>>>> [1] http://incubator.apache.org/harmony/subcomponents/drlvm/gc-howto.html
>>>> [JIRA 1881] http://issues.apache.org/jira/browse/HARMONY-1881
>>>>
>>>>
>>>> Cheers,
>>>> Sveta Konovalova
>>>>
>>>>
> 
> ---------------------------------------------------------------------
> Terms of use : http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

How to write DRL GC
===================
[EMAIL PROTECTED], [EMAIL PROTECTED]
revision 1.0


//
//  Copyright 2006 The Apache Software Foundation or its licensors, as applicable.
//
//  Licensed under the Apache License, Version 2.0 (the "License");
//  you may not use this file except in compliance with the License.
//  You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
//  Unless required by applicable law or agreed to in writing, software
//  distributed under the License is distributed on an "AS IS" BASIS,
//  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
//  See the License for the specific language governing permissions and
//  limitations under the License.
//
//
//  Generate HTML version of this document from text source
//  and configuration file GC-howto.conf using the command
//
//      asciidoc -f GC-howto.conf --unsafe GC-howto.txt
//
//  Download Asciidoc generic distribution archive from
//
//      http://www.methods.co.nz/asciidoc/downloads.html
//
//  unpack it somewhere (e.g. /usr/local/opt/asciidoc), and
//  symlink /usr/local/opt/asciidoc/asciidoc.py to /usr/local/bin/asciidoc
//

This section provides instructions on creating a custom garbage collector
implementation in C++ and configuring the DRL virtual machine to use it. The
section describes the major steps of this procedure, namely:

- Establishing the build infrastructure
- Implementing the GC interface
- Implementing the garbage collector algorithm
- Running the VM with the custom GC

.Note
Plugging in a user-designed garbage collector presupposes an operating DRL
virtual machine built according to the instructions of the README.txt file
supplied with the VM source package. 

Establishing the build infrastructure
-------------------------------------

At this stage, you create the directory and set up the build infrastructure to
build the dynamic library. At the end of this stage, you will be fully set for
adding the garbage collector code and building it.

DRLVM can load a custom garbage collector from a dynamic library. It is
recommended that you build your dynamic library using the DRLVM build
infrastructure. Below is an example of creating a
build descriptor on the Windows<<LegalText,*>> / IA-32 architecture.

Create a directory for a new GC module, for example:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-----
vm$ mkdir gc_copying
vm$ mkdir gc_copying/src
vm$ cd gc_copying/src
-----
This is where you will put the source code; see Section 3,
Implementing the garbage collector algorithm.

Create a build descriptor file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create the build descriptor file gc_copying.xml
with the following content:
-----
sys::[sed -n '/<project/,$p' gc_copying.xml]
-----

You can add other macro definitions, include directories or compiler-specific 
command-line options to match your needs.

Create a C++ file with essential includes, namely:  
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-----
sys::[sed -n '/^#include "open/,/^#include "cxxlog/p'  gc.cpp]
-----

These include files are located in directories vm/include/open and
vm/port/include. Consult their content for documentation and details of the
interface.

Test the configuration
~~~~~~~~~~~~~~~~~~~~~~

Run the build system to test whether the infrastructure is set up correctly:
-----
build$ build.bat -DCOMPONENTS=vm.gc_copying
-----

On a successful build, the .dll file is placed in the VM build directory
build/win_ia32_icl_debug/deploy/jre/bin/. The name of the directory may differ
depending on your system and the compiler used. This empty library will not
work yet: you have to write your GC first.


Implementing the GC interface
-----------------------------

This section lists the functions that a garbage collector must
implement. Declarations of these functions are in gc.h. For details, consult
the Developer's Guide and documentation in gc.h and vm_gc.h.

GC lifecycle
~~~~~~~~~~~~

* gc_init() initializes the garbage collector
* gc_wrapup() shuts down the GC 
* gc_vm_initialized() notifies the GC about the VM transition from the
  initialization stage to running user applications
* gc_thread_init() and gc_thread_kill() notify the GC about creation and
  termination of user threads that may request memory allocation or other GC
  services
* gc_class_prepared() notifies the GC about loaded and prepared classes

Object allocation
~~~~~~~~~~~~~~~~~

* gc_alloc() performs slow allocation, can trigger garbage collection
* gc_alloc_fast() performs faster allocation, should not trigger garbage
  collection
* gc_add_root_set_entry() enumerates one root pointer during root set
  enumeration
 
See the Root set enumeration section in the Developer's Guide for details.

Miscellaneous
~~~~~~~~~~~~~

* gc_supports_compressed_references() indicates whether the GC supports
  compressed references
* gc_is_object_pinned() indicates whether the GC will move an object or not
* gc_force_gc() forces a garbage collection

Optional
~~~~~~~~

The virtual machine can operate without the functions listed below, but certain
features will be unavailable.

* gc_free_memory() returns the estimated amount of memory available for
  allocation
* gc_pin_object() requests that the GC does not move an object
* gc_unpin_object() removes the restriction on not moving an object
* gc_get_next_live_object() iterates over live objects during the
  stop-the-world phase of garbage collection
* gc_finalize_on_exit() transfers finalizable queue contents to the VM core on
  shutdown
* gc_time_since_last_gc() returns the amount of time that elapsed since the
  previous collection, in milliseconds
* gc_total_memory() returns the overall amount of memory used for the
  Java<<LegalText,*>> heap
* gc_max_memory() returns the maximum amount of memory that can be used for the
  Java<<LegalText,*>> heap

The `VM_GC` interface
~~~~~~~~~~~~~~~~~~~~~

The garbage collector requires VM support in its operation. The virtual machine
exports the VM_GC interface to meet the needs of the garbage collector.
The GC also uses the VM_common interface.

The VM_GC interface describes the services that the VM provides specifically
for the garbage collector. Please refer to the header file vm_gc.h to see the
complete list and documentation. 

The VM exports two functions to provide the global locking service for the
garbage collector: vm_gc_lock_enum() and vm_gc_unlock_enum(). These two
functions differ from plain system locks in their ability to gracefully
interoperate with VM threading services. In case of contention on the GC lock,
that is, when multiple threads call vm_gc_lock_enum() simultaneously, one
thread gets the lock, and the others remain blocked. If the thread that grabbed
the GC lock does a garbage collection, the blocked threads are considered
safely suspended. Other ways to block user threads for a long time can lead to
a deadlock because the VM has no way to find out whether a thread is blocked
or running.
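
The locking pattern described above can be sketched with a small scope guard.
This is only an illustration: the guard class `GcLockGuard`, the function
`collect_under_lock()`, and the stand-in bodies of vm_gc_lock_enum() and
vm_gc_unlock_enum() are assumptions made so that the sketch compiles on its
own. In a real GC module the two functions come from vm_gc.h.

```cpp
#include <mutex>

// Stand-ins for the VM-exported lock functions so this sketch is
// self-contained. The bodies are assumptions, not the VM implementation.
static std::mutex g_fake_gc_lock;
static int g_lock_depth = 0;
void vm_gc_lock_enum()   { g_fake_gc_lock.lock(); ++g_lock_depth; }
void vm_gc_unlock_enum() { --g_lock_depth; g_fake_gc_lock.unlock(); }

// Hypothetical helper: takes the GC lock on construction and releases it
// on scope exit, so no return path can leave the lock held.
class GcLockGuard {
public:
    GcLockGuard()  { vm_gc_lock_enum(); }
    ~GcLockGuard() { vm_gc_unlock_enum(); }
    GcLockGuard(const GcLockGuard&) = delete;
    GcLockGuard& operator=(const GcLockGuard&) = delete;
};

// Hypothetical collection entry point: all collector work happens while
// the guard is alive. Returns the lock depth observed inside the section.
int collect_under_lock() {
    GcLockGuard guard;
    // ... suspend threads, enumerate roots, copy objects ...
    return g_lock_depth;
}
```

The guard is a design convenience only; calling the two functions directly in
matching pairs is equivalent.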

A detailed description of the GC procedure is given in the Developer's Guide.

DRLVM provides two functions to support thread suspension and root set
enumeration simultaneously:

* vm_enumerate_root_set_all_threads() suspends all user threads and initiates
  root set enumeration
* vm_resume_threads_after() resumes execution of user threads 

These functions effectively restrict the garbage collector to stop-the-world 
algorithms only.

Implementing the garbage collector algorithm
--------------------------------------------

This section gives step-by-step instructions on how to implement the garbage
collection algorithm. The example shows a semispace copying collector.

.Note
This example does not implement object finalization and weak references.


[[gc_algorithm]]
Algorithm Overview
~~~~~~~~~~~~~~~~~~

The heap is divided into two equally sized contiguous semispaces.
During normal operation, only one semispace is used ('current semispace'),
and the other one is 'reserved' for garbage collection.
Allocation requests are satisfied by contiguous allocation
from the current semispace. Each application thread reserves a thread-local
allocation buffer ('TLAB') under a global lock, and serves most of the
allocation requests without locking, by incrementing the allocation pointer
local to the buffer.
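
A minimal sketch of such pointer-bumping allocation is shown below. The
structure `Tlab` and the function `tlab_alloc()` are hypothetical names chosen
for this illustration and are not taken from gc.cpp; the 4-byte rounding is
likewise an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical thread-local allocation buffer: a slice of the current
// semispace owned by a single thread.
struct Tlab {
    uint8_t* current = nullptr;  // next free byte
    uint8_t* limit   = nullptr;  // one past the last usable byte
};

// Allocates by bumping the local pointer; no lock is needed because the
// buffer belongs to one thread. Returns nullptr when the request does not
// fit, in which case the caller must get a new buffer under the global lock.
inline void* tlab_alloc(Tlab& t, std::size_t size) {
    size = (size + 3) & ~static_cast<std::size_t>(3);  // round up to 4 bytes
    if (static_cast<std::size_t>(t.limit - t.current) < size) {
        return nullptr;
    }
    void* result = t.current;
    t.current += size;
    return result;
}
```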

When the application requests an allocation that does not fit into the remaining
free space of the current semispace, a garbage collection is initiated. The
current semispace becomes the 'evacuation space' ('fromspace'), and the
reserved semispace becomes the 'destination space' ('tospace'). The VM suspends
all application threads and enumerates root references.

The GC copies the objects reachable from root references to the destination
space. When an object is copied from the evacuation space to the destination
space, the GC installs a forwarding pointer in the old copy. Root references
are updated to point to the new object locations.

After the root set enumeration is complete, the GC scans the objects in the
destination space. For each reference field of a scanned object, the GC checks
the referenced object in the evacuation space. If that object has not been
copied yet, the GC copies it to the destination space, installs a forwarding
pointer in the old copy, and updates the scanned field. If the object already
has a forwarding pointer installed, the GC only updates the scanned field.
In this way, the GC ensures that all live objects are copied to the destination
space exactly once.

The destination space serves as a queue of objects to be scanned when
more and more objects are copied to the destination space during heap
traversal. Once all live objects are reached and copied, the scan
queue stops growing, and the GC updates object fields only during
the last part of the scanning process.

The GC completes the scanning process when the scan pointer reaches the
allocation pointer in the destination space. At this stage, all live objects
have been evacuated to the destination space, and the evacuation space can be
safely reclaimed. The GC then changes the semispace roles: it uses the
destination space for further allocation and reserves the evacuation space for
the next garbage collection. The change of the semispace roles is commonly
referred to as a 'flip'.

After the semispace flip, the GC resumes user threads.
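
The copy, scan, and flip steps described above can be sketched on a toy heap.
This is a simplified model and not the code from gc.cpp: references are
vector indices instead of pointers, every object has exactly two reference
fields, and the names `Obj`, `ToyHeap`, `forward_ref()` and `collect()` are
invented for the illustration.

```cpp
#include <cstddef>
#include <vector>

// Toy object: two reference fields (indices into the current semispace,
// -1 for null), a payload, and a forwarding index (-1 if not copied yet).
struct Obj {
    int ref[2];
    int payload;
    int forward;
};

struct ToyHeap {
    std::vector<Obj> from;  // evacuation space
    std::vector<Obj> to;    // destination space

    // Returns the new location of object r, copying it on first visit and
    // leaving a forwarding index in the old copy.
    int forward_ref(int r) {
        if (r < 0) return -1;
        Obj& old = from[r];
        if (old.forward < 0) {
            old.forward = static_cast<int>(to.size());
            Obj copy = old;
            copy.forward = -1;
            to.push_back(copy);
        }
        return old.forward;
    }

    void collect(std::vector<int>& roots) {
        to.clear();
        // Root set enumeration: copy directly reachable objects.
        for (int& r : roots) r = forward_ref(r);
        // The destination space doubles as the scan queue: 'scan' chases
        // the end of 'to' until no unscanned objects remain.
        for (std::size_t scan = 0; scan < to.size(); ++scan) {
            for (int i = 0; i < 2; ++i) {
                // Use a temporary: forward_ref() may grow 'to'.
                int updated = forward_ref(to[scan].ref[i]);
                to[scan].ref[i] = updated;
            }
        }
        from.swap(to);  // flip: the destination space becomes current
        to.clear();
    }
};
```

Using the destination space itself as the work queue is what lets this style
of collector traverse the heap without a separate mark stack.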

For a detailed description of this algorithm and other basic garbage
collection techniques, refer to the survey
"Uniprocessor Garbage Collection Techniques" by Paul R. Wilson.

Source code explained
~~~~~~~~~~~~~~~~~~~~~

The full source code of the collector is available in gc.cpp.


The structure `TLS` (thread-local storage)
is used for the 'optimizing fast path' allocation. The GC allocates
a buffer of free space from the heap with appropriate locking and further uses
this buffer for thread-local allocation.

-----
sys::[sed -n '/^\/\/ This structure is allocated for each user thread/,/^}/p'  gc.cpp]
-----

Define the main GC structure to contain the Java<<LegalText,*>> heap and the
data necessary for GC operation, as shown below.

-----
sys::[sed -n '/^\/\/ The structure GC encapsulates all GC data/,/}/p'  gc.cpp]
-----

The following structure stores object information: the object field layout and
the object size.
-----
sys::[sed -n '/^\/\/ Structure OI (Object information)/,/^}/p'  gc.cpp]
-----

The data stored in the `OI` structure is initialized and accessed by the GC 
only.

The following structures convey the static assumptions that GC makes about
object layout. The VM must use the same object layout assumptions for the
correct GC operation.

The `VTable` structure contains the virtual table of the object methods,
and is linked from the object header. The VM reserves some space (at least 4
bytes) for exclusive use by the GC. The GC uses 4 bytes of GC-private space to
store the pointer to the object information structure `struct OI`.
------
sys::[sed -n '/^\/\/ The VTable structure has 4 bytes reserved/,/^}/p'  gc.cpp]
------

The GC assumes that each Java<<LegalText,*>> object has a fixed header: (1) a
pointer to the `VTable` structure, followed by (2) a 32-bit word with flags.
The 25 highest bits are used by the VM Thread Manager component to
implement Java monitors, and the 7 lowest bits are used by the GC and for
storing the object hash code.
------
sys::[sed -n '/^\/\/ This structure describes object header format/,/^}/p'  gc.cpp]
------
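
The bit split of the flags word can be illustrated with a few masks. The
constant and function names below are hypothetical and do not appear in
gc.cpp; only the 25/7 split itself is taken from the description above.

```cpp
#include <cstdint>

// Assumed split of the 32-bit flags word: the 7 lowest bits are shared by
// the GC and the hash code, the 25 highest bits belong to the thread manager.
constexpr uint32_t GC_BITS      = 7;
constexpr uint32_t GC_MASK      = (1u << GC_BITS) - 1;  // 0x0000007F
constexpr uint32_t MONITOR_MASK = ~GC_MASK;             // 0xFFFFFF80

// Extracts the 7 low bits.
inline uint32_t gc_bits(uint32_t word) { return word & GC_MASK; }

// Extracts the 25 high bits, shifted down to start at bit 0.
inline uint32_t monitor_bits(uint32_t word) { return word >> GC_BITS; }

// Replaces the low bits while leaving the thread manager bits untouched.
inline uint32_t with_gc_bits(uint32_t word, uint32_t bits) {
    return (word & MONITOR_MASK) | (bits & GC_MASK);
}
```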

Array objects have the same header, followed by a 4-byte length field
at offset 8.
-----
sys::[sed -n '/^\/\/ This strucutre describes array header format/,/^}/p'  gc.cpp]
-----

.Note
The layout described is valid for the IA-32 platform only. 

A number of convenience functions use object layout knowledge to perform
various data manipulations. The function init_vt() writes the VTable pointer
to an object.

-----
sys::[sed -n '/^void init_vt/,/^}/p'  gc.cpp]
-----
The function obj_oi() retrieves object information structure
pointer from an object.
-----
sys::[sed -n '/^OI *\* *obj_oi/,/^}/p'  gc.cpp]
-----
The function array_length() retrieves the length of an array
object.
-----
sys::[sed -n '/^int array_length/,/^}/p'  gc.cpp]
-----
The function vt_oi() retrieves  the `OI` structure pointer
from the VTable pointer.
-----
sys::[sed -n '/^OI *\* *vt_oi/,/^}/p'  gc.cpp]
-----
The function ah_oi() retrieves the `OI` structure pointer
using `Allocation_Handle`. On 32-bit architectures, the
VTable pointer is a 32-bit pointer, and Allocation_Handle is a 32-bit
integer.
-----
sys::[sed -n '/^OI *\* *ah_oi/,/^}/p'  gc.cpp]
-----

The object_size() function computes the size of an object. Array size is
calculated by summing the header size and the element size multiplied by the
array length. The result is then aligned to a multiple of 4. The non-array
object size is cached in the OI structure.
-----
sys::[sed -n '/^int object_size/,/^}/p'  gc.cpp]
-----
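
The array size arithmetic can be written out as a short sketch. The helper
names `align4()` and `array_size()` are hypothetical; the 12-byte header used
in the usage note follows from the IA-32 layout described earlier (two 4-byte
header words plus the 4-byte length field) and is an assumption for any other
platform.

```cpp
#include <cstddef>

// Rounds n up to the next multiple of 4.
constexpr std::size_t align4(std::size_t n) {
    return (n + 3) & ~static_cast<std::size_t>(3);
}

// Array size: header plus element_size * length, aligned to 4 bytes.
constexpr std::size_t array_size(std::size_t header_size,
                                 std::size_t element_size,
                                 std::size_t length) {
    return align4(header_size + element_size * length);
}
```

For example, with a 12-byte header a 5-element byte array needs 17 bytes,
which `array_size(12, 1, 5)` rounds up to 20.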
In this example, the garbage collector is created statically as a global
instance of structure GC:
-----
GC gc;
-----

The function init() statically configures size parameters. Normally, this
function uses the function vm_get_property() to read configuration options
specified as property values on the command line. In this example, we use
constant values for simplicity.
-----
sys::[sed -n '/^void GC::init/,/chunk_size/p'  gc.cpp]
-----

As the next step, the init() function allocates space for the heap, divides it
into two semispaces, and initializes the allocation semispace.
-----
sys::[sed -n '/^ *space = /,/^}/p'  gc.cpp]
-----


The global allocation function uses a lock to protect the heap from
simultaneous access from multiple threads. The locking mechanism
is trivially implemented in a platform-dependent way. See the full source code
in gc.cpp.
-----
sys::[sed -n '/^byte *\* *GC::galloc *(/,/^}/p'  gc.cpp]
-----

The local allocation function uses the thread-local allocation area for object
allocation, and uses galloc() to allocate a new chunk for a thread-local
allocation area as needed.

-----
sys::[sed -n '/^byte *\* *GC::alloc *(/,/^}/p'  gc.cpp]
-----

The forwarding pointers are installed in the lockword structure, the second word
of an object.
-----
sys::[sed -n '/^byte *\* *GC::forwarded *(/,/^}/p'  gc.cpp]
-----
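
One common way to keep a forwarding pointer in a header word is to tag it with
a low bit, which is always free because objects are at least 4-byte aligned.
The sketch below assumes this encoding for illustration; the names `Header`,
`FORWARD_BIT`, `set_forwarding()` and `forwarded()` are hypothetical, and the
word is pointer-sized here (rather than 32-bit) so that the sketch also runs
on 64-bit hosts. See gc.cpp for the encoding the example collector uses.

```cpp
#include <cstdint>

// Hypothetical two-word object header: VTable pointer, then the lockword.
struct Header {
    void*     vt;
    uintptr_t lockword;
};

// Assumed tag: bit 0 set means "lockword holds a forwarding pointer".
constexpr uintptr_t FORWARD_BIT = 1;

// Records the new location of an object in its old copy.
inline void set_forwarding(Header* old_copy, void* new_copy) {
    old_copy->lockword = reinterpret_cast<uintptr_t>(new_copy) | FORWARD_BIT;
}

// Returns the new location, or nullptr if the object was not moved yet.
inline void* forwarded(const Header* obj) {
    if (obj->lockword & FORWARD_BIT) {
        return reinterpret_cast<void*>(obj->lockword & ~FORWARD_BIT);
    }
    return nullptr;
}
```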

The function move() copies the object data to the destination semispace and
installs the forwarding pointer in the old object copy.
-----
sys::[sed -n '/^byte *\* *GC::move *(/,/^}/p'  gc.cpp]
-----


The function root() handles one root during root set enumeration. If the root
points to an object already reached, the root is updated with the forwarded
pointer value. Otherwise, the GC moves the object to the destination space and
installs the forwarding pointer in the old object copy.
-----
sys::[sed -n '/^void GC::root *(/,/^}/p'  gc.cpp]
-----

The function trace() scans one object.
-----
sys::[sed -n '/^void GC::trace *(/,/^}/p'  gc.cpp]
-----

The function collect_alloc() is the main function controlling garbage
collection. This function reclaims unused memory and then retries the
allocation. The GC attempts to allocate the memory before resuming other
threads. This prevents the thread that triggered the garbage collection from
starving.

.Note
A thread is 'starving' when it gets no resources for a long time
because other threads grab the resource before it can even try.
If the garbage collector resumes user threads before retrying the allocation,
these threads may quickly use up all available space before the allocation
succeeds. In this case, the allocation may fail an indefinite number of times.

-----
sys::[sed -n '/^byte\* GC::collect_alloc/,/^}/p'  gc.cpp]
-----

The exported GC interface is mostly implemented by delegating the task to the
methods of the structure `GC`. The GC initialization function init() is called
from gc_init().
-----
sys::[sed -n '/^void gc_init/,/^}/p'  gc.cpp]
-----

Thread local allocation areas are reset on thread creation and thread
termination events.
-----
sys::[sed -n '/^void gc_thread_init/,/^}/p'  gc.cpp]

sys::[sed -n '/^void gc_thread_kill/,/^}/p'  gc.cpp]
-----

The slow path allocation function gc_alloc() checks whether the allocation
space is exhausted and starts garbage collection when necessary.
-----
sys::[sed -n '/^Managed_Object_Handle gc_alloc *(/,/^}/p'  gc.cpp]
-----

If the memory is exhausted, the no-collection allocation function
gc_alloc_fast() returns NULL, and does not start garbage collection.

-----
sys::[sed -n '/^Managed_Object_Handle gc_alloc_fast/,/^}/p'  gc.cpp]
-----
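
The division of labor between the two allocation paths can be sketched as
follows. This is a toy model: the functions `toy_alloc_fast()`,
`toy_alloc_slow()` and `toy_new()` and the fixed-size arena are hypothetical,
and the "collection" simply resets the arena as if every object were garbage.

```cpp
#include <cstddef>

// Toy allocator state (assumption: a fixed 32-byte arena).
static unsigned char arena[32];
static std::size_t used = 0;
static int collections = 0;

// Fast path: never collects, returns nullptr when space runs out.
void* toy_alloc_fast(std::size_t size) {
    if (sizeof(arena) - used < size) return nullptr;
    void* p = arena + used;
    used += size;
    return p;
}

// Slow path: may trigger a collection, then retries the allocation.
void* toy_alloc_slow(std::size_t size) {
    if (void* p = toy_alloc_fast(size)) return p;
    ++collections;
    used = 0;  // stand-in for a real collection: pretend nothing survived
    return toy_alloc_fast(size);
}

// Caller-side pattern: try the fast path first, fall back to the slow one.
void* toy_new(std::size_t size) {
    void* p = toy_alloc_fast(size);
    return p ? p : toy_alloc_slow(size);
}
```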

The root set enumeration function passes the root reference to the root()
function.
-----
sys::[sed -n '/^void gc_add_root_set_entry *(/,/^}/p'  gc.cpp]
-----

The function build_slot_offset_array() is used to construct a NULL-terminated
list of offsets of reference fields.
-----
sys::[sed -n '/^static int *\* *build_slot_offset_array/,/^}/p'  gc.cpp]
-----
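
A terminated offset list of this kind can be built and walked as sketched
below. The names `make_slot_offsets()` and `for_each_slot()` are hypothetical.
The sketch relies on the assumption that offset 0 is safe to use as the
terminator because, in the layout described earlier, offset 0 holds the VTable
pointer and never a reference field.

```cpp
#include <cstddef>
#include <vector>

// Appends the 0 terminator to a list of reference field offsets.
std::vector<int> make_slot_offsets(const std::vector<int>& ref_field_offsets) {
    std::vector<int> result(ref_field_offsets);
    result.push_back(0);
    return result;
}

// Calls visit(slot) for every reference field of obj, where slot points
// at the field itself so the visitor can update it in place.
template <typename Visitor>
void for_each_slot(void* obj, const int* offsets, Visitor visit) {
    for (; *offsets != 0; ++offsets) {
        void** slot =
            reinterpret_cast<void**>(static_cast<char*>(obj) + *offsets);
        visit(slot);
    }
}
```

Precomputing the offsets once per class keeps the per-object scanning loop
free of any class metadata lookups.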

The GC caches object layout information when the function gc_class_prepared()
is called.
-----
sys::[sed -n '/^void gc_class_prepared/,/^}/p'  gc.cpp]
-----
The function gc_force_gc() starts a forced garbage collection using the global
GC lock to ensure that only one thread is doing a collection at any time. It
passes null arguments to collect_alloc(), because it requires no
allocation.
-----
sys::[sed -n '/^void gc_force_gc/,/^}/p'  gc.cpp]
-----

Other functions of the GC interface are empty or trivial, and not described in
this document. You can see the full listing in the gc.cpp file.

After you have finished coding the garbage collector, you can build the GC
dynamic library, as described above, by typing
-----
build$ build.bat -DCOMPONENTS=vm.gc_copying
-----

Running the VM with the custom GC
---------------------------------

This section describes how to run the DRL virtual machine with the custom
garbage collector library.

You can specify the name of the dynamic library on the command line. For
example, to load the GC library gc_copying.dll, execute the following:

-----
ij -Dvm.dlls=gc_copying Hello
-----

The virtual machine searches for the dynamic library gc_copying.dll in the
default locations, that is, the directories listed in the PATH variable and
the location of the ij.exe executable.
The default garbage collector is gc.dll, located in the same bin/ directory as
ij.exe.


DISCLAIMER AND LEGAL INFORMATION
--------------------------------

[[LegalText,*)]] * Other brands and names are the property of their respective 
owners.