Re: [drlvm] The first GC helper with fast-path implemented in Java: gc_alloc

Weldon Washburn Thu, 12 Oct 2006 15:42:48 -0700

All,

This is a good discussion that has surfaced many topics related to writing
inlinable VM helpers in Java/vmmagic.  I have left out the other email
replies to reduce clutter.


Ultimately we will need to solve all the problems that have surfaced
including making changes to GC/JIT/VM interfaces.  I suggest that for right
now we focus only on demonstrating the benefit of inlining one specific
existing API, gc_alloc_fast().  The debate on interface mods can happen
later.   How about the following steps?

1) Confirm that Mikhail's translation into java/vmmagic is accurate.
2) Get Jitrino.OPT to inline and optimize this code and generate a correct
   binary image.
3) Show the performance delta for some workloads.

More comments inlined below --


On 10/11/06, Mikhail Fursov <[EMAIL PROTECTED]> wrote:


GC, VM gurus!
I need your help implementing our first helper written with magic.
I've started with GCv41 allocation helper for objects.
Please review the way I'm going to implement it and correct me if I have
misunderstood something or confirm if everything is OK.


The native fast path:

Managed_Object_Handle gc_alloc_fast(unsigned in_size, Allocation_Handle ah,
                                    void *thread_pointer) {
C1.    assert((in_size % GC_OBJECT_ALIGNMENT) == 0);
C2.    assert (ah);
C3.    unsigned char *next;

C4.    GC_Thread_Info *info = (GC_Thread_Info *) thread_pointer;
C5.    Partial_Reveal_VTable *vtable = ah_to_vtable(ah);
C6.    GC_VTable_Info *gcvt = vtable->get_gcvt();
C7.    unsigned char *cleaned = info->tls_current_cleaned;
C8.    unsigned char *res = info->tls_current_free;

C9.    if (res + in_size <= cleaned) {
C10.        if (gcvt->is_finalizible()) return 0;

C11.        info->tls_current_free =  res + in_size;
C12.        *(VT32*)res = ah;

C13.        assert(((POINTER_SIZE_INT)res & (GC_OBJECT_ALIGNMENT - 1)) == 0);
C14.        return res;
C15.    }

C16.    if (gcvt->is_finalizible()) return 0;

C17.    unsigned char *ceiling = info->tls_current_ceiling;


C18.    if (res + in_size <= ceiling) {

C19.        info->tls_current_free = next = info->tls_current_free + in_size;

            // cleaning required
C20.        unsigned char *cleaned_new = next + THREAD_LOCAL_CLEANED_AREA_SIZE;
C21.        if (cleaned_new > ceiling) cleaned_new = ceiling;
C22.        info->tls_current_cleaned = cleaned_new;
C23.        memset(cleaned, 0, cleaned_new - cleaned);
C24.        *(VT32*)res = ah;

C25.        assert(((POINTER_SIZE_INT)res & (GC_OBJECT_ALIGNMENT - 1)) == 0);
C26.        return res;
C27.    }

C28.    return 0;
}



The helper's code:

public static Object gc_alloc(int objSize, int allocationHandle) {

J1.    Address tlsAddr = TLS.getGCThreadLocal();

J2.    Address tlsCurrentFreeFieldAddr = tlsAddr.plus(TLS_CURRENT_FREE_OFFSET);
J3.    Address tlsCurrentCleanedFieldAddr = tlsAddr.plus(TLS_CURRENT_CLEANED_OFFSET);

J4.    Address tlsCurrentFreeAddr = tlsCurrentFreeFieldAddr.loadAddress();
J5.    Address tlsCurrentCleanedAddr = tlsCurrentCleanedFieldAddr.loadAddress();

J6.    Address tlsNewFreeAddr = tlsCurrentFreeAddr.plus(objSize);

// the fast path without cleaning
J7.    if (tlsNewFreeAddr.LE(tlsCurrentCleanedAddr)) {
J8.        tlsCurrentFreeFieldAddr.store(tlsNewFreeAddr);
J9.        tlsCurrentFreeAddr.store(allocationHandle);
J10.        return tlsCurrentFreeAddr;
J11.    }

J12.    Address tlsCurrentCeilingFieldAddr = tlsAddr.plus(TLS_CURRENT_CEILING_OFFSET);
J13.    Address tlsCurrentCeilingAddr = tlsCurrentCeilingFieldAddr.loadAddress();

       // the fast path with cleaning
J14.   if (tlsNewFreeAddr.LE(tlsCurrentCeilingAddr)) {
J15.       Address tlsNewCleanedAddr = tlsCurrentCeilingAddr;
J16.       if (tlsCurrentCeilingAddr.diff(tlsNewFreeAddr) > THREAD_LOCAL_CLEANED_AREA_SIZE) {
J17.           tlsNewCleanedAddr = tlsNewFreeAddr.plus(THREAD_LOCAL_CLEANED_AREA_SIZE);
J18.       }
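       // NOTE: C23 clears from the old cleaned pointer (tlsCurrentCleanedAddr);
       // starting at tlsNewFreeAddr skips the part of the new object above it.
J19.       int bytesToClean = tlsNewCleanedAddr.diff(tlsNewFreeAddr);
J20.       org.apache.harmony.vmhelper.native.Utils.memset(tlsNewFreeAddr, bytesToClean, 0);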
J21.       tlsCurrentCleanedFieldAddr.store(tlsNewCleanedAddr);

J22.       tlsCurrentFreeFieldAddr.store(tlsNewFreeAddr);
J23.       tlsCurrentFreeAddr.store(allocationHandle);
J24.       return tlsCurrentFreeAddr;

        }

       // the slow path
       // this call will be replaced by the JIT with a direct native call as VM magic
       return org.apache.harmony.vmhelper.native.DRLVMHelper.gc_alloc(objSize, allocationHandle);

}
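
One way to approach step 1 without a VM is to model the native fast path in plain Java and step through it. The sketch below is only that: array indices stand in for addresses, the class and field names are made up for illustration, CLEANED_AREA_SIZE is an arbitrary stand-in for THREAD_LOCAL_CLEANED_AREA_SIZE, objects are assumed to be at least 4 bytes, and the finalizable check (C10/C16) is left out.

```java
import java.util.Arrays;

/** Toy model of gc_alloc_fast(); indices into a byte[] stand in for addresses. */
final class TlsAllocModel {
    // Arbitrary stand-in for THREAD_LOCAL_CLEANED_AREA_SIZE (assumption).
    static final int CLEANED_AREA_SIZE = 64;

    final byte[] heap;
    final int ceiling; // models tls_current_ceiling
    int free;          // models tls_current_free
    int cleaned;       // models tls_current_cleaned; [free, cleaned) is zeroed

    TlsAllocModel(int size) {
        heap = new byte[size];
        Arrays.fill(heap, (byte) 0x5A); // start with "dirty" memory
        ceiling = size;
    }

    /** Returns the object's start index, or -1 where the native code returns 0. */
    int allocFast(int size, int handle) {
        int res = free;
        int next = res + size;
        if (next <= cleaned) {              // C9: fits in the cleaned area
            free = next;                    // C11
            writeHandle(res, handle);       // C12
            return res;
        }
        if (next <= ceiling) {              // C18: fits, but needs cleaning
            free = next;                    // C19
            int cleanedNew = Math.min(next + CLEANED_AREA_SIZE, ceiling); // C20-C21
            // C23: clear starting at the OLD cleaned pointer
            Arrays.fill(heap, cleaned, cleanedNew, (byte) 0);
            cleaned = cleanedNew;           // C22
            writeHandle(res, handle);       // C24
            return res;
        }
        return -1;                          // C28: caller takes the slow path
    }

    // Stores the 32-bit handle at the object start, low byte first.
    private void writeHandle(int at, int handle) {
        for (int i = 0; i < 4; i++) {
            heap[at + i] = (byte) (handle >>> (8 * i));
        }
    }
}
```

Running a few allocations through the model and checking that every returned object is zero past its 4-byte header gives a quick way to compare the Java helper's cleaning logic against lines C18-C26.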


The problems I see:

1) The problem: GC helper must know GC_Thread_Info struct offsets.



If I understand correctly, you are referring to TLS_CURRENT_FREE_OFFSET and
TLS_CURRENT_CEILING_OFFSET.  Can we leave this as an ugly hack for right
now?  That is, hardcode the actual offsets.  Something like: "static int
TLS_CURRENT_FREE_OFFSET = 0x18;"
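
As a rough sketch of that hack, the constants could sit in one small class so they are easy to find and replace later. Only the 0x18 comes from the example above, and even that is illustrative. The class name, the 32-bit pointer size, and the idea that the three fields sit next to each other in GC_Thread_Info are all assumptions that would have to be checked against the native struct.

```java
/** Hardcoded GC_Thread_Info field offsets (placeholder values, not verified). */
final class GCThreadInfoOffsets {
    static final int POINTER_SIZE = 4; // assumes a 32-bit build

    // Illustrative value from the discussion; must match the native layout.
    static final int TLS_CURRENT_FREE_OFFSET = 0x18;
    // Assumption: the next two fields follow tls_current_free directly.
    static final int TLS_CURRENT_CLEANED_OFFSET = TLS_CURRENT_FREE_OFFSET + POINTER_SIZE;
    static final int TLS_CURRENT_CEILING_OFFSET = TLS_CURRENT_CLEANED_OFFSET + POINTER_SIZE;

    /** Cheap sanity check: every offset must be pointer-aligned. */
    static boolean allAligned() {
        return TLS_CURRENT_FREE_OFFSET % POINTER_SIZE == 0
            && TLS_CURRENT_CLEANED_OFFSET % POINTER_SIZE == 0
            && TLS_CURRENT_CEILING_OFFSET % POINTER_SIZE == 0;
    }

    private GCThreadInfoOffsets() {}
}
```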


2) The problem: Where to keep GC magic code? This code is GC specific and
must be available to the bootstrap classloader.
The JIT can learn all the details of which magic code to inline (the helper
type, the helper signature) from its properties (see the opt.emconf file for
an example).



It's prototype code for now.  It's not critical that we identify its final
location at this point.  In any case, it definitely belongs to the GC
developers.

3) The problem: Is the signature of the gc_alloc method, gc_alloc(int objSize,
int allocationHandle), universal for all GCs?



Well, gc_alloc(...) is what the GC/VM interface currently supports.  After
working with MMTk, I now know this API is *not* universal.

I think it's not. But we can extend JIT with different signatures support if
needed.

This is correct.  We need to extend Jitrino.JET with the MMTk allocation
signature.  Then we need to discuss the impact on GC/VM/JIT interfaces.  I
will restart this discussion soon.

4) A new magic method is proposed, line J20:

org.apache.harmony.vmhelper.native.Utils.memset(tlsNewFreeAddr, bytesToClean, 0);



I agree with the previous comments that #4 is not needed.

5) The magic code does not contain the 'finalizable' check.

The JIT can do this check during compilation and skip generating the fast
path. This is another option to pass to the JIT from the GC.




#5 is really independent of writing helpers in java/vmmagic.  How about
addressing #5 at a later time?
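
When #5 does come up, the compile-time decision might look something like the sketch below. It uses plain reflection to decide whether a class is finalizable, which is only a stand-in: a real JIT would ask the VM's class data instead. The names FastPathPolicy, isFinalizable and useInlineFastPath are hypothetical, as are the two example classes.

```java
/** Sketch of a compile-time gate for the inlined allocation fast path. */
final class FastPathPolicy {
    /** True if the class, or a superclass other than Object, declares finalize(). */
    static boolean isFinalizable(Class<?> type) {
        for (Class<?> k = type; k != null && k != Object.class; k = k.getSuperclass()) {
            try {
                k.getDeclaredMethod("finalize");
                return true;
            } catch (NoSuchMethodException e) {
                // not declared here; keep walking up the hierarchy
            }
        }
        return false;
    }

    /** The JIT would emit the inlined fast path only when this returns true. */
    static boolean useInlineFastPath(Class<?> type) {
        return !isFinalizable(type);
    }

    private FastPathPolicy() {}
}

// Example classes for trying the policy out.
class PlainObject {}

class WithFinalizer {
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {}
}

class InheritsFinalizer extends WithFinalizer {}
```

With a gate like this the helper itself needs no finalizable test at run time, which matches how the Java version above already omits lines C10 and C16.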

I've numbered the lines of code in case you want to comment on them.

Please feel free to review the code and to discuss any other problems I
missed.

--
Mikhail Fursov



--
Weldon Washburn
Intel Middleware Products Division
