> > > > In the end, I'm pretty sure that Ben is right, and that relocation is only > necessary in corner cases. For those "stuck" objects, I'd really like to > know what the in-degree is, and I'd like some sense of how read-hot the > containing blocks are once the other 90% of the objects are gone. That > would tell us if a "mostly not copying" design is feasible. At the moment I > don't feel like I have any basis for intuitions on that. >
I dont disagree with your comments . A few comments . People looking at Boehm / Mark Sweep and uncooperative environments should definetly look at non moving RC-Immix style collector. The worst case is not a unsustained growth because new blocks are not needed you can reallocate holes at decreasing mutator efficiency until all space is used. We do have to be concerned about the fragmentation cost. For fragmentation the best case is every object is a multiple of the line size in which case there is zero fragmentation , the worst case is where you have lots of minimum sized objects ( say 32 bytes ) and exactly 1 survives for a 75 % fragmentation cost. Also its tunable a smaller line size has lower fragmention at a slight cost to mutator performance. I think the cost is about 10-15% which could even drop by half for say a 64 byte line size if memory is crucial eg on a mobile device run a SpaceEfficient allocator . Either way its not huge as many allocators have some fragmentation cost and in some cases 4 word headers. One thing to note is nursery fragmentation is a bigger issue due to the small size of most short lived objects. Potentially we could even fill fragmented lines but the cost of allocating to these small holes is pretty big it would be an allocation list . The much bigger issue is Rc-Immix has poor performance ( compared to cutting edge) for heaps up to 1.5* and it isnt great until about 2 * heap . The reasons is not so much related to copying but I will restate for emphasis. 1) Either copying cost is not as efficient as mark sweep or no copying has poor Gen 1 locality ( the diffirence between these is low) ( for 1-1.5* heap performance ) 2) More importantly because of the greater space availability there is far less work especially ref counting so cycles are more dispersed. Concurrency can help here depending on barrier cost .. but changes the whole nature of the collector . The fact most GC heaps require 1.5+* memory for really good performance is a bigger issue than 10% fragmentation. I believe the best intial solutions is non relocating Rc-Immix instrument it , optomize it which is much easier to do when you dont need to worry about barrier costs / mutator non allocating behaviour . For v2 I think this can be significantly improved by adding a fixed Nursery ( eg N blocks which are always used first and swept) for 1 - 1.5 * heap performance and it reduces a lot of our fragmentation this only requires a cheap write barrier with a fast and slow path ie if a nursery is the first N blocks all we need is a conditional on write to see if the address is < x - still a cost but much cheaper than a card table actually since we have a modbuf we could avoid this and process the mod buf when the nursery is swept ) . After this we can consider the high concurrency cases we can either resort to non copying or have individual mark / Sweep nurseries for each thread/core which need to be synced ( as discussed earlier possibly by forcing sweep if a nursery references another nursery object or disallowing it) or more complex barriers which would also allow the main heap to copy. > > Unfortunately, the only way to get that data appears to be building an > implementation we can measure. As nice and as clean as Jikes is, it is > built to support a language with a qualitatively different set of objects > and a significantly different BCVM semantics available to the compiler. And > we're contemplating making further changes to the BCVM semantics. > > Agree best way is to build it with instrumentation first . Unfortunately we have no base line .. thats the nice part of Jikes so may need to write a Mark sweep or ref count as a comparison . Though we could just use malloc as the baseline .. > Do we know of any comparative data on *Java* programs between CLR and JVM > that would tell us whether, for those programs, the behaviors of the > respective BCVM machines are comparable? > There is nothing published im aware of .. I can only guess at things like bigger but many fewer objects. ( which is an issue with the 8K block size / LOB size ) . Ben
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
