There's an open problem here wrt to garbage collection.

Felix gc system is basically designed as a 'semi-automatic'
memory recovery system that's a bit smarter than C++ destructors
and manual deletion, in that it allows one to cleanup memory
without too much work keeping track of how to do so.

It is characterised by:

(a) world stop for multi-processor
(b) high memory overhead per block (currently 6 machine words)
(c) can only be called inside the driver, that is, not in any
    functional code, because functions use the machine stack, and,
    the machine stack is not accessible to the collector
(d) naive mark/sweep style exact algorithm
(e) more or less ISO C++ compliant and this machine independent

For managing Felix fibres the collector is probably very good:
it works well with the large stack frames which tend to be
produced by inlining. Fibres are stackless so (c) isn't a problem.
In addition, one can easily use, for example, a wrapped C++ STL
vector of integers without needing any GC support at all -- so
the demands on the GC in programs can be as low as zero.
Even when garbage is spewed, for many short scripts,
memory is not exhausted and there's no need to worry about 
collection: almost all the regression and tutorial tests fall
into this category.

But for small objects in functional code, particular variants
with small arguments (eg integer), or lists of small objects,
it isn't likely to deliver good performance .. and the fact
it can't be called in heavily functional code means a significant
garbage tip is required to hold the garbage until it can be
collected.

So, I'm looking for some solutions. The Boehm collector is
somewhat expensive, but current versions support both
incremental collection and extremely low memory overhead.
Unfortunately it doesn't work so well with C++, is
plagued with portability problems, and depends on various
invariants which prevent, for example digital trees being
used.

A dedicated high performance concurrent or even parallel collector
may be possible, but it isn't clear Felix can have enough control
to allow it to operate without throwing out C/C++ object model
compatibility constraints.

Instead, my thoughts are a type based hybrid system.
Currently every object has a header containing an RTTI pointer,
two links securing it in a double linked list of all allocated
blocks, an element count (even if it isn't an array), and some
flags.

However why not change the scheme for small objects somehow?

The idea is basically that all blocks are aligned on 2^N word
boundary, where N=1 on almost all machines, and malloc typically
delivers N=4 (16 byte boundary).

So we could have 16 schemas encoded in pointers at the cost
of masking these bits to zero before dereferencing them.
So we might consider for example a dedicated 'small object'
system where all the objects of a given type were stored
in an array with a bitmap usage technique, giving only a
one bit per object overhead.

Note that those N bits are extremely valuable: they can
also be used to encode variants with small number of 
cases -- which is most of them -- halving storage
overhead for arguments. Note also: it is tricky to do
all of this because the code generated is NOT allowed
to depend on such low level details: the generated C++
has to be portable. Therefore those details have to
represented by C macros, inline functions, and other
C++ level techniques -- the system library doesn't have
to be machine independent (indeed it exists to allow
the generated C++ and more generally Felix code to be so).

Another idea is to ALLOW gc during functional code by
using a conservative technique to analyse the stack.
At least in a single threaded application, each stack
word could be checked to see if it pointed at a heap
block: the set of such blocks is definite and known.

This requires stack hackery finding the stack, which is
quite problematic, particularly in the presence of threads ..
but it can probably be done on most processors. Unlike
the Boehm collector, we only need to examine the stack
from the driver entry to Felix code up to the stack top,
we don't actually need the 'real' base of the stack.
There are of course some caveats .. eg AMD64 on Linux
has a 'red zone' which can be used despite being
past the stack pointer.

Clearly the doubly linked list thing would be not good here,
we'd need some kind of digital encoding (trie) such as a
Judy array to be able to determine if a machine word
could be a heap pointer fast enough to make this approach
practical.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier.
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

Reply via email to