I am going to explain how the Felix memory system works, so I can show some progress on serialisation. I note that in building the code I'm going to show I have found some bugs in the gc, mainly due to confusing names. So here goes.
In Felix, there is an RTTI object called a gc_shape_t (shape) which tells where pointers are in heap objects. It uses an array of offsets into the object to do that. It also provides some other info, such as the C name of the type. The shape of an object is calculated from its address by looking up in an auxiliary table. Every heap allocation not only creates the object, it also adds and entry into this table mapping the address to the shape. Unlike C++ where an object must be "polymorphic" so it contains a vtable pointer, in Felix the run time type information is stored separately, and so you can have it for any type, non-invasively. That's the small white lie. Here's the reality: (1) If the object is a static array, the offsets are only given for a single element. The shape tells how many elements there are. this allows you to allocate an object with 10,000 elements, without having to replicate the offset table 10,000 times. It also allows the finaliser to be given for a single element, and called 10,000 times at each array location. (2) If the object is a varray, then the whole of the above in (1) is replicated certain number of times. A varray is just a pointer. Two counters are associated with it: (a) The maximum number of slots in the array (b) The number of slots actually used This is done with two more auxiliary tables. The finaliser of (1) is applied repeatedly to the used slots. The actual number of objects of the type of the shape is therefore the product of the static and dynamic array lengths. The actual size in bytes is the product of the element size, the static number of elements, and the dynamic maximum number of slots. Here's the next white lie. All the above works just fine with (most) interior pointers. For any pointer, Felix finds the next lower or equal address actually allocated. Then it calculates how many bytes in the object. If the candidate pointer is in that range, its an interior pointer. Otherwise its a foreign pointer or some random junk. The conservative scan of the machine stack works by using this calculation on every word on the machine stack. At worst some random junk happens to be an interior pointer, in which case we may accidentally consider some heap object to be reachable when it isn't. I say "most" above because Felix assumes heap objects are allocated with a certain alignment, so it "scrubs out" the low bits of pointers. This fact is used in the representation of some variant types, where the tag index is stored in those bits (it's also used in the mark phase of the GC, which uses the low bit as the mark). So Felix will think a char * to the second byte of char array is a head pointer. Care is needed for variants taking char * arguments, that the packed representation is not used if the char* is interior. So now I will backtrack: ALL HEAP OBJECTS IN FELIX ARE ARRAYS. Every object is a static array, perhaps with count = 1. Every object is a variable length array of them, even if the use count and bound are also 1. Do note that the dynamic length is not stored at all if it is 1, so the auxiliary tables only get populated for "real" arrays. The integration of variable length arrays with the garbage collector was an essential historical optimisation. Arrays can get big, so we have to make the GC handle it without massive RTTI objects, and dynamic lengths obviously requires some extra logic. The mechanism is not fully general: only top level objects can be handled like this! So if you have a struct with an int and an array of 10,000 elements which a pointer in them you WILL get an RTTI object with 10,000 offsets, and those offsets will be generated by the compiler as C code. The "fix" for this is to use a more "AST" like representation, where a struct is represented as a list of components of different types. This allows recursion to be used by the GC to find the offsets and apply finalisers, and also allows static arrays in the structs too. Clearly, this is also nice for serialisation. What we're talking about is a type *interpreter* which cans the AST descriptor tree at run time, rather than doing that at compile time. This is useful because it stops a combinatorial explosion. Also it's very hard to compose functions in C: to finalise a struct we have to calculate addresses statically and call finalisers, all in C code we have to generate. As we add more functionality, like copying, and serialisation, more and more functions have to be generated for aggregates. An interpreter is a bit slower. However it opens up the ability to do something interesting: build those ASTs dynamically, at run time. And even without code to do this dynamic meta-programming, it allows combining types from DLLs where we have no idea what the types are. For example if you have two objects from two DLLS of type T1 and T2, you cannot make a struct containing them both (and I mean the actual objects NOT pointers to them) without being able to dynamically define types. Note this is dynamic meta-typing, it is NOT dynamic typing. You may ask, why would we need this/ The answer is simple: how else can you *deserialise* arbitrary data loaded from a file? Anyhow: here's a demo program with some new stuff I'm adding: //////////////// proc pointer_diag (p:address) { var pd = Collector::get_pointer_data p; println$ "Valid=" + str pd.Collector::is_felix_pointer; println$ "Is head=" + pd.Collector::is_head_pointer.str; var shape = pd.shape; println$ "Element type = " + shape.Rtti::cname.string; var bpe = shape.Rtti::bytes_per_element.ulong; println$ "Bytes per element = " + bpe.str; println$ "Static array length = " + shape.Rtti::number_of_elements.str; println$ "Dynamic array length = " + pd.used_elements.str; println$ "Max dynamic array length = " + pd.max_elements.str; var nelts = pd.used_elements * shape.Rtti::number_of_elements.ulong; println$ "Aggregate number of used elements " + nelts.str; println$ "Store to serialise: " + (nelts * bpe) . str; } var p = varray[ldouble] (20.size,3.3l); println$ p; println$ "Lenth=" + p.len.str; pointer_diag (C_hack::cast[address] p); Valid=true Is head=true Element type = _a5356t_57931 Bytes per element = 16 Static array length = 1 Dynamic array length = 20 Max dynamic array length = 20 Aggregate number of used elements 20 Store to serialise: 320 //////////////// With one more test, namely, that there are no pointers in there, we can now use memcpy to serialise the data. This will fail for non-scalar types like string, but we can do a hack for that: check for a non-trivial finaliser. Felix NULLs out the finaliser if you tell it the type is a "pod". Normally the finaliser is just the C++ destructor. If there is one it is usually either freeing up internal memory allocated, or releasing a resource such as a file handle. So we can safely write a serialisation function now that handles simple cases, in Felix (just use memcpy :) To serialise non-POD's we need to get the compiler to generate a serialisation function. For abstract types, the programmer has to write it and tell the compiler. For aggregates the compiler can just generate a new function by calling the serialisation functions of the components (also handling static arrays with loops). Deserialise will be similar. This can be extended fairly easily to handle pointers: a couple of days work there, but nothing hard. Note that none of this will be dynamic: the dynamic RTTI stuff will have to be done, I have some prototype code, but I haven't quite figured out variants yet. -- john skaller skal...@users.sourceforge.net http://felix-lang.org ------------------------------------------------------------------------------ Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_d2d_jan _______________________________________________ Felix-language mailing list Felix-language@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/felix-language