[Felix-language] serialisation

john skaller Tue, 29 Jan 2013 19:40:46 -0800

I am going to explain how the Felix memory system works, so I can
show some progress on serialisation. I note that in building the
code I'm going to show I have found some bugs in the gc,
mainly due to confusing names.  So here goes.


In Felix, there is an RTTI object called a gc_shape_t (shape) which
tells where pointers are in heap objects. It uses an array of offsets
into the object to do that. It also provides some other info,
such as the C name of the type.

The shape of an object is calculated  from its address by looking
up in an auxiliary table. Every heap allocation not only creates 
the object, it also adds and entry into this table mapping the
address to the shape. Unlike C++ where an object must be
"polymorphic" so it contains a vtable pointer, in Felix the
run time type information is stored separately, and so you can
have it for any type, non-invasively.

That's the small white lie. Here's the reality:

(1) If the object is a static array, the offsets are only given
for a single element. The shape tells how many elements there are.
this allows you to allocate an object with 10,000 elements, without
having to replicate the offset table 10,000 times. It also allows
the finaliser to be given for a single element, and called 10,000 times
at each array location.


(2) If the object is a varray, then the whole of the above in (1)
is replicated certain number of times. A varray is just a pointer.
Two counters are associated with it:

        (a) The maximum number of slots in the array
        (b) The number of slots actually used

This is done with two more auxiliary tables.
The finaliser of (1) is applied repeatedly to the used slots.
The actual number of objects of the type of the shape is therefore
the product of the static and dynamic array lengths.

The actual size in bytes is the product of the element size,
the static number of elements, and the dynamic maximum
number of slots.

Here's the next white lie. All the above works just fine with (most)
interior pointers. For any pointer, Felix finds the next lower or
equal address actually allocated. Then it calculates how many
bytes in the object. If the candidate pointer is in that range,
its an interior pointer. Otherwise its a foreign pointer or some
random junk. The conservative scan of the machine stack
works by using this calculation on every word on the machine
stack. At worst some random junk happens to be an interior
pointer, in which case we may accidentally consider some
heap object to be reachable when it isn't.

I say "most" above because Felix assumes heap objects are
allocated with a certain alignment, so it "scrubs out" the
low bits of pointers. This fact is used in the representation
of some variant types, where the tag index is stored in
those bits (it's also used in the mark phase of the GC,
which uses the low bit as the mark).

So Felix will think a char * to the second byte of char
array is a head pointer. Care is needed for variants
taking char * arguments, that the packed representation
is not used if the char* is interior.

So now I will backtrack:

        ALL HEAP OBJECTS IN FELIX ARE ARRAYS.

Every object is a static array, perhaps with count = 1.
Every object is a variable length array of them, even
if the use count and bound are also 1.

Do note that the dynamic length is not stored at all if it
is 1, so the auxiliary tables only get populated for
"real" arrays.

The integration of variable length arrays with the garbage collector
was an essential historical optimisation. Arrays can get big,
so we have to make the GC handle it without massive RTTI
objects, and dynamic lengths obviously requires some
extra logic. The mechanism is not fully general: only top 
level objects can be handled like this! So if you have a struct
with an int and an array of 10,000 elements which a pointer
in them you WILL get an RTTI object with 10,000 offsets,
and those offsets will be generated by the compiler as C code.

The "fix" for this is to use a more "AST" like representation,
where a struct is represented as a list of components
of different types. This allows recursion to be used by the GC
to find the offsets and apply finalisers, and also allows
static arrays in the structs too.

Clearly, this is also nice for serialisation. What we're talking
about is a type *interpreter* which cans the AST descriptor
tree at run time, rather than doing that at compile time.
This is useful because it stops a combinatorial explosion.
Also it's very hard to compose functions in C: to finalise a struct
we have to calculate addresses statically and call finalisers,
all in C code we have to generate. As we add more functionality,
like copying, and serialisation, more and more functions have
to be generated for aggregates.

An interpreter is a bit slower. However it opens up the ability
to do something interesting: build those ASTs dynamically,
at run time. And even without code to do this dynamic meta-programming,
it allows combining types from DLLs where we have no idea
what the types are. For example if you have two objects from
two DLLS of type T1 and T2, you cannot make a struct containing
them both (and I mean the actual objects NOT pointers to them)
without being able to dynamically define types.

Note this is dynamic meta-typing, it is NOT dynamic typing.

You may ask, why would we need this/

The answer is simple: how else can you *deserialise* arbitrary
data loaded from a file?

Anyhow: here's a demo program with some new stuff I'm adding:

////////////////
proc pointer_diag (p:address) 
{
  var pd = Collector::get_pointer_data p;
  println$ "Valid=" + str pd.Collector::is_felix_pointer;
  println$ "Is head=" + pd.Collector::is_head_pointer.str;
  var shape = pd.shape;
  println$ "Element type =  " + shape.Rtti::cname.string;
  var bpe = shape.Rtti::bytes_per_element.ulong;
  println$ "Bytes per element = " + bpe.str;
  println$ "Static array length = " + shape.Rtti::number_of_elements.str;
  println$ "Dynamic array length = " + pd.used_elements.str; 
  println$ "Max dynamic array length = " + pd.max_elements.str; 
  var nelts = pd.used_elements * shape.Rtti::number_of_elements.ulong;
  println$ "Aggregate number of used elements " + nelts.str;
  println$ "Store to serialise: " + (nelts * bpe) . str;
}

var p = varray[ldouble] (20.size,3.3l);
println$ p;
println$ "Lenth=" + p.len.str;
pointer_diag (C_hack::cast[address] p);

Valid=true
Is head=true
Element type =  _a5356t_57931
Bytes per element = 16
Static array length = 1
Dynamic array length = 20
Max dynamic array length = 20
Aggregate number of used elements 20
Store to serialise: 320
////////////////

With one more test, namely, that there are no pointers in there,
we can now use memcpy to serialise the data. This will fail
for non-scalar types like string, but we can do a hack for that:
check for a non-trivial finaliser. Felix NULLs out the finaliser
if you tell it the type is a "pod". Normally the finaliser is just
the C++ destructor. If there is one  it is usually either freeing
up internal memory allocated, or releasing a resource
such as a file handle.

So we can safely write a serialisation function now that handles
simple cases, in Felix (just use memcpy :)

To serialise non-POD's we need to get the compiler to generate
a serialisation function. For abstract types, the programmer has
to write it and tell the compiler. For aggregates the compiler can 
just generate a new function by calling the serialisation functions
of the components (also handling static arrays with loops).

Deserialise will be similar.

This can be extended fairly easily to handle pointers: a couple
of days work there, but nothing hard.

Note that none of this will be dynamic: the dynamic RTTI stuff 
will have to be done, I have some prototype code, but I haven't
quite figured out variants yet.

--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_jan
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

[Felix-language] serialisation

Reply via email to