Re: [Felix-language] [felix] serialisation

john skaller Sun, 03 Feb 2013 19:01:16 -0800

I have now got the Felix compiler to generate encoders
for data types (without changing the RTTI system).


Consider this code, note that

        p *. m

in Felix is shorthand for

        (*p).m

which has the same meaning as

        p->m

in C. We cannot use the C notation because the precedence is
wrong, being dictated by the precedence required for function types.

So:

/////////////
struct X {
  a:int;
  b:int;
};

var b = new X(1,2);
println$ b*.a;

struct Y {
  c:X;
  d:string;
}
var c = new Y ( X(1,2), "hello");
println$ c*.d;
//////////////

We get these:

////////////////////////////////////
// TESTING ENCODER for type X
 ::std::string _s41399t_57573_encoder(void *d) {
   char *p = (char*)d;
   ::std::string b = "";
   b+=::flx::gc::generic::blit(p,sizeof(_s41399t_57573)); // pod
   return b;
 }

// TESTING ENCODER for type Y
 ::std::string _s41401t_57575_encoder(void *d) {
   char *p = (char*)d;
   ::std::string b = "";
   //Struct
   b+=::flx::gc::generic::blit(p+offsetof(_s41401t_57575,c),sizeof(_s41399t_57573)); // pod
   b+=::flx::gc::generic::string_blit(::flx::gc::generic::string_encoder(p+offsetof(_s41401t_57575,d))); //prim
   return b;
 }


// TESTING ENCODER for type string
 ::std::string _a13047t_57553_encoder(void *d) {
   char *p = (char*)d;
   ::std::string b = "";
   b+=::flx::gc::generic::string_blit(::flx::gc::generic::string_encoder(p)); //prim
   return b;
 }
/////////////////////////////////

Note these encoders are simply generated
and compiled but not used yet.

First, for a primitive type T which is a "pod" the encoding is done by "blit",
which just returns a string with the binary image of the type.
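
As a rough illustration of the pod case, here is a minimal C++ sketch.
The blit below is a hypothetical stand-in with the same call shape as
::flx::gc::generic::blit in the generated code, not the actual runtime
function, and X_decoder is an assumed inverse added only to show the
round trip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for ::flx::gc::generic::blit: copy n raw bytes
// starting at p into a std::string (embedded zero bytes are preserved).
inline std::string blit(const void *p, std::size_t n) {
  return std::string(static_cast<const char *>(p), n);
}

// Example pod type mirroring the Felix struct X above.
struct X { int a; int b; };

// Same shape as a generated encoder: void* in, binary image out.
inline std::string X_encoder(void *d) {
  return blit(d, sizeof(X));
}

// Assumed inverse: blit the image back into a fresh object.
inline X X_decoder(const std::string &s) {
  assert(s.size() == sizeof(X));
  X x;
  std::memcpy(&x, s.data(), sizeof(X));
  return x;
}
```

The image is only meaningful to a program with the same layout and
byte order, which is what one would expect of a plain blit.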

For a non-pod primitive a user-defined encoder is called. This is a function
named with the syntax

        type mytype = "mytype"
        requires encoder "myencoder";


and has the type

        string myencoder (void *p);

Given a pointer to an object of the type the user function has to
convert it to a string of any length. The system provides an encoder
for the type string, which just returns the string.

For every primitive with an encoder the compiler wraps the encoder's
result by prefixing it with its length (in binary). For a pod this
isn't necessary because the length is known: it is sizeof(T).
[This may change when we have run time defined types.]
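
The length-prefix wrapping might look roughly like the C++ sketch below.
string_blit here is a hypothetical stand-in for the runtime function of
the same name, the use of a native size_t for the prefix is an
assumption (the width is not stated above), and string_unblit is an
assumed inverse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for string_blit: prefix the encoded bytes
// with their length, written in binary as a native size_t (assumed width).
inline std::string string_blit(const std::string &s) {
  std::size_t n = s.size();
  std::string out(reinterpret_cast<const char *>(&n), sizeof n);
  out += s;
  return out;
}

// Assumed inverse: read one length-prefixed field starting at pos
// and advance pos past it.
inline std::string string_unblit(const std::string &buf, std::size_t &pos) {
  assert(pos + sizeof(std::size_t) <= buf.size());
  std::size_t n = 0;
  std::memcpy(&n, buf.data() + pos, sizeof n);
  pos += sizeof n;
  assert(pos + n <= buf.size());
  std::string s = buf.substr(pos, n);
  pos += n;
  return s;
}
```

With the prefix in place, variable-length fields can be concatenated
and still split apart again, which is what the decoder relies on.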

OK so now for non-primitives. I have changed the definition
of "pod" so that a "pod" is any data type that does not contain
a non-pod primitive. So all pointers are now pod. Structs are pod
unless a member is not pod. Same for tuples and records. Unions
are always pod because they're pointers or plain ints.
Note the previously mentioned issue with cstructs.
[I forget what I implemented, but I think they're non-pod, and
I expect this will cause problems.]

The encoding stuff I have shown ignores pointers.
So you can use it to encode anything, but any pointers just get
blitted out in binary.

So we have a first stage encoder for many types now.
Not functions yet! Just types!

Now, for this to be useful we need a routine that does two things.

(1) Encode an object
(2) Find all the pointers, and make sure what they point at is encoded too.


Then we concatenate the results and that's the encoding.
We can find the pointers from the shape offset table.

The algorithm will basically have two sets: already encoded
and not yet encoded. When we grab a pointer we first convert it
to a head pointer (start of heap object). Then we add that
to the not yet encoded set (unless it is already in one of the two
sets). If the pointer isn't a Felix pointer we have to just leave it,
or perhaps abort (but not if it is NULL, that's OK).

So we form a closure of encodings of all the linked objects
this way and just concatenate them, with their lengths
AND the original address.
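
A minimal C++ sketch of that closure, under loud assumptions: Node is a
toy stand-in for a Felix heap object, node_offsets plays the role of
its shape offset table, every pointer is already a head pointer, and
the record layout (old address, length, raw image) is one possible
choice, not a settled format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a Felix heap object with one pointer field.
struct Node { int value; Node *next; };

// Hypothetical shape offset table: byte offsets of pointers in a Node.
static const std::vector<std::size_t> node_offsets = { offsetof(Node, next) };

// Append the raw bytes of v to out.
template <class T> void put(std::string &out, const T &v) {
  out.append(reinterpret_cast<const char *>(&v), sizeof v);
}

// Encode every object reachable from root. Each record is
// (original address, length, raw image); pointers are blitted as-is.
std::string encode_closure(Node *root) {
  std::set<Node *> done;     // already encoded
  std::vector<Node *> todo;  // not yet encoded
  std::string out;
  if (root) todo.push_back(root);
  while (!todo.empty()) {
    Node *obj = todo.back();
    todo.pop_back();
    assert(obj != nullptr);
    if (!done.insert(obj).second) continue;  // encoded meanwhile
    put(out, obj);                           // original address
    put(out, sizeof(Node));                  // length
    out.append(reinterpret_cast<const char *>(obj), sizeof(Node));
    for (std::size_t off : node_offsets) {   // follow each pointer
      Node *q;
      std::memcpy(&q, reinterpret_cast<char *>(obj) + off, sizeof q);
      if (q && !done.count(q)) todo.push_back(q);  // NULL is skipped
    }
  }
  return out;
}
```

Because membership in the encoded set is checked before an object is
written, cycles and shared objects are each emitted exactly once.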

To decode, we split up the stream into substrings (we have
the lengths). We make the objects, recording the
old pointer and the new pointer in a list.  For a pod we just
allocate store and blit the data in. For a non-pod primitive
we have to use the user supplied decoder. For a composite
non-pod we just do it memberwise, undoing the encoding.

When we're finished, we have a list of new object addresses,
and we have a map old pointer -> new pointer. So we run
through the objects in the list, and using the offset tables
we grab each old pointer, look it up in the map, and put
the new pointer in.
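
The two decoding passes could be sketched in C++ as follows. This
again assumes a toy Node type, a hypothetical offset table, and a
stream of (old address, length, raw image) records; none of these
names come from the Felix runtime, and unique_ptr merely stands in
for the real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a Felix heap object with one pointer field.
struct Node { int value; Node *next; };

// Hypothetical shape offset table for Node.
static const std::vector<std::size_t> node_ptr_offsets = { offsetof(Node, next) };

// Read a T out of buf at pos and advance pos.
template <class T> T get(const std::string &buf, std::size_t &pos) {
  T v;
  std::memcpy(&v, buf.data() + pos, sizeof v);
  pos += sizeof v;
  return v;
}

// Decode a stream of (old address, length, image) records.
std::vector<std::unique_ptr<Node>> decode_closure(const std::string &buf) {
  std::vector<std::unique_ptr<Node>> objs;  // list of new objects
  std::map<Node *, Node *> remap;           // old pointer -> new pointer
  std::size_t pos = 0;
  // Pass 1: allocate each object and blit its image in.
  while (pos < buf.size()) {
    Node *old = get<Node *>(buf, pos);
    std::size_t len = get<std::size_t>(buf, pos);
    assert(len == sizeof(Node));
    std::unique_ptr<Node> n(new Node);
    std::memcpy(n.get(), buf.data() + pos, len);
    pos += len;
    remap[old] = n.get();
    objs.push_back(std::move(n));
  }
  // Pass 2: replace each old pointer that appears in the map.
  for (auto &o : objs) {
    for (std::size_t off : node_ptr_offsets) {
      char *field = reinterpret_cast<char *>(o.get()) + off;
      Node *oldp;
      std::memcpy(&oldp, field, sizeof oldp);
      auto it = remap.find(oldp);
      if (it == remap.end()) continue;  // NULL or foreign: leave as-is
      std::memcpy(field, &it->second, sizeof(Node *));
    }
  }
  return objs;
}
```

The fix-up has to be a separate pass because a pointer may refer to an
object that appears later in the stream.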

This mechanism is VERY nice because user encoder/decoders
just don't have to worry about Felix pointers, only foreign
pointers (for example in a string, the pointer to the array).

Really nasty objects like Google RE2 objects are easy
if we're slightly hacky: we just grab the string regexp
and use that as the encoding. The decoder can rebuild
the RE2 object from just the regexp.

The point here is that the top level serialisation routines
can be written in Felix. The compiler only needs to maintain
the low level, per-object serialisation that ignores Felix pointers.


--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org



