Re: [boost] Serialization Review Results

Dave Harris Thu, 12 Dec 2002 16:25:52 -0800

In-Reply-To: <[EMAIL PROTECTED]>
On Wed, 11 Dec 2002 18:17:21 -0500 David Abrahams 
([EMAIL PROTECTED]) wrote:
> I'm willing to use any terms that everyone will agree to
> (including yours)


Me too.


> but whichever terms we use should be at least as clearly defined
> as what Augustus wrote. 

I'm afraid I couldn't quite get my head around them. To me, "persistence" 
and "serialisation" are at different levels of abstraction. Serialisation 
is one way to implement persistence. As such they do not compete; they are 
not mutually incompatible alternatives.

I think we have a consensus that a fully general persistence library, that 
could be implemented by dumping RAM images to disk or whatever, is not 
what we want at this point. I'm OK with that. What I don't understand is 
what Augustus means when he says:

     I think that plain serialization (your term) should be
     explicitly *not supported* and defer that use case to a
     safer, more airtight approach with a persistance library.

What is gained by excluding persistence, and/or the simpler kinds of 
serialisation (where source and destination are the same program running 
on the same hardware with the same compiler)? 


> So far, you haven't provided a clear definition of serialization.

Actually I agree with Augustus's, as far as I understand it, which isn't 
far. He seems to imply that serialisation does not need to bother with 
object factories or object lifetime management. I don't understand how 
that can be. I can't figure out whether UDT versioning belongs to 
Persistence or to Serialisation. He says Persistence, but doesn't that 
make Persistence asymmetrical and involve it in non-trivial transforms? 
How can it be achieved by transparent meta-programming magic? I doubt a 
robust but transparent persistence mechanism can be built.


> > We could send a binary format through a "uuencode" filter, but a
> > text format which was natively safe would be neater (and probably
> > more efficient). 
> 
> Why would it be more efficient?

Because it has more knowledge.

For example, if we write out the number 500 using an alphabet of 64 safe 
characters, it takes 2 characters. If we write it out using all 256 
characters, it still takes 2 of them, but now to make it safe each 
character needs 2 safe characters to represent it, so it takes 4 bytes 
altogether. The double conversion is more verbose because the first part 
loses information.
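
The arithmetic above can be checked with a small sketch. The helper names are made up for illustration, and the second function follows the simplified model in this post, where each raw byte is made safe on its own (a real uuencode packs 3 bytes into 4 characters, so it would do somewhat better):

```cpp
#include <cstdint>

// Hypothetical helper: how many symbols are needed to write `value`
// using an alphabet of `base` distinct symbols.
inline unsigned digits_needed(std::uint64_t value, unsigned base) {
    unsigned digits = 1;
    while (value >= base) {
        value /= base;
        ++digits;
    }
    return digits;
}

// Direct route: write the value straight into the safe alphabet.
inline unsigned direct_safe_chars(std::uint64_t value, unsigned safe_alphabet) {
    return digits_needed(value, safe_alphabet);
}

// Two-step route: write raw bytes first, then make each byte safe
// separately (the simplified per-byte model assumed in the text).
inline unsigned filtered_safe_chars(std::uint64_t value, unsigned safe_alphabet) {
    const unsigned raw_bytes = digits_needed(value, 256);
    const unsigned chars_per_byte = digits_needed(255, safe_alphabet);
    return raw_bytes * chars_per_byte;
}
```

With a 64-character safe alphabet, direct_safe_chars(500, 64) gives 2 and filtered_safe_chars(500, 64) gives 4, matching the figures above.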


> > Adding or removing instance variables is pretty straightforward. 
> 
> Erm.  I am still leery of thinking of all this in terms of "instance
> variables".  The representation of state written to the archive may or
> may not have a direct correspondence to a class' data members.

Sure. Call them "fields" if it helps. I sometimes find it helpful to think 
in terms of concrete examples.

The point is, sometimes a class grows so that its serialised 
representation gets bigger.


> "schema ID"?

A term from MFC. It is what the submitted library calls a file_version.


> Can you give an example of "containing the mess within the UDT?"

I don't have a good example to hand. Here's a made up one:

    void MyClass::load( CArchive &ar ) {
        int schema = load_schema( ar, 10, 15 );
        
        if (schema >= 13)
            MyBaseClass::load( ar );
        else {
            MyOldBaseClass::load( ar );
            int myBaseClassData;
            ar >> myBaseClassData;
            MyBaseClass::init( myBaseClassData );
        }
        
        if (schema >= 14)
            ar >> myVar1;
        else
            myVar1 = 100;
            
        if (schema == 14) {
            int unused;
            ar >> unused;
        }
            
        if (schema >= 13)
            ar >> myVar2;
        else {
            MyOldType t;
            ar >> t;
            myVar2 = convert( t );
        }
    }

The first line fetches the class's schema/version number. The arguments to 
load_schema() are used for range-checking - load_schema() may throw. For 
safety, it's best not to use schemas 0 or 1 so I usually start from 10.
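
For concreteness, here is one way a range-checking load_schema() might look. This is only a sketch: it reads from a std::istream rather than a CArchive, and the choice of exception types is an assumption, not something taken from MFC or from the submitted library.

```cpp
#include <sstream>
#include <stdexcept>

// Sketch only: reads a schema number from a text stream and rejects
// anything outside the inclusive range [oldest, newest].
inline int load_schema(std::istream &in, int oldest, int newest) {
    int schema = 0;
    if (!(in >> schema))
        throw std::runtime_error("could not read schema number");
    if (schema < oldest || schema > newest)
        throw std::out_of_range("unsupported schema number");
    return schema;
}
```

Rejecting out-of-range values up front means the rest of the load routine only has to deal with schema numbers it actually knows about.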

The next block chains explicitly to the base class. In this case older 
archives used a different base class so we have some nasty code to make it 
work.

The next few lines load a variable. Old archives didn't store it, so we 
have to provide a default value.

Schema 14 added an int which was later removed; if it is present we have 
to skip over it.

The last few lines load another variable. Older archives used a different 
type so we may need to load a temporary object of that old type and then 
convert it.
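
The same three cases (a field with a default, a field that existed in only one schema, and a field whose stored type changed) can be put into a self-contained form. The Settings class, its fields and the whitespace-separated text format below are all invented for illustration; only the schema numbers mirror the example above.

```cpp
#include <sstream>
#include <stdexcept>

// Hypothetical class, used only to make the pattern runnable.
struct Settings {
    int timeout = 100;   // added in schema 14
    double scale = 1.0;  // stored as an integer percentage before schema 13

    void load(std::istream &in) {
        int schema = 0;
        if (!(in >> schema) || schema < 10 || schema > 15)
            throw std::runtime_error("unsupported schema number");

        if (schema >= 14)
            in >> timeout;
        else
            timeout = 100;       // old archives never stored it

        if (schema == 14) {
            int unused = 0;      // field that existed only in schema 14
            in >> unused;        // read it and throw it away
        }

        if (schema >= 13)
            in >> scale;
        else {
            int percent = 0;     // the old stored type
            in >> percent;
            scale = percent / 100.0;
        }

        if (!in)
            throw std::runtime_error("truncated or malformed archive");
    }
};
```

Loading "12 250" leaves timeout at its default of 100 and sets scale to 2.5; loading "14 30 999 0.5" reads timeout as 30, skips the 999 and sets scale to 0.5.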

I don't know what you think of this code - whether it horrifies you for 
being too low level or lacking in design foresight. It is my practical 
experience. Designs age, and the history accretes in the serialisation 
load routines. I hope that the boost library will be able to support this 
kind of evolution. I don't claim that code like this is the best solution, 
but in practice I have found it works.


> It's beginning to sound more and more like the metaclass framework
> some people have been hinting at.

Do you mean that some framework could handle a history like that reflected 
in the above code, automatically? How would that work? How could it track 
changes to the base class over time?

Java manages it by storing a snapshot of the class hierarchy (as it was 
when the archive was made) into the archive. That gives it enough 
information to figure out how the hierarchy has changed. However, it can 
lead to rather bloated archives.


> > Renaming classes is something which MFC doesn't support. I believe
> > that some of the proposals which came up during the review would
> > allow this. 
> 
> Why should a class name come into play, unless you were using
> std::type_info to archive it?

MFC uses its own macro-driven RTTI system, in which classes are identified 
by name. If we don't use that, or type_info, then there is probably no 
problem. We do need to make sure that we can add new classes without 
somehow breaking the correspondence between the old classes and whatever 
the archive stores to represent them.


> ... assuming there is such a factory method.

The archive has to store something to represent classes, and has to be 
able to create instances of the classes so represented, in order to 
restore polymorphic pointers. That's what I mean by a "factory method". I 
don't mean to imply a particular implementation.
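
To show the kind of thing I mean, without suggesting it is how the library should do it, here is a minimal sketch: a table from whatever identifier the archive stores to a function that creates an instance. Shape, Circle, Square and Registry are invented names, and using a string as the identifier is just an assumption for the example.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical polymorphic hierarchy, for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string kind() const = 0;
};
struct Circle : Shape {
    std::string kind() const override { return "circle"; }
};
struct Square : Shape {
    std::string kind() const override { return "square"; }
};

// Maps the identifier stored in the archive to a creation function.
class Registry {
public:
    using Factory = std::function<std::unique_ptr<Shape>()>;

    void add(const std::string &id, Factory make) {
        factories_[id] = std::move(make);
    }

    // Creates a fresh instance for `id`, or throws if the id is unknown.
    std::unique_ptr<Shape> create(const std::string &id) const {
        auto it = factories_.find(id);
        if (it == factories_.end())
            throw std::runtime_error("unknown class id: " + id);
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};
```

On load, the archive would read the stored identifier, call create() with it, and then let the new object load its own state.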


> It sounds like your viewpoint on this is very heavily influenced by
> one particular kind of application.

Yes. Well, less so than my choice of words may have implied. And of course 
in that passage I was discussing a trap that MFC fell into. Your earlier 
comment:

     [...] the use of type_info::name() for type identification.  Even
     if these were optional components to the library, they could
     provide  enormous benefit for some applications.

made it sound like you might make the same mistake. If we use class names 
to identify types, we need to make sure we can rename classes and still 
load old files.
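
One possible way to keep old files loadable after a rename is a table of aliases that is consulted before the name is looked up. This is a sketch under my own assumptions, not something MFC or the submitted library provides; NameAliases and its members are invented names.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical alias table: maps a class name found in an old archive
// to the name the class has in the current program.
class NameAliases {
public:
    void rename(const std::string &old_name, const std::string &new_name) {
        aliases_[old_name] = new_name;
    }

    // Follows renames until it reaches a name with no further alias.
    // The hop limit guards against an accidental cycle in the table.
    std::string resolve(std::string name) const {
        for (std::size_t hops = 0; hops <= aliases_.size(); ++hops) {
            auto it = aliases_.find(name);
            if (it == aliases_.end())
                return name;
            name = it->second;
        }
        throw std::runtime_error("rename cycle for class name");
    }

private:
    std::map<std::string, std::string> aliases_;
};
```

A class that has been renamed twice then needs two entries, and an archive written under any of its earlier names still resolves to the current one.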

But generally, yes, I know what kind of applications I write, and I hope 
boost will support them. If other people have different expectations, shouldn't 
they write about them? Isn't that what this pre-coding discussion is for?

I'm sorry for the length of this post, but now that I've written it, maybe 
you can tell me whether I want a persistence library or a serialisation 
library.

-- Dave Harris

