Re: [Etoile-dev] EtoileSerialize overhaul?

Quentin Mathé Wed, 27 Oct 2010 06:18:16 -0700

Hi Niels,

On 27 Oct 2010, at 11:50, Niels Grewe wrote:


> Hello guys,
> 
> I've recently started working on EtoileSerialize again, the objective
> being to make it support serialising objects that implement NSCoding.

I have been following your commits (loosely). I think supporting NSCoding is 
really important, and I was unable to figure out how to do it. I'm glad you 
did :-)

> The code presently lives in my branch (thebeing/ESArchiving) and the
> archiving side of things is already looking good, and I'm going to write
> the unarchiving part over the next week. Unfortunately, this change is
> not entirely backwards-compatible. You can still deserialize object
> graphs written by old versions of ES, but not the other way around.

This doesn't matter; nobody really uses ES or CO right now. I plan to break 
CoreObject backward compatibility soon and overhaul it. Among other things, 
I'll add a simple mechanism to support upgrading the metadata database schema. 
I think we shouldn't worry about that until 0.5 and should just focus on 
improving both ES and CO as much as we can until we get the first 
user-oriented release out.
Automatic persistence for compound documents will surely also require a 
substantial number of adjustments in CO and maybe ES too.

> Also, the object stores are as of yet unable to encode information about
> what version of ES the archives were created with (the binary
> backend does not even contain any magic number). 

Right, we need to encode the format (type and version) at the start of the 
archive.
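
To illustrate, here is a minimal sketch in Python of such a header. The magic 
value, the field widths and the backend codes are all hypothetical; nothing 
here is the actual ES format.

```python
import struct

# Hypothetical layout: 4-byte magic, 1-byte backend type, 2-byte format version.
# None of these values come from EtoileSerialize itself.
MAGIC = b"ESAR"
HEADER = struct.Struct(">4sBH")  # big-endian, standard sizes, no padding

BACKEND_BINARY = 1
BACKEND_XML = 2


def write_header(backend, version):
    """Return the bytes to place at the very start of an archive."""
    return HEADER.pack(MAGIC, backend, version)


def read_header(data):
    """Return (backend, version), or raise if this is not one of our archives."""
    if len(data) < HEADER.size:
        raise ValueError("archive too short to contain a header")
    magic, backend, version = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    return backend, version
```

A reader can then pick the right deserializer (or refuse the archive) before 
touching the payload.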

> That is why I'd like to propose breaking backwards-compatibility
> altogether and make additional changes in the following areas:
> 
> * Smarter stores: Object stores should be able to carry metadata about
>  the ES/backend version they were created with, which backends they can
>  be used with (and which is the preferred one).

I have started reworking the object store API to group delta and full saves 
into a single unit/store rather than using two object stores.
I need to finish this code and commit it. Should I finish this ASAP, or is it 
not too critical?
We can discuss the reworked API and the needs you envision in more detail if 
needed…
I have some other ES API and code cleanup changes still to be committed. Let 
me know whether you prefer that I commit them ASAP or later (after you merge 
your recent work into trunk).

I also plan to add an Info.plist (or equivalent) to store some internal ES/CO 
information and metadata in each object store. This would also make it 
possible to recover the core object graph in case the metadata db gets 
corrupted.

On this topic, I had a long discussion with Nicolas a year ago about using 
the metadata db just as a cache to make CoreObject as resilient as possible. 
By scanning a volume and storing the right information in each object store, 
we could support reconstructing the core object graph and the metadata db 
from scratch. 
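
As a rough sketch of that idea (in Python, with a hypothetical Info.plist 
layout using made-up "UUID" and "Version" keys, and a plain dictionary 
standing in for the metadata db), rebuilding the index is just a directory 
walk:

```python
import os
import plistlib
import tempfile

INFO_NAME = "Info.plist"  # per-store metadata file; the keys below are hypothetical


def write_store_info(store_dir, info):
    """Create an object store directory holding only its Info.plist."""
    os.makedirs(store_dir, exist_ok=True)
    with open(os.path.join(store_dir, INFO_NAME), "wb") as f:
        plistlib.dump(info, f)


def rebuild_index(root):
    """Scan a volume and rebuild a UUID -> {path, version} index from scratch."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if INFO_NAME in filenames:
            with open(os.path.join(dirpath, INFO_NAME), "rb") as f:
                info = plistlib.load(f)
            index[info["UUID"]] = {
                "path": os.path.relpath(dirpath, root),
                "version": info["Version"],
            }
    return index


def demo():
    """Build two stores in a temporary directory and index them."""
    with tempfile.TemporaryDirectory() as root:
        write_store_info(os.path.join(root, "a.store"), {"UUID": "uuid-a", "Version": 3})
        write_store_info(os.path.join(root, "b.store"), {"UUID": "uuid-b", "Version": 1})
        return rebuild_index(root)
```

Since each store describes itself, losing the index only costs a rescan.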

David was advocating another approach: storing fewer things in each object 
store and relying more on the metadata db. For instance, simple property 
changes would be stored in the metadata db and not in the object store; the 
object store would be reserved for non-trivial delta changes (e.g. range 
editing) and snapshots. 
I'm not convinced by this approach because it creates two persistence code 
paths, and means that you cannot simply move an object store around without 
extracting the trivial changes stored in the metadata db. It also implies 
that recovering from a metadata db corruption is not possible. If we go this 
way, it might be better to put everything in the metadata db. 
However, I must admit that we'll need to tag methods triggering persistency in 
some way to immediately index the changed property values in the metadata db 
(supposing we want to support temporal versioning). And this problem is kind 
of orthogonal to David's suggestion. If we don't do that, we need a standalone 
process that deserializes recently touched core objects and indexes them, 
which is much more complex and costly.

I quite like Nicolas' approach, although it's going to cost some space 
(replicating property changes in the metadata db and the object store).

> * Overhaul the binary backend: Presently it stores data in a
>  machine-dependent format, but it really should encode stuff in an
>  independent format.

Would be better.
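
For illustration only, a small Python sketch of what "independent" means here: 
fix the byte order and the field widths explicitly instead of dumping values 
in native layout. The record (one 64-bit integer, one double) is a made-up 
example, not how the ES binary backend lays out ivars.

```python
import struct

# "<" selects little-endian with standard sizes and no alignment padding,
# so the bytes are the same whatever machine writes them.
RECORD = struct.Struct("<qd")  # 8-byte signed integer followed by 8-byte double


def encode_record(int_value, double_value):
    return RECORD.pack(int_value, double_value)


def decode_record(data):
    return RECORD.unpack(data)
```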

> * Reconsider the relationship with CoreObject: I'm curious to see
>  whether we can somehow integrate the automatic persistency from ES
>  with Eric's new more informed approach to versioning/merging. (e.g.
>  how does it relate to branches/versions in ES object stores?)

Afaik, there is no branch concept in Eric's CoreObject and each version is 
uniquely identified by a SHA. Uniquely identified versions make merging much 
easier to handle. You can freely move bits of history around. For example, 
suppose you copy some history to another core object, make some more changes 
and then want to reintegrate it back into the first core object. You could 
also more easily compute which parts of the history can be deleted (the job 
of the garbage collector, still to be written).
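
A toy sketch of why this helps, in Python. What exactly gets hashed in Eric's 
code is not something I'm describing here; hashing the parent id plus the 
payload with SHA-1 is purely an assumption, and the History class is 
hypothetical.

```python
import hashlib


def version_id(parent_id, payload):
    """Derive a version id from its parent id and its content (assumed scheme)."""
    h = hashlib.sha1()
    h.update((parent_id or "").encode("ascii"))
    h.update(b"\0")
    h.update(payload)
    return h.hexdigest()


class History:
    """Hypothetical history of one core object: id -> (parent id, payload)."""

    def __init__(self):
        self.versions = {}

    def commit(self, parent_id, payload):
        vid = version_id(parent_id, payload)
        self.versions[vid] = (parent_id, payload)
        return vid

    def merge_from(self, other):
        # Equal ids mean equal parent and content, so a plain union is enough:
        # shared history is deduplicated and new versions are simply added.
        self.versions.update(other.versions)
```

Copying history to another core object, committing there and merging back is 
then `b.merge_from(a)`, `b.commit(...)`, `a.merge_from(b)`, with no 
renumbering of versions involved.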

Eric and I discussed the matter a bit last spring. I told him I liked the 
ideas and suggested we could keep the same ES API (or not), but decide that:
- a branch is just a convention at the implementation level (a core object 
directory inside another core object directory in the case of an object bundle, 
and a reference to the "parent" version and core object)
- each core object version is identified by a UUID on disk (with a version 
number cached in the metadata db, or both put in the archive filename)

My opinion is that branches should probably not exist at the implementation 
level but are a convenient construct at the API and user level.

> Since we probably don't want to break the format ES uses very often, I'd
> be really interested to hear your opinions on the matter (questions?
> further suggestions? issues?)

As I said, I think it doesn't matter if we break the ES format for now, 
although we should think about a backward compatibility mechanism. Just put a 
big warning in each commit that breaks the format, telling people they need to 
delete their CoreObject library.

Cheers,
Quentin.



