On Mon, Sep 05, 2011 at 01:23:14PM +0300, Daniel Shahaf wrote:
> Stefan Sperling wrote on Mon, Sep 05, 2011 at 11:38:11 +0200:
> > So you're saying that we should run the plaintext proposed above
> > through svndiff? Can you explain in more detail how this would work?
> > What is the base of a delta?
> > 
> 
> The file contains one or more DELTA\n..ENDREP\n streams:
> 
>   DELTA
>   <svndiff stream>
>   ENDREP
>   DELTA
>   <svndiff stream>
>   ENDREP
> 
> (On second thought, we should store the length of the stream
> somewhere; the DELTA header seems a fine place:
> 
>   DELTA 512
>   <512 bytes of svndiff stream>
>   ENDREP
>   DELTA 37
>   <37 bytes of svndiff stream>
>   ENDREP
> 
> .)  When the file is read, readers decode all the deltas and concatenate
> the resulting plaintexts.  When the file is rewritten, writers
> optionally combine the first N deltas into a single delta that produces
> the combined plaintext.
> 
> The deltas can be self-compressed (like a DELTA\n rep in the revision
> files), i.e., having no base.
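
A minimal sketch of that length-prefixed framing, assuming the DELTA/ENDREP
layout proposed above. All names here are hypothetical, and the svndiff
decoding step is omitted; each payload is treated as opaque bytes:

```python
# Sketch of the proposed "DELTA <len>\n...ENDREP\n" framing.
# Hypothetical helper names; real readers would additionally decode each
# svndiff stream and concatenate the resulting plaintexts.
import io

def write_reps(out, payloads):
    """Write each payload as a DELTA <len>\\n ... ENDREP\\n record."""
    for data in payloads:
        out.write(b"DELTA %d\n" % len(data))
        out.write(data)
        out.write(b"ENDREP\n")

def read_reps(inp):
    """Parse the records back, using the length on the DELTA header to
    know how many bytes of stream to consume before ENDREP."""
    payloads = []
    while True:
        header = inp.readline()
        if not header:
            break  # end of file
        if not header.startswith(b"DELTA "):
            raise ValueError("malformed header: %r" % header)
        length = int(header.split()[1])
        payloads.append(inp.read(length))
        if inp.readline() != b"ENDREP\n":
            raise ValueError("missing ENDREP")
    return payloads

buf = io.BytesIO()
write_reps(buf, [b"x" * 512, b"y" * 37])
buf.seek(0)
print([len(p) for p in read_reps(buf)])  # -> [512, 37]
```

The length on the header is what lets a reader skip or consume a stream
without scanning for a terminator inside binary svndiff data.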

OK, I see. You're trying to save disk space, trading it for CPU time
during read/write operations. Does that make sense? Is the amount of
data really going to be big enough to be worth it?

> > What is 'lhs'?
> lhs = left-hand side
> rhs = right-hand side

> How about calling them after the RHS'es of the mappings rather than
> after the fact that they are mappings?
> 
> 
> Currently:
> 
> - noderev map file, revision map file, successors data file
> 
> Perhaps:
> 
> - noderev posterity file, successor offsets file, successors data file

These names are fine with me.

What would you call them on disk?

> (Is 'progeny' the more appropriate word here?

I like 'progeny' because it means 'immediate offspring'.
'Posterity' includes all descendants in all generations, and that's not
what the file is storing.

> > I am happy to just leave this debris in the files for now.
> > 
> > I would guess that nobody will ever even notice this problem in practice.
> > The number of commits failing within the time window where successor data
> > is updated will statistically be very low to begin with.
> > Each time it happens we lose a very small fraction of disk space. We also
> > suffer a teeny tiny bit of read performance loss for readers of successors
> > of the affected node-revision. So what...
> > 
> > If it ever becomes a real problem, people can dump/load.
> > 
> 
> It'll work, but it's a kill a fly with a fleet approach :).  Dump/load
> the entire history (many MB of svndiffs) only to fix some derived
> noderev->offset map data?

Sure, it's not optimal. I doubt anyone will be bothered enough to
perform a dump/load just for this.
