On Mon, Sep 05, 2011 at 01:23:14PM +0300, Daniel Shahaf wrote:
> Stefan Sperling wrote on Mon, Sep 05, 2011 at 11:38:11 +0200:
> > So you're saying that we should run the plaintext proposed above
> > through svndiff? Can you explain in more detail how this would work?
> > What is the base of a delta?
>
> The file contains one or more DELTA\n..ENDREP\n streams:
>
> DELTA
> <svndiff stream>
> ENDREP
> DELTA
> <svndiff stream>
> ENDREP
>
> (On second thought, we should be storing the length of the stream
> somewhere; on the DELTA header seems a fine place:
>
> DELTA 512
> <512 bytes of svndiff stream>
> ENDREP
> DELTA 37
> <37 bytes of svndiff stream>
> ENDREP
>
> .) When the file is read, readers decode all the deltas and concatenate
> the resulting plaintexts. When the file is rewritten, writers
> optionally combine the first N deltas into a single delta that produces
> the combined plaintext.
>
> The deltas can be self-compressed (like a DELTA\n rep in the revision
> files), i.e., having no base.
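As a rough illustration of the container framing described above, here is a minimal sketch of a reader for such a file. It assumes each record is `DELTA <length>\n`, followed by exactly `<length>` raw svndiff bytes, followed by `ENDREP\n` with no separating newline; the function name and exact framing are hypothetical, not the actual FSFS code, and decoding the svndiff streams themselves is out of scope here.

```python
import io

def read_delta_streams(f):
    """Return the raw svndiff streams stored in the file, in order.

    Hypothetical framing: "DELTA <length>\\n" header, <length> raw
    bytes, then an "ENDREP\\n" trailer immediately after the data.
    """
    streams = []
    while True:
        header = f.readline()
        if not header:
            break  # end of file
        if not header.startswith(b"DELTA "):
            raise ValueError("expected DELTA header, got %r" % header)
        length = int(header.split()[1])
        data = f.read(length)
        if len(data) != length:
            raise ValueError("truncated svndiff stream")
        if f.readline().strip() != b"ENDREP":
            raise ValueError("missing ENDREP trailer")
        streams.append(data)
    return streams

# Two records, analogous to the DELTA 512 / DELTA 37 example above
# (placeholder payloads instead of real svndiff bytes).
buf = io.BytesIO(b"DELTA 3\nabcENDREP\nDELTA 2\nxyENDREP\n")
print(read_delta_streams(buf))  # [b'abc', b'xy']
```

Storing the length on the DELTA header, as proposed, is what lets the reader skip over each stream without parsing svndiff framing; a writer combining the first N deltas would simply emit one DELTA record whose payload is the combined delta.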
OK, I see. You're trying to save disk space, trading it for CPU time
during read/write operations. Does that make sense? Is the amount of
data really going to be big enough to be worth it?

> > What is 'lhs'?
> lhs = left-hand side
> rhs = right-hand side
> How about calling them after the RHS'es of the mappings rather than
> after the fact that they are mappings?
>
> > Currently:
> > - noderev map file, revision map file, successors data file
> > Perhaps:
> > - noderev posterity file, successor offsets file, successors data file

These names are fine with me. What would you call them on disk?

> (Is 'progeny' the more appropriate word here?

I like 'progeny' because it means 'immediate offspring'. 'Posterity'
includes all descendants in all generations, and that's not what the
file is storing.

> > I am happy to just leave this debris in the files for now.
> >
> > I would guess that nobody will ever even notice this problem in practice.
> > The number of commits failing within the time window where successor data
> > is updated will statistically be very low to begin with.
> > Each time it happens we lose a very small fraction of disk space. We also
> > suffer a teeny tiny bit of read performance loss for readers of successors
> > of the affected node-revision. So what...
> >
> > If it ever becomes a real problem, people can dump/load.
>
> It'll work, but it's a kill-a-fly-with-a-fleet approach :). Dump/load
> the entire history (many MB of svndiffs) only to fix some derived
> noderev->offset map data?

Sure, it's not optimal. I doubt anyone will be bothered enough to
perform a dump/load just for this.