NODE_DATA discussions

Julian Foad Wed, 18 Aug 2010 13:49:50 -0700

Hyrum, Eric H., Philip M. and I met up at WANdisco's office in Sheffield
(England) yesterday and today.  One of the things we discussed was
NODE_DATA.


We discussed several sub-topics, two of which are written up here.
These notes include my own further thoughts, which weren't part of the
discussion, so they're biased.


-----------------------

Observation:

The new NODE_DATA table can completely subsume the old BASE_NODE and
WORKING_NODE tables.

  BASE_NODE    => NODE_DATA(op_depth==0),
  WORKING_NODE => NODE_DATA(op_depth==max).

A few of the columns do not make sense at every op_depth:
translated_size and last_mod_time make sense only on the topmost node.
But putting these columns into this table seems better than keeping them
in a separate table.
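The mapping can be sketched with Python's sqlite3 module.  This is a
minimal sketch under stated assumptions: the node_data table below is a
hypothetical simplification with only four columns (the real definition
in wc-metadata.sql has many more), and the presence strings simply follow
the tables in this message.  The old BASE_NODE row becomes the op_depth=0
row, and the old WORKING_NODE row becomes the row with the highest
op_depth above zero, if there is one.

```python
import sqlite3

# Hypothetical, heavily simplified NODE_DATA table; the real schema
# in wc-metadata.sql has many more columns.
SCHEMA = """
CREATE TABLE node_data (
  wc_id         INTEGER NOT NULL,
  local_relpath TEXT    NOT NULL,
  op_depth      INTEGER NOT NULL,
  presence      TEXT    NOT NULL,
  PRIMARY KEY (wc_id, local_relpath, op_depth)
);
"""


def base_node(conn, wc_id, relpath):
    """What used to be the BASE_NODE row: the op_depth == 0 layer."""
    return conn.execute(
        "SELECT presence FROM node_data"
        " WHERE wc_id = ? AND local_relpath = ? AND op_depth = 0",
        (wc_id, relpath),
    ).fetchone()


def working_node(conn, wc_id, relpath):
    """What used to be the WORKING_NODE row: the highest op_depth > 0."""
    return conn.execute(
        "SELECT op_depth, presence FROM node_data"
        " WHERE wc_id = ? AND local_relpath = ? AND op_depth > 0"
        " ORDER BY op_depth DESC LIMIT 1",
        (wc_id, relpath),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# A1 is deleted in the working copy; B1 is unmodified.
conn.executemany(
    "INSERT INTO node_data VALUES (1, ?, ?, ?)",
    [("A1", 0, "normal"), ("A1", 1, "base_deleted"), ("B1", 0, "normal")],
)

print(base_node(conn, 1, "A1"))     # ('normal',)
print(working_node(conn, 1, "A1"))  # (1, 'base_deleted')
print(working_node(conn, 1, "B1"))  # None
```

A path with no row above op_depth 0 simply has no WORKING node, which is
what the separate WORKING_NODE table used to express by having no row.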

  BASE_NODE                 NODE_DATA               WORKING_NODE
  ----------------          --------------          --------------
  Indexing              =>  yes + op_depth    <=    Indexing
  presence              =>  yes               <=    presence
  Node-Rev              =>  yes: original_*   <=    copyfrom_*
  Content               =>  yes               <=    Content
  Last-Change           =>  yes               <=    Last-Change
  translated_size       =>  TODO              <=    translated_size
  last_mod_time         =>  TODO              <=    last_mod_time
  file_external         x   no - obsolete
  dav_cache             =>  TODO
  incomplete_children   x   no - obsolete
                            no - not ready?   <=    moved_here
                            no - not ready?   <=    moved_to
                            no - obsolete     x     keep_local

By "TODO" I mean not yet in wc-metadata.sql.

By "not ready?" I mean we're not ready to fully define and use the
'moved_*' columns so it would be better to insert them with a WC format
upgrade when we are ready.


-----------------------

[RFC] Instead of recording the "deleted" state as a presence value in
the topmost operative layer, consider whether to record it by means of a
flag in the next-nearest layer *beneath*.

This initially sounded appealing, but I'm not so sure now.  It could
just be a premature optimisation side-track.  It's certainly a way down
the list of Important Things.

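Example sequence of operations, showing the representation in both
schemes: on the left, the flag in the layer beneath (shown as "Del!");
on the right, the presence value in the topmost layer.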

  Operation: clean checkout

  WC paths            op_depth=0  op_depth=1        op_depth=0  op_depth=1
  ------------        ----------  ----------        ----------  ----------
  A1/                 norm                          norm
   +- f.old           norm                          norm
   +- f               norm                          norm
  B1/                 norm                          norm
   +- f               norm                          norm
   +- f.new           norm                          norm

  Operation: delete ./A1

  WC paths            op_depth=0  op_depth=1        op_depth=0  op_depth=1
  ------------        ----------  ----------        ----------  ----------
  A1/                 norm  Del!                    norm        base_deleted
   +- f.old           norm  Del!                    norm        base_deleted
   +- f               norm  Del!                    norm        base_deleted
  B1/                 norm                          norm
   +- f               norm                          norm
   +- f.new           norm                          norm

  Operation: copy ^/B1 to ./A1, replacing ^/A1

  WC paths            op_depth=0  op_depth=1        op_depth=0  op_depth=1
  ------------        ----------  ----------        ----------  ----------
  A1/                 norm  Del!  norm              norm        norm
   +- f.old           norm  Del!                    norm        base_deleted
   +- f               norm  Del!  norm              norm        norm
   +- f.new                       norm                          norm
  B1/                 norm                          norm
   +- f               norm                          norm
   +- f.new           norm                          norm


Flag in layer beneath doesn't require a final base_deleted row in the
topmost layer.  The flag is required for (and only for) paths where a
row exists in the layer beneath, so it seems more space-efficient to
store it there.
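To make the space argument concrete, here is a hypothetical in-memory
model (plain Python dictionaries keyed by (path, op_depth), not the real
wc.db layout) of the final state in the example, after the copy.
Counting rows shows the single base_deleted row for A1/f.old that the
flag scheme does without.

```python
# Hypothetical model of the state after "copy ^/B1 to ./A1, replacing ^/A1".
# Keys are (relpath, op_depth).

# Flag in layer beneath: value is (presence, deleted_flag).
flag_scheme = {
    ("A1", 0): ("normal", True),
    ("A1/f.old", 0): ("normal", True),
    ("A1/f", 0): ("normal", True),
    ("B1", 0): ("normal", False),
    ("B1/f", 0): ("normal", False),
    ("B1/f.new", 0): ("normal", False),
    ("A1", 1): ("normal", False),
    ("A1/f", 1): ("normal", False),
    ("A1/f.new", 1): ("normal", False),
}

# Presence value in topmost layer: value is the presence string.
presence_scheme = {
    ("A1", 0): "normal",
    ("A1/f.old", 0): "normal",
    ("A1/f", 0): "normal",
    ("B1", 0): "normal",
    ("B1/f", 0): "normal",
    ("B1/f.new", 0): "normal",
    ("A1", 1): "normal",
    ("A1/f.old", 1): "base_deleted",  # the extra row
    ("A1/f", 1): "normal",
    ("A1/f.new", 1): "normal",
}

print(len(flag_scheme), len(presence_scheme))  # 9 10
```

The saving is one row per deleted path that the replacing copy does not
bring back; the flag itself costs a column on every row.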

Flag in layer beneath allows simpler reverting of the "add" half of a
replacement: (remove all op_depth=N rows that are children of this op)
rather than (convert all these rows to a different presence value that
depends on their presence in layer N-1).  That's not considered an
important UI feature, but the fact that it can be done by a logically
simple operation is likely to result in simpler, less buggy code.
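A sketch of the two revert procedures for the "add" half, using the same
kind of hypothetical dictionary model (keys are (relpath, op_depth); only
the A1 subtree is shown, and the helper names are made up for
illustration).  Both should take the post-copy state back to the
post-delete state; the flag-scheme version is a plain filter, while the
presence-scheme version has to look at layer N-1 for each row.

```python
def is_under(path, root):
    """True if path is root itself or lies inside it."""
    return path == root or path.startswith(root + "/")


def revert_add_flag_scheme(rows, root, n):
    """Flag scheme: drop every op_depth == n row at or under root.
    The deleted flags in layer n-1 stay, so the paths remain deleted."""
    return {k: v for k, v in rows.items()
            if not (k[1] == n and is_under(k[0], root))}


def revert_add_presence_scheme(rows, root, n):
    """Presence scheme: an op_depth == n row at or under root becomes
    base_deleted if layer n-1 has a normal row there, else it is removed."""
    out = {}
    for (path, depth), presence in rows.items():
        if depth == n and is_under(path, root):
            if rows.get((path, n - 1)) == "normal":
                out[(path, depth)] = "base_deleted"
            # otherwise nothing lies beneath and the row just disappears
        else:
            out[(path, depth)] = presence
    return out


# State after "copy ^/B1 to ./A1, replacing ^/A1" (A1 subtree only).
flag_rows = {
    ("A1", 0): ("normal", True),
    ("A1/f.old", 0): ("normal", True),
    ("A1/f", 0): ("normal", True),
    ("A1", 1): ("normal", False),
    ("A1/f", 1): ("normal", False),
    ("A1/f.new", 1): ("normal", False),
}
presence_rows = {
    ("A1", 0): "normal",
    ("A1/f.old", 0): "normal",
    ("A1/f", 0): "normal",
    ("A1", 1): "normal",
    ("A1/f.old", 1): "base_deleted",
    ("A1/f", 1): "normal",
    ("A1/f.new", 1): "normal",
}

flag_after = revert_add_flag_scheme(flag_rows, "A1", 1)
presence_after = revert_add_presence_scheme(presence_rows, "A1", 1)
```

In both cases the result matches the "delete ./A1" table: A1 and its
original children remain deleted, and A1/f.new is gone.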

Flag in layer beneath makes reverting the whole operation harder: two
layers need to be modified.
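Continuing with the same hypothetical dictionary model (helper names
invented for illustration), reverting the whole replacement means
dropping layer 1 in both schemes, but the flag scheme must also clear the
deleted flags in layer 0.  A small diff helper reports which op_depth
layers each revert had to change.

```python
def is_under(path, root):
    """True if path is root itself or lies inside it."""
    return path == root or path.startswith(root + "/")


def revert_all_flag_scheme(rows, root, n):
    """Drop layer n under root AND clear the deleted flags in layer n-1."""
    out = {}
    for (path, depth), (presence, deleted) in rows.items():
        if is_under(path, root) and depth == n:
            continue  # layer n: remove the copied rows
        if is_under(path, root) and depth == n - 1:
            deleted = False  # layer n-1: un-delete
        out[(path, depth)] = (presence, deleted)
    return out


def revert_all_presence_scheme(rows, root, n):
    """Drop layer n under root; layer n-1 is untouched."""
    return {(p, d): v for (p, d), v in rows.items()
            if not (d == n and is_under(p, root))}


def layers_touched(before, after):
    """Sorted op_depths of rows that were added, removed or changed."""
    keys = before.keys() | after.keys()
    return sorted({d for (p, d) in keys
                   if before.get((p, d)) != after.get((p, d))})


# State after "copy ^/B1 to ./A1, replacing ^/A1" (A1 subtree only).
flag_rows = {
    ("A1", 0): ("normal", True),
    ("A1/f.old", 0): ("normal", True),
    ("A1/f", 0): ("normal", True),
    ("A1", 1): ("normal", False),
    ("A1/f", 1): ("normal", False),
    ("A1/f.new", 1): ("normal", False),
}
presence_rows = {
    ("A1", 0): "normal",
    ("A1/f.old", 0): "normal",
    ("A1/f", 0): "normal",
    ("A1", 1): "normal",
    ("A1/f.old", 1): "base_deleted",
    ("A1/f", 1): "normal",
    ("A1/f.new", 1): "normal",
}

flag_after = revert_all_flag_scheme(flag_rows, "A1", 1)
presence_after = revert_all_presence_scheme(presence_rows, "A1", 1)
print(layers_touched(flag_rows, flag_after))          # [0, 1]
print(layers_touched(presence_rows, presence_after))  # [1]
```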

Flag in layer beneath is redundant when overridden by a higher layer, in
other words when a node is present in the WC at this path.
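The redundancy shows up in a hypothetical lookup of a path's effective
state (again over a dictionary model keyed by (relpath, op_depth), with
names chosen for illustration): only the topmost row for a path is
consulted, so a deleted flag on a lower row matters only when no higher
row exists.

```python
def effective_state(rows, path):
    """Return the WC state of path from its topmost row.
    rows maps (relpath, op_depth) -> (presence, deleted_flag)."""
    depths = [d for (p, d) in rows if p == path]
    if not depths:
        return None  # no row at all: not versioned
    presence, deleted = rows[(path, max(depths))]
    # The flag on the topmost row means "deleted by an op above";
    # flags on lower rows are shadowed and never read.
    return "deleted" if deleted else presence


rows = {
    ("A1", 0): ("normal", True),   # flag shadowed by the op_depth=1 row
    ("A1", 1): ("normal", False),
    ("A1/f.old", 0): ("normal", True),
}
print(effective_state(rows, "A1"))        # normal
print(effective_state(rows, "A1/f.old"))  # deleted
```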

Flag in layer beneath copes with intentional deletion, but what about
'absent' and 'excluded'?  It would be wise to be able to support them
too, and maybe we also need 'not-present'.  If so, it's a presence value
rather than a Boolean flag.
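If the marker beneath has to express more than intentional deletion, its
type stops being a Boolean.  A hypothetical sketch of such a row type:
the Override name and its members are illustrative only, mirroring the
states named above, and NOT_PRESENT is included just because the
question is still open.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Override(Enum):
    """Hypothetical: how the node in this row is hidden from above."""
    DELETED = "deleted"
    ABSENT = "absent"
    EXCLUDED = "excluded"
    NOT_PRESENT = "not-present"  # open question: is this needed?


class NodeRow(NamedTuple):
    relpath: str
    op_depth: int
    presence: str
    override: Optional[Override] = None  # None: not overridden from above


rows = [
    NodeRow("A1", 0, "normal", Override.DELETED),
    NodeRow("B1", 0, "normal", Override.EXCLUDED),
    NodeRow("C1", 0, "normal"),
]
for r in rows:
    print(r.relpath, r.override.value if r.override else "-")
```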


-----------------------

- Julian
