Hyrum, Eric H., Philip M. and I met up at WANdisco's office in Sheffield (England) yesterday and today. One of the things we discussed was NODE_DATA.
We discussed several sub-topics, of which here are two. I've written these up including my further thoughts which weren't part of the discussion, so it's biased.

-----------------------

Observation:

The new NODE_DATA table can completely subsume the old BASE_NODE and WORKING_NODE tables.

  BASE_NODE    => NODE_DATA(op_depth==0)
  WORKING_NODE => NODE_DATA(op_depth==max)

A few of the columns do not make sense in every op_depth. translated_size and last_mod_time make sense only on the top-most node. But putting these columns into this table seems better than keeping them in a separate table.

  BASE_NODE                 NODE_DATA               WORKING_NODE
  ----------------          --------------          --------------
  Indexing             =>   yes + op_depth     <=   Indexing
  presence             =>   yes                <=   presence
  Node-Rev             =>   yes: original_*    <=   copyfrom_*
  Content              =>   yes                <=   Content
  Last-Change          =>   yes                <=   Last-Change
  translated_size      =>   TODO               <=   translated_size
  last_mod_time        =>   TODO               <=   last_mod_time
  file_external        x    no - obsolete
  dav_cache            =>   TODO
  incomplete_children  x    no - obsolete
                            no - not ready?    <=   moved_here
                            no - not ready?    <=   moved_to
                            no - obsolete      x    keep_local

By "TODO" I mean not yet in wc-metadata.sql.

By "not ready?" I mean we're not ready to fully define and use the 'moved_*' columns so it would be better to insert them with a WC format upgrade when we are ready.

-----------------------

[RFC]

Instead of recording the "deleted" state as a presence value in the topmost operative layer, consider whether to record it by means of a flag in the next-nearest layer *beneath*.

This initially sounded appealing, but I'm not so sure now. It could just be a premature optimisation side-track. It's certainly a way down the list of Important Things.
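To make the two representations concrete, here is a minimal sketch in Python with sqlite3. It is only an illustration under stated assumptions: the table is a heavily reduced stand-in for NODE_DATA, the helper names are made up, and the deleted_above column is an invented name for the proposed flag (nothing of that name exists in wc-metadata.sql). The rows describe two paths after A1 has been deleted and then replaced by a copy.

```python
import sqlite3

# Hypothetical, heavily reduced NODE_DATA; "deleted_above" is an invented
# name for the proposed flag and is not a column in wc-metadata.sql.
SCHEMA = """
CREATE TABLE node_data (
    local_relpath TEXT    NOT NULL,
    op_depth      INTEGER NOT NULL,
    presence      TEXT    NOT NULL,
    deleted_above INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (local_relpath, op_depth)
)
"""


def make_db(rows):
    """Build an in-memory table from (path, op_depth, presence, flag) rows."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO node_data VALUES (?, ?, ?, ?)", rows)
    return db


def topmost(db, path):
    """Return (presence, deleted_above) of the highest op_depth row, or None."""
    return db.execute(
        "SELECT presence, deleted_above FROM node_data"
        " WHERE local_relpath = ? ORDER BY op_depth DESC LIMIT 1",
        (path,),
    ).fetchone()


def deleted_by_presence(db, path):
    """Presence scheme: the topmost row itself says 'base_deleted'."""
    row = topmost(db, path)
    return row is not None and row[0] == "base_deleted"


def deleted_by_flag(db, path):
    """Flag scheme: the flag counts only if no higher row overrides it."""
    row = topmost(db, path)
    return row is not None and bool(row[1])


# Presence scheme: the deleted path needs an extra op_depth=1 row.
presence_db = make_db([
    ("A1/f.old", 0, "normal", 0),
    ("A1/f.old", 1, "base_deleted", 0),
    ("A1/f", 0, "normal", 0),
    ("A1/f", 1, "normal", 0),
])
# Flag scheme: the op_depth=0 rows carry the flag instead.
flag_db = make_db([
    ("A1/f.old", 0, "normal", 1),
    ("A1/f", 0, "normal", 1),
    ("A1/f", 1, "normal", 0),
])

for path in ("A1/f.old", "A1/f"):
    print(path, deleted_by_presence(presence_db, path),
          deleted_by_flag(flag_db, path))
```

Both variants report A1/f.old as deleted and A1/f as present. The flag variant gets there with one row fewer, and the flag on the op_depth=0 row of A1/f is simply ignored once a higher row exists.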
Example sequence of operations, showing representation in both schemes (left pair of columns: flag in the layer beneath, written "Del!"; right pair: presence value in the topmost layer):

Operation: clean checkout

  WC paths      op_depth=0  op_depth=1    op_depth=0  op_depth=1
  ------------  ----------  ----------    ----------  ----------
  A1/           norm                      norm
   +- f.old     norm                      norm
   +- f         norm                      norm
  B1/           norm                      norm
   +- f         norm                      norm
   +- f.new     norm                      norm

Operation: delete ./A1

  WC paths      op_depth=0  op_depth=1    op_depth=0  op_depth=1
  ------------  ----------  ----------    ----------  ----------
  A1/           norm Del!                 norm        base_deleted
   +- f.old     norm Del!                 norm        base_deleted
   +- f         norm Del!                 norm        base_deleted
  B1/           norm                      norm
   +- f         norm                      norm
   +- f.new     norm                      norm

Operation: copy ^/B1 to ./A1, replacing ^/A1

  WC paths      op_depth=0  op_depth=1    op_depth=0  op_depth=1
  ------------  ----------  ----------    ----------  ----------
  A1/           norm Del!   norm          norm        norm
   +- f.old     norm Del!                 norm        base_deleted
   +- f         norm Del!   norm          norm        norm
   +- f.new                 norm                      norm
  B1/           norm                      norm
   +- f         norm                      norm
   +- f.new     norm                      norm

Flag in layer beneath doesn't require a final base_deleted row in the topmost layer. The flag is required for (and only for) paths where a row exists in the previous layer, so it seems more space-efficient to store it there.

Flag in layer beneath allows simpler reverting of the "add" half of a replacement: (remove all op_depth=N rows that are children of this op) rather than (convert all these rows to a different presence value that depends on their presence in layer N-1). That's not considered an important UI feature, but the fact that it can be done by a logically simple operation is likely to result in simpler, less buggy code.

Flag in layer beneath makes reverting the whole operation harder: two layers need to be modified.

Flag in layer beneath is redundant when overridden by a higher layer, in other words when a node is present in the WC at this path.

Flag in layer beneath copes with intentional deletion, but what about 'absent' and 'excluded'? It would be wise to be able to support them too, and maybe we need 'not-present'?
If so, it's a presence value rather than a Boolean flag.

-----------------------

- Julian