Any responses would be greatly appreciated. - Julian
On Tue, 2010-08-03, Julian Foad wrote:
> On Mon, 2010-07-12, Erik Huelsmann wrote:
> > After lots of discussion regarding the way NODE_DATA/4th tree should
> > be working, I'm now ready to post a summary of the progress. In my
> > last e-mail (http://svn.haxx.se/dev/archive-2010-07/0262.shtml) I
> > stated why we need this; this post is about the conclusion of what
> > needs to happen. Also included are the first steps there.
> >
> >
> > With the advent of NODE_DATA, we distinguish node values specifically
> > related to BASE nodes, those specifically related to "current" WORKING
> > nodes and those which are to be maintained for multiple levels of
> > WORKING nodes (not only the "current" view) (the latter category is
> > most often also shared with BASE).
> >
> > The respective tables will hold the columns shown below.
> >
> >
> > -------------------------
> > TABLE WORKING_NODE (
> >   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
> >   local_relpath  TEXT NOT NULL,
> >   parent_relpath  TEXT,
> >   moved_here  INTEGER,
> >   moved_to  TEXT,
> >   original_repos_id  INTEGER REFERENCES REPOSITORY (id),
> >   original_repos_path  TEXT,
> >   original_revnum  INTEGER,
> >   translated_size  INTEGER,
> >   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
> >   keep_local  INTEGER,
> >
> >   PRIMARY KEY (wc_id, local_relpath)
> >   );
> >
> > CREATE INDEX I_WORKING_PARENT ON WORKING_NODE (wc_id, parent_relpath);
> > --------------------------------
> >
> > The moved_* and original_* columns are typical examples of "WORKING
> > fields only maintained for the visible WORKING nodes": the original_*
> > and moved_* fields are inherited from the operation root by all
> > children part of the operation. The operation root will be the visible
> > change on its own level, meaning it'll have rows both in the
> > WORKING_NODE and NODE_DATA tables. The fact that these columns are not
> > in the WORKING_NODE table means that tree changes are not preserved
> > across overlapping changes.
> > This is fully compatible with what we do
> > today: changes to higher levels destroy changes to lower levels.
> >
> > The translated_size and last_mod_time columns exist in WORKING_NODE
> > and BASE_NODE; they explicitly don't exist in NODE_DATA. The fact that
> > they exist in BASE_NODE is a bit of a hack: it's to prevent creation
> > of WORKING_NODE data for every file which has keyword expansion or eol
> > translation properties set: these columns serve only to optimize
> > working copy scanning for changes and as such only relate to the
> > visible WORKING_NODEs.
> >
>
> Can we come up with an English description of what each table will now
> represent?
>
> "The BASE_NODE table lists the existing node-revs in the repository that
> comprise the mixed-revision tree that was most recently updated/switched
> to or checked out. (The kind and content of these nodes is not here;
> see the NODE_DATA table.)"
>
> > TABLE BASE_NODE (
> >   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
> >   local_relpath  TEXT NOT NULL,
> >   repos_id  INTEGER REFERENCES REPOSITORY (id),
> >   repos_relpath  TEXT,
>
> We need a revision number column here to go along with repos_id and
> relpath to make a valid node-rev reference, don't we?
>
> >   parent_relpath  TEXT,
>
> (While we're reorganising, can we move that "parent_relpath" column to
> adjacent to "local_relpath"?)
>
> >   translated_size  INTEGER,
> >   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
> >   dav_cache  BLOB,
> >   incomplete_children  INTEGER,
> >   file_external  TEXT,
> >
> >   PRIMARY KEY (wc_id, local_relpath)
> >   );
> >
>
> "The NODE_DATA table records the kind and shallow content (props, text,
> link target) of each node in the WC. It includes both the nodes that
> comprise the currently 'visible' (or 'actual' or 'on-disk') state of the
> WC and also all nodes that are part of a copied or moved tree but
> currently shadowed by a replacement performed inside that tree.
>
> At least one row exists for each WC path, including paths with no change
> and all paths affected by a tree change (add, delete, etc.). If the
> same path is affected by multiple levels of tree change - a replacement
> inside a copied directory, for example - then multiple rows exist with
> different 'op_depth' values."
>
> > TABLE NODE_DATA (
> >   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
> >   local_relpath  TEXT NOT NULL,
> >   op_depth  INTEGER NOT NULL,
> >   presence  TEXT NOT NULL,
> >   kind  TEXT NOT NULL,
> >   checksum  TEXT,
> >   changed_rev  INTEGER,
> >   changed_date  INTEGER,  /* an APR date/time (usec since 1970) */
> >   changed_author  TEXT,
>
> The changed_* columns can only belong to a node-rev that exists in the
> repository. What node-rev do they belong to and why aren't they
> alongside the node-rev details?
>
> (The changed_* columns convey essentially a rev number and two of the
> rev-props associated with that revnum that can be used in keyword
> expansions. We should consider representing that information in a more
> general form, both to avoid tying the DB format to the choice of those
> two particular revprops, and to avoid the redundancy of storing these
> same date and author values N times.)
>
>
> >   depth  TEXT,
> >   symlink_target  TEXT,
> >   properties  BLOB,
>
> (While we're rearranging, can we group the node-content fields together:
> kind, properties, checksum, symlink_target?)
>
> >   PRIMARY KEY (wc_id, local_relpath, oproot)
>
> s/oproot/op_depth/?
>
> >   );
> >
> > CREATE INDEX I_NODE_WC_RELPATH ON NODE_DATA (wc_id, local_relpath);
> >
> >
> > Which leaves the NODE_DATA structure above. The op_depth column
> > contains the depth of the node - relative to the wc root - on which
> > the operation was run which caused the creation of the given NODE_DATA
> > node. In the final scheme (based on single-db), the value will be 0
> > for base and a positive integer for WORKING related data.
>
> Let's assume single-db.
> By the last sentence, I understand: For each
> BASE_NODE row there is a corresponding NODE_DATA row with 'op_root' = 0;
> for every node brought in by a tree operation (copy, move, add) to an
> immediate child of the WC root there is a NODE_DATA row with 'op_root' =
> 1; for every child of a child ... 2; and so on.
>
>
> - Julian
>
>
> > In order to be able to implement NODE_DATA even without having a fully
> > functional SINGLE_DB yet, a transitional node numbering scheme needs
> > to be devised. The following numbers will apply: BASE == 0,
> > WORKING-this-dir == 1, WORKING-any-immediate-child == 2.
> >
> >
> > Other transitioning related remarks:
> >
> >  * Conditional-protected experimental sections, just like with
> >    SINGLE_DB
> >  * Initial implementation will simply replace the current
> >    functionality of the 2 tables, from there we can work our way through
> >    whatever needs doing.
> >  * Am I forgetting any others?
> >
> > Bye,
> >
> > Erik.
> >
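
To make the op_depth reading above concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is only an illustration under stated assumptions, not the proposed implementation: the table is a cut-down, hypothetical NODE_DATA (primary key on op_depth, as suggested by the s/oproot/op_depth/ remark), the helper names relpath_depth and visible_node are made up for this example, and the rule "the row with the highest op_depth is the visible one" is my assumption about how shadowing would be resolved.

```python
import sqlite3

# Hypothetical, cut-down NODE_DATA table: only the columns needed to show
# op_depth layering. This is NOT the full schema proposed in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE NODE_DATA (
      wc_id          INTEGER NOT NULL,
      local_relpath  TEXT NOT NULL,
      op_depth       INTEGER NOT NULL,
      presence       TEXT NOT NULL,
      kind           TEXT NOT NULL,
      PRIMARY KEY (wc_id, local_relpath, op_depth)
    )
    """
)


def relpath_depth(relpath):
    """Number of path components below the WC root ('' is the root itself)."""
    return len(relpath.split("/")) if relpath else 0


# BASE rows use op_depth 0. A copy rooted at "C" (an immediate child of the
# WC root) creates rows with op_depth 1 for "C" and everything below it.
# A replacement of "C/D" inside that copy adds a second row for "C/D" whose
# op_depth is the depth of its own operation root, i.e. 2.
rows = [
    (1, "A", 0, "normal", "dir"),
    (1, "A/f", 0, "normal", "file"),
    (1, "C", relpath_depth("C"), "normal", "dir"),
    (1, "C/D", relpath_depth("C"), "normal", "dir"),
    (1, "C/D", relpath_depth("C/D"), "normal", "file"),
]
conn.executemany("INSERT INTO NODE_DATA VALUES (?, ?, ?, ?, ?)", rows)


def visible_node(conn, wc_id, local_relpath):
    """Return (op_depth, presence, kind) of the topmost layer for a path.

    Assumption: the row with the highest op_depth shadows all lower ones.
    Returns None if the path has no row at all.
    """
    return conn.execute(
        "SELECT op_depth, presence, kind FROM NODE_DATA "
        "WHERE wc_id = ? AND local_relpath = ? "
        "ORDER BY op_depth DESC LIMIT 1",
        (wc_id, local_relpath),
    ).fetchone()


print(visible_node(conn, 1, "C/D"))
print(visible_node(conn, 1, "A/f"))
```

In this sketch "C/D" has two rows (the copied directory at op_depth 1 and the replacing file at op_depth 2), while the unmodified "A/f" has only its op_depth 0 row, which matches the "at least one row per WC path" description quoted above.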