Erik and I chatted on IM about the NODE_DATA table (aka "4th tree"). Figured it would be a good thing to capture that here on the dev@ list. Below is our chat, with only a few (personal) redactions.
There is some more conversation, which I'll forward separately...

Cheers,
-g

---------- Forwarded message ----------
From: Erik Huelsmann
Date: Mon, Jul 5, 2010 at 16:38
Subject: Chat with Erik Huelsmann
To: [email protected]

[...]
Erik: From the conversation the past weeks, I see there are 3 big items remaining for 1.7; one of which is the "4th tree"
15:45 me: yup
Erik: I was pondering the subject, but thought I'd tell you where my sudden interest comes from before jumping into the porcelain
me: hehe search your archives for NODE_DATA that'll provide most of the basics of the thinking
15:46 my last note said something about including copyfrom data in the NODE_DATA table, but I'm nuking that idea
Erik: I read up on it this afternoon, or at least quite a bit of it. ok. I think there are 2 ways of viewing NODE_DATA:
1. as a stash for 'non-current' layers
15:47 2. as a table which holds all layers including the current one
from what I read about it, your thinking is (2)?
me: yes the data moves from both BASE_NODE and WORKING_NODE into that table,
15:48 and with particular queries, you get the "latest" node of the tree latest/most-current/topmost-layer/whatever
Erik: k. if we make mistakes clearing out the table, that's probably the best way to notice early :-)
15:49 me: :-)
15:50 Erik: About using it for BASE_NODE as well as WORKING: there's no intention to share records between copied parts of the tree though, right? I mean, it'll all still be keyed on the local_relpath
me: correct
Erik: ok. because if not, I was expecting issues with e.g. presence
me: yah. not even gonna try that. rows will be copied when a copy/move occurs.
15:51 Erik: Ok. Mind me writing up some of the thoughts in a mail? It should end up being a proposal for change of the current schema.
15:52 me: please go ahead! sure
Erik: How far away do you expect yourself to be from moving to something other than cleaning svn_wc__get_entry()?
15:53 me: I've got a few days of writing in-db props tests, then to bump that format, then to work on NODE_DATA
15:54 one complication is entries upgrading, we have no more entry_modify() calls, but we still have to write old entries into the db, and that is done (today) using sql statements, which will need to switch over to updating NODE_DATA as appropriate
15:56 Erik: ok. and from Bert, I understood that's the same time when we get feature parity with 1.6 (ie being able to replace parts of the tree)
me: yah. we have a couple problems with adds-under-copy. a couple other sequences.
15:57 Erik: philip was expecting problems from the "multi-copysource" paradigm used to model mixed-rev WCs. especially because you can't tell they're part of the same op. do you see that differently?
15:58 me: I had thought to put the copyfrom_relpath/rev into the NODE_DATA table to do mixed revs under one op, but am going back on that idea, and sticking to multiple ops in WORKING_NODE,
15:59 where each op specifies a different rev, and yes... that will cause problems to detect "single op", but that model is what we need for *commit* time,
Erik: the question should probably be "do we need to". right.
me: because when committing, we issue a new COPY for each operation in the WORKING table,
16:00 and so... yah. "fine. it looks like different ops.", but do we care? it is entirely possible that a person ended up in that state with TRUE multiple operations, or it is possible to reach that state from a single mixed-rev copy,
Erik: when looking at it from a commit point of view, everything is part of the same - yet uncreated - transaction, I guess.
me: and I think it is important to NOT contain that kind of history, yes but the biggest user-visible feature, is "revert",
16:01 because you can only revert at operation roots, not children of those, so a mixed-rev copy will create multiple operation roots, which can then be independently reverted, but this can cause a problem because the ancestor node that has a different revision,
Erik: do we need elision for those? What if everything is updated to the same rev?
me: doesn't have the now-reverted descendent at that ancestor revision,
16:02 so when reverting in this situation, and there are no other layers of NODE_DATA to provide the data, then you have to mark the node as excluded, so parent is r5, child is r7, and you revert the child, it now becomes an r5/excluded child, but even then... that might not be quite right because the child was created in r6,
16:03 so maybe it just becomes a not-present node...
Erik: what happens to the excluded/not-present nodes during a commit? Do they get copied with the parent, if in the repos? Or are they deleted, if present in the repos?
me: post-copy, an update will not unify the revisions. they are still copies of distinct revisions copied with the parent
16:04 consider two operations: svn cp A newA ; svn cp B newA/B well... insert an 'svn rm newA/B' in between
16:05 if newA/B is reverted, then the commit has a copy of A including all of its children
Erik: right.
me: now... if you reach that same state via a single mixed-rev copy, then you revert the newA/B "copy", then a commit should contain all of newA (well, as a copy of a...@some-rev)
16:06 iow, history says whether you have child data after that child-revert, and you don't in a mixed-rev copy, so you have to leave something there.
I think that is not-present
16:07 Erik: it's not excluded (because that assumes existence) or deleted (same reason)
16:08 so, you need something which says "it might or might not be here in the repos, but I don't have it"
me: which is not-present. we report not-present to the server, and it will send stuff if something should be there. or it will NOT send something, and we remove the not-present node in that case.
Erik: sounds like absent, although currently that may assume existence too.
me: no. absent is a misnomer for not-authorized.
16:09 that's on a todo list for renaming.
16:10 Erik: ok. I get the context. Let's see if I can get something in a mail about it.
me: not-present basically means "this is a versioned node at *some* revision. we don't have its details here right now."
16:11 it primarily appears when you commit the deletion of a file. the parent's rev implies its existence, but post-commit-rev it does not exist. the parent has to have some kind of marker about the child, and that is not-present
16:12 Erik: hmm. does our editor have this notion?
16:13 me: the not-present concept is part of the Reporter, not the Editor. we report not-present nodes to the server. when the server gives us stuff to update our working copy, the editor will put a node there, or say nothing about it.
16:15 (and as I said, post-update, we remove all not-present nodes; if the server said nothing about them, then they are not part of the target revision, so we can safely remove them) now... part of the issue is that we're talking about copy/move children,
16:16 rather than nodes living in the BASE tree, so I'd advise creating a new presence for these, for clarity sake. in fact, during a conversation at some point, I suggested expanding the set of presence values. include copy-here/move-here/moved-away into the present set, I erroneously thought it Good to keep a minimal set,
16:17 but that isn't really true.
and it means we need to do a scan_addition/scan_deletion to get data, when we may only need the status that is easily derived from the presence value (obviously, you always have to scan for an operation root; tho... with NODE_DATA and the op_depth, the scan is more of a *skip* :-) )
16:18 Erik: sounds sane. we might want to remove the scans by expanding the presence set if NODE_DATA is going quickly enough.
16:19 [...]
me: yeah
Erik: but it would be nice to carve out NODE_DATA before then.
[...]
me: anyways. yeah. in the course of adding NODE_DATA, then we can also expand the set of presence values to assist with various types of data lookups
16:26 (scanning will still be necessary, but we may be able to improve the algorithm)
16:28 Erik: to recap: NODE_DATA contains the information to link a BASE_NODE or WORKING_NODE to its repository location.
me: no
16:29 only BASE nodes have repository locations. the table has about eight columns. I listed those out in one of the emails.
16:31 kind, [checksum], changed_*, properties, [symlink_target] 7 columns of data. and then the key is <wc_id, local_relpath, op_depth>
16:32 we may also want to put translated_size and last_mod_time into NODE_DATA
s/may/probably/
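
To make the key and the "topmost layer" lookup from the chat a bit more concrete, here is a rough sketch in Python with the stdlib sqlite3 module. It is not the real wc.db schema. The column names come from the list above, but the column types, the expansion of changed_* into rev/date/author, and the convention that op_depth 0 is the BASE layer are all assumptions, and the helper names (open_db, add_layer, drop_layer, topmost) are made up for illustration.

```python
import sqlite3

# Hypothetical schema: names follow the chat; types, the changed_* expansion
# and "op_depth 0 == BASE layer" are assumptions, not the actual wc.db layout.
SCHEMA = """
CREATE TABLE node_data (
    wc_id          INTEGER NOT NULL,
    local_relpath  TEXT    NOT NULL,
    op_depth       INTEGER NOT NULL,
    kind           TEXT    NOT NULL,
    checksum       TEXT,
    changed_rev    INTEGER,
    changed_date   INTEGER,
    changed_author TEXT,
    properties     BLOB,
    symlink_target TEXT,
    PRIMARY KEY (wc_id, local_relpath, op_depth)
);
"""


def open_db():
    """Create an in-memory database holding the sketch table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_layer(conn, relpath, op_depth, kind, checksum=None, wc_id=1):
    """Insert one layer (row) for a path; each layer has its own row."""
    conn.execute(
        "INSERT INTO node_data (wc_id, local_relpath, op_depth, kind, checksum)"
        " VALUES (?, ?, ?, ?, ?)",
        (wc_id, relpath, op_depth, kind, checksum),
    )


def drop_layer(conn, relpath, op_depth, wc_id=1):
    """Remove one layer, roughly what a revert of that layer would do."""
    conn.execute(
        "DELETE FROM node_data"
        " WHERE wc_id = ? AND local_relpath = ? AND op_depth = ?",
        (wc_id, relpath, op_depth),
    )


def topmost(conn, relpath, wc_id=1):
    """Return (op_depth, kind, checksum) of the highest layer, or None."""
    return conn.execute(
        "SELECT op_depth, kind, checksum FROM node_data"
        " WHERE wc_id = ? AND local_relpath = ?"
        " ORDER BY op_depth DESC LIMIT 1",
        (wc_id, relpath),
    ).fetchone()


if __name__ == "__main__":
    conn = open_db()
    add_layer(conn, "newA/B", 0, "file", "checksum-of-base")
    add_layer(conn, "newA/B", 1, "file", "checksum-of-copy")
    print(topmost(conn, "newA/B"))  # the op_depth 1 row is the current one
    drop_layer(conn, "newA/B", 1)
    print(topmost(conn, "newA/B"))  # the op_depth 0 row shows through again
```

The point of the sketch is only that, with all layers in one table keyed on <wc_id, local_relpath, op_depth>, "current node" becomes a max-op_depth query, and removing a layer exposes whatever sits below it. When no lower layer exists (the mixed-rev copy case discussed above), the lookup returns nothing, which is where a not-present style marker would be needed.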

