Hi,

It looks like my initial attempt at this didn't work too well: my intention wasn't clear enough, and the interface draft I included seemed to raise mostly concerns about technicalities (too many methods, etc.) instead of the fundamental design tradeoffs I was trying to highlight. So let's try this again.

What I'm looking for is a clear, shared idea of what a jr3 content tree looks like at a low level (i.e. before stuff like node types, etc.), since the current MK interface leaves many of those details unspecified. Here's what the MK interface currently says about this:

> * The MicroKernel <b>Data Model</b>:
> * <ul>
> * <li>simple JSON-inspired data model: just nodes and properties</li>
> * <li>a node is represented as an object, consisting of an unordered collection
> * of properties and an array of (child) node objects</li>
> * <li>properties are represented as name/value pairs</li>
> * <li>MVPs are represented as name/array-of-values pairs</li>
> * <li>supported property types: string, number</li>
> * <li>other property types (weak/hard reference, date, etc) would need to be
> * encoded/mangled in name or value</li>
> * <li>no support for JCR/XML-like namespaces, "foo:bar" is just an ordinary name</li>
> * <li>properties and child nodes share the same namespace, i.e. a property and
> * a child node, sharing the same parent node, cannot have the same name</li>
> * </ul>

There are a few complications and missing details with this model (as documented) that I tried to address in my original proposal. The most notable are:

* The data model specifies that a node contains "an array of (child) node objects" and seems to imply that child nodes are always orderable. This is a major design constraint for the underlying storage model that doesn't seem necessary (a higher-level component could store ordering information explicitly) or desirable (see past discussions on this). To avoid this I think child nodes should be treated as an unordered set of name/node mappings.

* Another unspecified bit is whether same-name-siblings need to be supported on the storage level. The MK implies that SNSs are not supported (i.e.
a higher level component needs to use things like name mangling to implement SNSs on top of the MK), but the note about "an *array* of (child) node objects" kind of leaves the door open for two child nodes to (perhaps accidentally) have the same name. For this reason too, I think child nodes should be treated as a map from names to corresponding nodes.

* The data model doesn't specify whether the name of a node is an integral part of the node itself. The implementation(s) clarify (IMHO correctly) that the name of each child node is more logically a part of the parent node. Thus, unlike in JCR, there should be no getName() method on a low-level interface for nodes.

* Somewhat contrary to the above, the data model specifies properties as "name/value pairs". The MK interface doesn't allow individual properties to be accessed separately, so this detail doesn't show up too much in practice. However, in terms of an internal API it would be useful to keep properties mostly analogous to child nodes. Thus there should be no getName() method on a low-level interface for properties (or, perhaps more accurately, "values").

* The data model says that "properties and child nodes share the same namespace" but treats properties and child nodes differently in other aspects (properties as "an unordered collection", child nodes as "an array"). This seems like an unnecessary complication that's likely to cause trouble down the line (e.g. where and how will we enforce this constraint?). From an API point of view it would be cleanest either to treat both properties and child nodes equally (like having all as parts of a single unordered set of name/item mappings) or to allow and use completely separate spaces for property and child node names.

* Finally, while the MK interface doesn't spell it out explicitly, the implicit consequence of using MVCC and referencing revision identifiers in method calls is that the underlying tree model is essentially immutable.
The content tree only changes when a new revision is constructed, while all past revisions remain intact. To reflect this, an internal tree API should be mostly immutable.

These are, to my mind, the key issues on which we should try to reach an agreement. The exact form of the interface that expresses such consensus is IMHO of lesser importance, which is why I don't feel too strongly about things like the use of java.util.Map or the Visitor pattern. Such details can be changed down the line based on experience, but deeper features like addressing and the orderability of nodes and properties are very expensive to change later on.

My proposal, as drafted in the Tree interface, essentially says:

1) Properties and child nodes are all addressed using an unordered name->item mapping on the parent node.

2) Neither properties nor child nodes know their own name (or their parent). That information is kept only within the parent node.

3) Content trees are immutable except in clearly documented cases.

Some concerns, especially about the first and third items, were raised in the follow-up discussion. Based on those concerns, a possible alternative for the first item could be:

1a) Properties are addressed using an unordered name->property mapping on the parent node.

1b) Child nodes are addressed using an unordered name->node mapping on the parent node.

1c) The spaces for property and child node names are distinct. Possible restrictions on this need to be implemented on a higher level.

An alternative for the third item could be:

3a) Content trees are always immutable.

3b) A separate builder API is used for constructing new or modified content trees.

Can we reach consensus on some of these models (or yet another alternative)? If yes, it should be fairly straightforward to draft an interface that captures such consensus and addresses the more detailed concerns people have expressed.

BR,

Jukka Zitting
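
As a rough illustration of alternatives 1a-1c and 3a-3b above, here is a minimal Java sketch. It is not the Tree interface from the earlier draft: the names Node, NodeBuilder and TreeSketchDemo are hypothetical, and property values are simplified to plain strings purely to keep the example short. It only shows that nodes can carry two separate unordered name maps, know neither their own name nor their parent, and be modified only by building a new tree.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable node (3a). It has no getName() and no parent link (item 2). */
final class Node {
    private final Map<String, String> properties; // 1a: unordered name->property mapping
    private final Map<String, Node> childNodes;   // 1b: unordered name->node mapping

    Node(Map<String, String> properties, Map<String, Node> childNodes) {
        // Defensive copies, wrapped read-only, so later builder changes cannot leak in.
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
        this.childNodes = Collections.unmodifiableMap(new HashMap<>(childNodes));
    }

    Map<String, String> getProperties() { return properties; }
    Map<String, Node> getChildNodes() { return childNodes; }
}

/** Hypothetical builder (3b): the only way to create a new or modified tree. */
final class NodeBuilder {
    private final Map<String, String> properties = new HashMap<>();
    private final Map<String, Node> childNodes = new HashMap<>();

    NodeBuilder() { }

    /** Starts from an existing node; unchanged subtrees are shared, not copied. */
    NodeBuilder(Node base) {
        properties.putAll(base.getProperties());
        childNodes.putAll(base.getChildNodes());
    }

    NodeBuilder setProperty(String name, String value) {
        properties.put(name, value);
        return this;
    }

    NodeBuilder setChildNode(String name, Node node) {
        childNodes.put(name, node); // a map key is unique, so no same-name siblings
        return this;
    }

    Node build() { return new Node(properties, childNodes); }
}

class TreeSketchDemo {
    public static void main(String[] args) {
        Node leaf = new NodeBuilder().setProperty("title", "hello").build();

        // 1c: the name "a" is used for both a property and a child node.
        Node rev1 = new NodeBuilder()
                .setProperty("a", "x")
                .setChildNode("a", leaf)
                .build();

        // A "new revision" is a new tree; rev1 stays intact.
        Node rev2 = new NodeBuilder(rev1).setProperty("a", "y").build();

        System.out.println(rev1.getProperties().get("a"));         // x
        System.out.println(rev2.getProperties().get("a"));         // y
        System.out.println(rev2.getChildNodes().get("a") == leaf); // true
    }
}
```

Under variant 1) instead of 1a-1c, the two maps would collapse into a single name->item map, and a name clash between a property and a child node would become impossible by construction rather than something to enforce separately.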
