Goal: Remove concept of slots from InfosetImpl
Reasons:
1. So that element/node lookup can be name based.
2. Ease re-enabling of Unordered Sequences.
3. Ease enabling of 'query-style' expressions that return more than
one node.
An Infoset is a document (DIDocument) and the document contains exactly
one element (DIElement) as its root. These elements can be simple
(DISimple) or complex (DIComplex). Simple elements do not contain
children, complex elements do. DIArray represents an array (maxOccurs
>1). A DIArray shall always be a child of DIComplex. All of these
inherit from DINode.
Currently DIComplex models its children as an Array of DINodes. The
problem with the current implementation is that there's no way for us to
just ask for an element by name without walking the structure.
The solution involves modifying DIComplex to represent its children both
as a mutable.ArrayBuffer[DINode] and a HashMap[String,
mutable.Seq[DINode]]. The ArrayBuffer shall give us the necessary
insertion ordering information we need regarding the children as well as
constant time indexing access to the child in question. The ArrayBuffer
also has the added benefit of growing and shrinking as objects are added
and removed. While the HashMap will allow us to associate a name
(String) with DINodes that have that name thus providing a fast named
lookup.
This named lookup can be optimized further by only inserting into the
HashMap those elements that we know are used in expressions. These are
the elements that would most likely benefit from a fast named lookup and
will allow us to avoid additional overhead of hashing strings and
allocating/appending to a sequence.
The reason for having the HashMap contain a mutable.Seq[DINode] is that
it's possible for the following:
<element name="a" .../>
<element name="a" maxOccurs="unbounded"..../>
<element name="b".../>
<element name="a" ..../>
Each element named 'a' would have an associated DINode. Because there
are multiple DINodes named 'a', we need to represent them in the HashMap
using the same key 'a'. To do this, we add all instances of 'a' to a
Seq[DINode]. In this particular case, the HashMap would look as follows:
"a" -> mutable.Seq(DIElement, DIArray, DIElement)
"b" -> mutable.Seq(DIElement)
The mutable.ArrayBuffer maintains the insertion order and would look as
follows:
mutable.ArrayBuffer(DIElement, DIArray, DIElement, DIElement)
Back tracking shall be accomplished by going to the last item in the
DIComplex ArrayBuffer and removing it.
1. If the last item is a DIElement, we simply delete it from the
ArrayBuffer and then remove it also from the HashMap.
2. If the last item is a DIArray, we then have to look at the
contents of the DIArray.
a. If the DIArray has more than one entry, remove the last
entry. Then update the associated entry in the ArrayBuffer and HashMap
with the modified DIArray.
b. If the DIArray has a single entry then we simply remove
DIArray from ArrayBuffer and the HashMap.
<element name="root>
<sequence ../>
<element name="a" .../>
<element name="a" maxOccurs="unbounded">
...
<element name="b" .../>
<element name="c" .../>
...
</element>
</sequence>
</element>
If the HashMap of DIComplex-root looks like the following:
"a" -> mutable.Seq(DIElement, DIArray)
ArrayBuffer of DIComplex-root looks like the following:
0 -> DIElement("a")
1 -> DIArray("a")
Where DIArray children ArrayBuffer looks like:
0 -> DIElement("b")
1 -> DIElement("c")
Given the initial example if we backtrack(1): // purge one item
DIArray:
0 -> DIElement("b") // removed one item from DIArray
HashMap DIComplex-root:
"a" -> mutable.Seq(DIElement, DIArray)
ArrayBuffer DIComplex-root:
0 -> DIElement("a")
1 -> DIArray("a")
Given the initial example if we backtrack(2): // purge two items
DIArray:
Empty
HashMap DIComplex-root:
"a" -> mutable.Seq(DIElement) // Because DIArray had two items
// removed and is now empty
ArrayBuffer DIComplex-root:
0 -> DIElement("a")
===
class DIComplex {
def nSlots = erd.nChildSlots
val children: mutable.ArrayBuffer[DINode](nSlots) // insertion order
lazy val fastLookup: HashMap[String, mutable.Seq[DINode]]
def addChild(node: DINode) {
children += node
fastLookup(node.name).append(node)
}
def getChild(name: String): immutable.Seq[DINode] {
fastLookup.getOrElse(name, mutable.Seq.empty) // fastLookup(name)
}
def backtrack(int n) {
for (i = 0; i < n; i++) {
val endIdx = children.size -1
val child = children.remove(endIdx) // children.pop()
fastLookup.get(child.name) match { // fastLookup(child.name).pop()
case None => // do nothing
case Some(nodesSeq) if nodesSeq.size == 0 => // Shouldn't happen?
case Some(nodesSeq) => {
val lastEntry = nodesSeq.last
lastEntry match {
case arr: DIArray => {
// If the array has >= 2 items, remove last one and
update nodesSeq
// with this array:
// val lastIdx = nodesSeq.length - 1
// arr.remove(arr.size - 1)
// fastLookup.update(child.name,
nodesSeq.update(lastIdx, arr))
//
// If the array has 1 item, remove the array from nodesSeq
// fastLookup.update(child.name, nodesSeq.dropRight(1))
}
case _ => fastLookup.update(child.name, nodesSeq.dropRight(1))
}
}
}
}
}
}
class DIArray {
private val initialSize =
parent.tunable.initialElementOccurrencesHint.toInt // set to 10
protected final val _contents = new ArrayBuffer[DIElement](initialSize)
def append(ie: InfosetElement): Unit = {
_contents.append(ie.asInstanceOf[DIElement])
ie.setArray(this)
}
//
// For backtracking purposes
//
def remove(n: Int): DIElement = {
_contents.remove(n)
}
}
===
To facillitate a breadth-first ordering scheme we shall have to
construct a 'global' that lives on the Element Runtime Data (ERD). It
shall be called childElementDeclPosition and pertains to the static
order of the element children of a complex type element. When sorting
DINodes, childElementDeclPosition can be obtained by way of reaching
from the DINode to the coresponding ERD. Any array shall be a single
entry in the sequence that needs to be sorted. All elements in this
array will have the same childElementDeclPosition.
The difficulty in implementing this is that an Element's
dsom.ComplexType can have groups within the groups before finding actual
element children. Therefore in the Document Schema Object Model (DSOM)
the model group object shall have a startingChildElementDeclPosition and
an endingChildElementDeclPosition.
1. The startingChildElementPosition shall be passed to the constructor
for the Term.
2. If the term is an element, the endingChildElementDeclPosition is
equal to the startingChildElementDeclPosition. 3. If the term is a
model group, then its first child term's
startingChildElementDeclPosition is equal to its own
startingChildElementDeclPosition, the 2nd to Nth child term's
startingChildElementDeclPosition is the prior term's
endingChildElementDeclPosition + 1.
4. The endingChildElementDeclPosition for a model-group shall be the
endingChildElmentDeclPosition of its last child term.