mbeckerle commented on code in PR #1097:
URL: https://github.com/apache/daffodil/pull/1097#discussion_r1367226350
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -257,26 +269,18 @@ class VariableMap private (vTable: Map[GlobalQName,
ArrayBuffer[VariableInstance
* VariableInstances are mutable and cannot safely be shared across threads
*/
def copy(): VariableMap = {
- val table = vTable.map {
- case (k: GlobalQName, variableInstances: ArrayBuffer[VariableInstance])
=> {
- val newBuf = variableInstances.map { _.copy() }
- (k, newBuf)
- }
+ val table = vTable.map { variableInstances =>
+ val newBuf = variableInstances.map { _.copy() }
+ newBuf
}
- new VariableMap(table)
+ new VariableMap(vrds, table)
}
- def topLevelInstances(): VariableMap = {
- val table = vTable.map {
- case (k: GlobalQName, variableInstances: ArrayBuffer[VariableInstance])
=> {
- val newBuf = new ArrayBuffer[VariableInstance]()
- newBuf.append(variableInstances.last)
- (k, newBuf)
- }
- }
-
- new VariableMap(table)
+ def cloneForSuspension(): VariableMap = {
+ val newTable = new Array[Seq[VariableInstance]](vTable.size)
+ Array.copy(vTable, 0, newTable, 0, vTable.size)
+ new VariableMap(vrds, newTable)
Review Comment:
This is kind of interesting.
Cloning for suspension doesn't need to copy the whole stack for each
variable as only the top of stack is relevant to evaluating an expression to
obtain its value. (.... except then if unparsing resumes from there maybe...?)
Exploring this is beyond the scope of this change set, but we do need to
think about whether "clone the state" is really needed.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -311,31 +306,33 @@ class VariableMap private (vTable: Map[GlobalQName,
ArrayBuffer[VariableInstance
* expression
*/
def setFirstInstanceInitialValues(): Unit = {
- vTable.foreach {
- case (_, variableInstances) => {
- variableInstances(0).firstInstanceInitialValue =
variableInstances(0).value
- }
+ vTable.foreach { variableInstances =>
+ variableInstances.head.firstInstanceInitialValue =
variableInstances.head.value
}
}
+ /**
+ * Performance of this is linear in number of variables, this should not be
used in
+ * performance critical sections.
+ */
def find(qName: GlobalQName): Option[VariableInstance] = {
- val optBuf = vTable.get(qName)
- val variab = {
- if (optBuf.isDefined)
- Some(optBuf.get.last)
- else
- None
- }
- variab
+ getVariableRuntimeData(qName).map { vrd => vTable(vrd.vmapIndex).head }
Review Comment:
Isn't a hash map of qname -> vmapIndex just a constant table that can be
computed once we have all the variable declarations? So this can be constant
time. Your representation avoids having to call this much as a VRD knows its
index, so getting from VRD to variable on top of stack is constant time, but I
think this find can also be constant time.
This would only be important if there were more than a trivial number of
variables. Linear search may in fact be the fastest thing. I think most cases
the number of variables is < 6.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -431,59 +416,57 @@ class VariableMap private (vTable: Map[GlobalQName,
ArrayBuffer[VariableInstance
pstate: ParseOrUnparseState,
) = {
val varQName = vrd.globalQName
- val variableInstances = vTable.get(varQName)
- if (variableInstances.isDefined) {
- val variable = variableInstances.get.last
- variable.state match {
- case VariableSet => {
- referringContext.SDE(
- "Cannot set variable %s twice. State was: %s. Existing value: %s",
- variable.rd.globalQName,
- VariableSet,
- variable.value,
- )
- }
+ val variableInstances = vTable(vrd.vmapIndex)
+ val variable = variableInstances.head
+ variable.state match {
+ case VariableSet => {
+ referringContext.SDE(
+ "Cannot set variable %s twice. State was: %s. Existing value: %s",
+ variable.rd.globalQName,
+ VariableSet,
+ variable.value,
+ )
+ }
- case VariableRead => {
+ case VariableRead => {
+
+ /**
+ * TODO: This should be an SDE, but due to a bug (DAFFODIL-1443) in
+ * the way we evaluate escapeSchemes it could lead us to setting the
+ * variable read too early */
+ pstate.SDW(
+ WarnID.VariableSet,
+ "Cannot set variable %s after reading the default value. State was:
%s. Existing value: %s",
+ variable.rd.globalQName,
+ VariableSet,
+ variable.value,
+ )
+ variable.setValue(newValue)
+ variable.setState(VariableSet)
+ }
+ case _ => {
+ vrd.direction match {
/**
- * TODO: This should be an SDE, but due to a bug (DAFFODIL-1443) in
- * the way we evaluate escapeSchemes it could lead us to setting the
- * variable read too early */
- pstate.SDW(
- WarnID.VariableSet,
- "Cannot set variable %s after reading the default value. State
was: %s. Existing value: %s",
- variable.rd.globalQName,
- VariableSet,
- variable.value,
- )
- variable.setValue(newValue)
- variable.setState(VariableSet)
- }
-
- case _ => {
- vrd.direction match {
- /**
- * Due to potential race conditions regarding the setting of
- * variables via setVariable and default values in combination with
- * suspensions during unparsing, we only allow the use of either
- * setVariable statements or a default value when unparsing a
- * variable.
- */
- case VariableDirection.UnparseOnly | VariableDirection.Both
- if (vrd.maybeDefaultValueExpr.isDefined &&
variableInstances.get.size > 1) => {
- // Variable has an unparse direction, a default value, and a
- // newVariableInstance
- pstate.SDE(
- "Variable %s has an unparse direction and a default value,
setting the variable may cause race conditions when combined with a forward
referencing expression.",
- varQName,
- )
- }
- case _ => // Do nothing
+ * Due to potential race conditions regarding the setting of
Review Comment:
This needs a bit more explanation. What exactly are you saying can cause a
race here? It's all state for one thread. So how can there be a race? There has
to be something non-deterministic going on.
If two suspensions depend on one variable, one is evaluated and gets the
value before the set variable actually executes... so the default value has
been consumed before the setVariable runs. When you setVariable it will fail.
If these can happen in the opposite order, then yes that's a race, and that's
due to non-deterministic behavior somewhere in the suspension scheduling.
I realize suspensions get evaluated in a humanly non-predictable order, but
it should be deterministic.
If not we should make it deterministic so that any given scenario is
repeatable, and may be incorrect per the setVariable happening after the read
of the default has already occurred, but this should happen every time or
never, not sporadically.
This change set is not about this problem however, so we can park this as a
separate bug.
Do we check for this and at least warn about it at compile time? To me if
you use newVariableInstance, the variable declaration should not have a default
value, I think.
##########
daffodil-core/src/test/scala/org/apache/daffodil/core/util/TestUtils.scala:
##########
@@ -389,7 +388,7 @@ class Fakes private () {
override def isError: Boolean = false
override def tunables: DaffodilTunables = DaffodilTunables()
- override def variableMap: VariableMap = VariableMapFactory.create(Nil)
+ override def variableMap: VariableMap = VariableMap(Nil)
Review Comment:
Ah. I was confusing this with VariableBox, which is associated with unparser
suspensions in some way.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -311,31 +306,33 @@ class VariableMap private (vTable: Map[GlobalQName,
ArrayBuffer[VariableInstance
* expression
*/
def setFirstInstanceInitialValues(): Unit = {
- vTable.foreach {
- case (_, variableInstances) => {
- variableInstances(0).firstInstanceInitialValue =
variableInstances(0).value
- }
+ vTable.foreach { variableInstances =>
+ variableInstances.head.firstInstanceInitialValue =
variableInstances.head.value
}
}
+ /**
+ * Performance of this is linear in number of variables, this should not be
used in
+ * performance critical sections.
+ */
def find(qName: GlobalQName): Option[VariableInstance] = {
- val optBuf = vTable.get(qName)
- val variab = {
- if (optBuf.isDefined)
- Some(optBuf.get.last)
- else
- None
- }
- variab
+ getVariableRuntimeData(qName).map { vrd => vTable(vrd.vmapIndex).head }
}
+ /**
+ * Performance of this is linear in number of variables, this should not be
used in
+ * performance critical sections.
+ */
def qnames(): Seq[GlobalQName] = {
- vTable.toSeq.map(_._1)
+ vrds.map { _.globalQName }
Review Comment:
Isn't this a constant, as in could be lazy val?
##########
daffodil-core/src/test/scala/org/apache/daffodil/core/util/TestUtils.scala:
##########
@@ -389,7 +388,7 @@ class Fakes private () {
override def isError: Boolean = false
override def tunables: DaffodilTunables = DaffodilTunables()
- override def variableMap: VariableMap = VariableMapFactory.create(Nil)
+ override def variableMap: VariableMap = VariableMap(Nil)
Review Comment:
There was some reason for the factory..... I can't recall. Maybe
serialization and hashmap related? In any case if tests pass now, and you can
serialize/deserialize processors it's no longer needed.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -233,21 +240,26 @@ final class VariableBox(initialVMap: VariableMap) {
* The DPath implementation must be made to implement the
* no-set-after-default-value-has-been-read behavior. This requires that
reading the variables causes a state transition.
*
- * The implementation of the VariabeMap uses ArrayBuffers essentially as a
- * stack, as they allow for easy serialization unlike the custom MStack classes
- * we use elsewhere. Scala's mutable Stack is deprecated in 2.12
+ * The VariableMap implementation essentially uses Seq as a stack, as they
allow for easy
+ * serialization unlike the custom MStack classes we use elsewhere. Note that
Scala's mutable
Review Comment:
Does this suggest a revisit (not in this change set) of MStack generally?
The point of MStack was to avoid allocating objects when pushing value
classes to/from stacks. If that's not worth it because we encounter overhead
duplicating the MStacks.... Maybe they should be reimplemented in a simpler
way or just go away entirely?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]