Re: [PR] Reduce copies in VariableMap implementation [daffodil]

via GitHub Fri, 20 Oct 2023 09:55:11 -0700


mbeckerle commented on code in PR #1097:
URL: https://github.com/apache/daffodil/pull/1097#discussion_r1367226350



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -257,26 +269,18 @@ class VariableMap private (vTable: Map[GlobalQName, 
ArrayBuffer[VariableInstance
    * VariableInstances are mutable and cannot safely be shared across threads
    */
   def copy(): VariableMap = {
-    val table = vTable.map {
-      case (k: GlobalQName, variableInstances: ArrayBuffer[VariableInstance]) 
=> {
-        val newBuf = variableInstances.map { _.copy() }
-        (k, newBuf)
-      }
+    val table = vTable.map { variableInstances =>
+      val newBuf = variableInstances.map { _.copy() }
+      newBuf
     }
 
-    new VariableMap(table)
+    new VariableMap(vrds, table)
   }
 
-  def topLevelInstances(): VariableMap = {
-    val table = vTable.map {
-      case (k: GlobalQName, variableInstances: ArrayBuffer[VariableInstance]) 
=> {
-        val newBuf = new ArrayBuffer[VariableInstance]()
-        newBuf.append(variableInstances.last)
-        (k, newBuf)
-      }
-    }
-
-    new VariableMap(table)
+  def cloneForSuspension(): VariableMap = {
+    val newTable = new Array[Seq[VariableInstance]](vTable.size)
+    Array.copy(vTable, 0, newTable, 0, vTable.size)
+    new VariableMap(vrds, newTable)

Review Comment:
   This is kind of interesting. 
   
   Cloning for suspension doesn't need to copy the whole stack for each 
variable as only the top of stack is relevant to evaluating an expression to 
obtain its value. (.... except then if unparsing resumes from there maybe...?)
   
   Exploring this is beyond the scope of this change set, but we do need to 
think about whether "clone the state" is really needed.
   



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -311,31 +306,33 @@ class VariableMap private (vTable: Map[GlobalQName, 
ArrayBuffer[VariableInstance
    * expression
    */
   def setFirstInstanceInitialValues(): Unit = {
-    vTable.foreach {
-      case (_, variableInstances) => {
-        variableInstances(0).firstInstanceInitialValue = 
variableInstances(0).value
-      }
+    vTable.foreach { variableInstances =>
+      variableInstances.head.firstInstanceInitialValue = 
variableInstances.head.value
     }
   }
 
+  /**
+   * Performance of this is linear in number of variables, this should not be 
used in
+   * performance critical sections.
+   */
   def find(qName: GlobalQName): Option[VariableInstance] = {
-    val optBuf = vTable.get(qName)
-    val variab = {
-      if (optBuf.isDefined)
-        Some(optBuf.get.last)
-      else
-        None
-    }
-    variab
+    getVariableRuntimeData(qName).map { vrd => vTable(vrd.vmapIndex).head }

Review Comment:
   Isn't a hash map of qname -> vmapIndex just a constant table that can be 
computed once we have all the variable declarations? So this can be constant 
time.  Your representation avoids having to call this much as a VRD knows its 
index, so getting from VRD to variable on top of stack is constant time, but I 
think this find can also  be constant time. 
   
   This would only be important if there were more than a trivial number of 
variables. Linear search may in fact be the fastest thing. I think most cases 
the number of variables is < 6. 



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -431,59 +416,57 @@ class VariableMap private (vTable: Map[GlobalQName, 
ArrayBuffer[VariableInstance
     pstate: ParseOrUnparseState,
   ) = {
     val varQName = vrd.globalQName
-    val variableInstances = vTable.get(varQName)
-    if (variableInstances.isDefined) {
-      val variable = variableInstances.get.last
-      variable.state match {
-        case VariableSet => {
-          referringContext.SDE(
-            "Cannot set variable %s twice. State was: %s. Existing value: %s",
-            variable.rd.globalQName,
-            VariableSet,
-            variable.value,
-          )
-        }
+    val variableInstances = vTable(vrd.vmapIndex)
+    val variable = variableInstances.head
+    variable.state match {
+      case VariableSet => {
+        referringContext.SDE(
+          "Cannot set variable %s twice. State was: %s. Existing value: %s",
+          variable.rd.globalQName,
+          VariableSet,
+          variable.value,
+        )
+      }
 
-        case VariableRead => {
+      case VariableRead => {
+
+        /**
+         * TODO: This should be an SDE, but due to a bug (DAFFODIL-1443) in
+         * the way we evaluate escapeSchemes it could lead us to setting the
+         * variable read too early */
+        pstate.SDW(
+          WarnID.VariableSet,
+          "Cannot set variable %s after reading the default value. State was: 
%s. Existing value: %s",
+          variable.rd.globalQName,
+          VariableSet,
+          variable.value,
+        )
+        variable.setValue(newValue)
+        variable.setState(VariableSet)
+      }
 
+      case _ => {
+        vrd.direction match {
           /**
-           * TODO: This should be an SDE, but due to a bug (DAFFODIL-1443) in
-           * the way we evaluate escapeSchemes it could lead us to setting the
-           * variable read too early */
-          pstate.SDW(
-            WarnID.VariableSet,
-            "Cannot set variable %s after reading the default value. State 
was: %s. Existing value: %s",
-            variable.rd.globalQName,
-            VariableSet,
-            variable.value,
-          )
-          variable.setValue(newValue)
-          variable.setState(VariableSet)
-        }
-
-        case _ => {
-          vrd.direction match {
-            /**
-             * Due to potential race conditions regarding the setting of
-             * variables via setVariable and default values in combination with
-             * suspensions during unparsing, we only allow the use of either
-             * setVariable statements or a default value when unparsing a
-             * variable.
-             */
-            case VariableDirection.UnparseOnly | VariableDirection.Both
-                if (vrd.maybeDefaultValueExpr.isDefined && 
variableInstances.get.size > 1) => {
-              // Variable has an unparse direction, a default value, and a
-              // newVariableInstance
-              pstate.SDE(
-                "Variable %s has an unparse direction and a default value, 
setting the variable may cause race conditions when combined with a forward 
referencing expression.",
-                varQName,
-              )
-            }
-            case _ => // Do nothing
+           * Due to potential race conditions regarding the setting of

Review Comment:
   This needs a bit more explanation. What exactly are you saying can cause a 
race here? It's all state for one thread. So how can there be a race? There has 
to be something non-deterministic going on. 
   
   If two suspensions depend on one variable, one is evaluated and gets the 
value before the set variable actually executes... so the default value has 
been consumed before the setVariable runs. When you setVariable it will fail. 
If these can happen in the opposite order, then yes that's a race, and that's 
due to non-deterministic behavior somewhere in the suspension scheduling.
   
   I realize suspensions get evaluated in a humanly non-predictable order, but 
it should be deterministic. 
   
   If not we should make it deterministic so that any given scenario is 
repeatable, and may be incorrect per the setVariable happening after the read 
of the default has already occurred, but this should happen every time or 
never, not sporadically. 
   
   This change set is not about this problem however, so we can park this as a 
separate bug.
   
   Do we check for this and at least warn about it at compile time? To me if 
you use newVariableInstance, the variable declaration should not have a default 
value, I think. 
   



##########
daffodil-core/src/test/scala/org/apache/daffodil/core/util/TestUtils.scala:
##########
@@ -389,7 +388,7 @@ class Fakes private () {
     override def isError: Boolean = false
 
     override def tunables: DaffodilTunables = DaffodilTunables()
-    override def variableMap: VariableMap = VariableMapFactory.create(Nil)
+    override def variableMap: VariableMap = VariableMap(Nil)

Review Comment:
   Ah. I was confusing this with VariableBox, which is associated with unparser 
suspensions in some way. 



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -311,31 +306,33 @@ class VariableMap private (vTable: Map[GlobalQName, 
ArrayBuffer[VariableInstance
    * expression
    */
   def setFirstInstanceInitialValues(): Unit = {
-    vTable.foreach {
-      case (_, variableInstances) => {
-        variableInstances(0).firstInstanceInitialValue = 
variableInstances(0).value
-      }
+    vTable.foreach { variableInstances =>
+      variableInstances.head.firstInstanceInitialValue = 
variableInstances.head.value
     }
   }
 
+  /**
+   * Performance of this is linear in number of variables, this should not be 
used in
+   * performance critical sections.
+   */
   def find(qName: GlobalQName): Option[VariableInstance] = {
-    val optBuf = vTable.get(qName)
-    val variab = {
-      if (optBuf.isDefined)
-        Some(optBuf.get.last)
-      else
-        None
-    }
-    variab
+    getVariableRuntimeData(qName).map { vrd => vTable(vrd.vmapIndex).head }
   }
 
+  /**
+   * Performance of this is linear in number of variables, this should not be 
used in
+   * performance critical sections.
+   */
   def qnames(): Seq[GlobalQName] = {
-    vTable.toSeq.map(_._1)
+    vrds.map { _.globalQName }

Review Comment:
   Isn't this a constant, as in could be lazy val? 



##########
daffodil-core/src/test/scala/org/apache/daffodil/core/util/TestUtils.scala:
##########
@@ -389,7 +388,7 @@ class Fakes private () {
     override def isError: Boolean = false
 
     override def tunables: DaffodilTunables = DaffodilTunables()
-    override def variableMap: VariableMap = VariableMapFactory.create(Nil)
+    override def variableMap: VariableMap = VariableMap(Nil)

Review Comment:
   There was some reason for the factory..... I can't recall. Maybe 
serialization and hashmap related? In any case if tests pass now, and you can 
serialize/deserialize processors it's no longer needed. 



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -233,21 +240,26 @@ final class VariableBox(initialVMap: VariableMap) {
  * The DPath implementation must be made to implement the
  * no-set-after-default-value-has-been-read behavior. This requires that 
reading the variables causes a state transition.
  *
- * The implementation of the VariabeMap uses ArrayBuffers essentially as a
- * stack, as they allow for easy serialization unlike the custom MStack classes
- * we use elsewhere. Scala's mutable Stack is deprecated in 2.12
+ * The VariableMap implementation essentially uses Seq as a stack, as they 
allow for easy
+ * serialization unlike the custom MStack classes we use elsewhere. Note that 
Scala's mutable

Review Comment:
   Does this suggest a revisit (not in this change set) of MStack generally? 
   
   The point of MStack was to avoid allocating objects when pushing value 
classes to/from stacks. If that's not worth it because we encounter overhead 
duplicating the MStacks....  Maybe they should be reimplemented in a simpler 
way or just go away entirely?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Reduce copies in VariableMap implementation [daffodil]

Reply via email to