Re: [PR] Small changes attempting to increase unparse performance [daffodil]

via GitHub Thu, 11 Jun 2026 06:11:16 -0700


stevedlawrence commented on code in PR #1681:
URL: https://github.com/apache/daffodil/pull/1681#discussion_r3395630312



##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -121,7 +121,7 @@ class VariableInstance private (val rd: 
VariableRuntimeData) extends Serializabl
   }
 
   override def toString: String =
-    "VariableInstance(%s,%s,%s,%s)".format(state, value, rd, 
rd.maybeDefaultValueExpr)
+    s"VariableInstance($state,$value,$rd,${rd.maybeDefaultValueExpr})"

Review Comment:
   Does this toString show up in profiling? I wonder if all this information 
isn't really necessary and we could simplify this or the related diagnostics if 
this is shows up a lot? If a particular diagnostics is getting triggered a 
whole bunch, and that diagnostic has an expensive to create string, then maybe 
that could cause some overhead? And my guess is since test files probably 
parse/unparse successfully, then these diagnostics aren't even seen so creating 
these particular strings in these instances is kindof just wasted effort.
   
   Not suggesting we change this as part of this PR, but we might want to 
consider taking a look at at our diagnostics to make sure we aren't creating 
messages greedily. Our `Diagnostic` class lazily builds the message only when 
toString is called. So if we have some diagnostics that are evaluating format 
strings  and always creating the message instead of letting the Diagnostic 
class lazily do it, we are potentially doing a lot of work wasted that might 
never be seen. 



##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -367,15 +365,11 @@ class VariableMap private (
     vrd.direction match {
       case VariableDirection.ParseOnly if (!state.isInstanceOf[PState]) =>
         state.SDE(
-          "Attempting to read variable %s which is marked as parseOnly during 
unparsing".format(
-            varQName
-          )
+          s"Attempting to read variable $varQName which is marked as parseOnly 
during unparsing"

Review Comment:
   THis is kindof an of my previous comment. This could instead be:
   
   ```scala
   state.SDE(
    "Attempting to read variable %s which is marked as parseOnly during 
unparsing",
    varQName
   )
   ```
   So instead of formatting the string, we passing in a format string and args 
to the SDE function and let the SDE build the message when needed.
   
   Note that in this particular case it probably doesn't really matter because 
SDE's are fatal and are always going to be displayed. But in non fatal cases 
(e.g. exceptions that could cause parse errors or unparse suspensions), these 
.format() calls are potentially wasted effort. 



##########
daffodil-core/src/main/scala/org/apache/daffodil/unparsers/runtime1/ElementUnparser.scala:
##########
@@ -298,7 +296,8 @@ class ElementSpecifiedLengthUnparser(
   with RegularElementUnparserStartEndStrategy
   with ElementSpecifiedLengthMixin {
 
-  override def runtimeDependencies = maybeTargetLengthEv.toList.toVector
+  override def runtimeDependencies =

Review Comment:
   Should we consider changing all of our `runtimeDependencies` to a val 
instead of def?
   
   I think the original reasoning for these being a def is that 
`runtimeDependencies` were originally just used in the `ensureCompiled` 
function when we compile schemas. And since it was only ever used at 
compilation (and it should have only been called once per parser) it wasn't 
worth storing in a val. 
   
   But at some point we started accessing it in 
`captureRuntimeValuedExpressionValues`, which happens at runtime when an 
unparser suspends. Which explains why this was showing up in profiling--we're 
probably calling rutimeDependencies alot due to a bunch of suspensions and 
allocating a bunch of Vectors.
   
   So if we change this to a val, it means we can create and store the Vector 
during compilation and avoid Vector allocation overhead at runtime. It also 
means we could stick with the cleaner toList.toVetor because that's a onetime 
hit taken at compile time. Technically it's a waste, but it's probably not that 
big of a deal during schema compilation. A temporary one item list is probably 
not a big deal.
   
   Additionally, should we avoid Vector entirely? Vector is useful as an 
immtuable data structure that allows cheapish update/prepend/append since they 
don't have to copy the whole thing, but we never modified runtimeDependencies. 
We only ever iterate over them. Feels like the best data structure for 
something that we never modify and want to iterate over as quickly as possible 
would be an array.
   
   So maybe we change the runtimeDependencies definitino to 
   ```scala
   val runtimeDependencies: Array[Evaluatable[AnyRef]]
   ``` 
   Which forces all processors to implement it as  val so we can store it and 
avoid runtime allocations, and then this could be something like
   
   ```scala
   override val runtimeDependencies = maybeTargetLengthEv.toList.toArray
   ```
   
   Note, if we wanted we could also add a `toArray` function to Maybe and avoid 
the .toList.toArray, but I'm not sure that's a huge deal.
   
   Not sure if any of this would make a difference, but it does feel like it 
would avoid unnecessary Vector allocations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Small changes attempting to increase unparse performance [daffodil]

Reply via email to