stevedlawrence commented on code in PR #1681:
URL: https://github.com/apache/daffodil/pull/1681#discussion_r3395630312
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -121,7 +121,7 @@ class VariableInstance private (val rd:
VariableRuntimeData) extends Serializabl
}
override def toString: String =
- "VariableInstance(%s,%s,%s,%s)".format(state, value, rd,
rd.maybeDefaultValueExpr)
+ s"VariableInstance($state,$value,$rd,${rd.maybeDefaultValueExpr})"
Review Comment:
Does this toString show up in profiling? I wonder if all this information
isn't really necessary and we could simplify this or the related diagnostics if
this is shows up a lot? If a particular diagnostics is getting triggered a
whole bunch, and that diagnostic has an expensive to create string, then maybe
that could cause some overhead? And my guess is since test files probably
parse/unparse successfully, then these diagnostics aren't even seen so creating
these particular strings in these instances is kindof just wasted effort.
Not suggesting we change this as part of this PR, but we might want to
consider taking a look at at our diagnostics to make sure we aren't creating
messages greedily. Our `Diagnostic` class lazily builds the message only when
toString is called. So if we have some diagnostics that are evaluating format
strings and always creating the message instead of letting the Diagnostic
class lazily do it, we are potentially doing a lot of work wasted that might
never be seen.
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala:
##########
@@ -367,15 +365,11 @@ class VariableMap private (
vrd.direction match {
case VariableDirection.ParseOnly if (!state.isInstanceOf[PState]) =>
state.SDE(
- "Attempting to read variable %s which is marked as parseOnly during
unparsing".format(
- varQName
- )
+ s"Attempting to read variable $varQName which is marked as parseOnly
during unparsing"
Review Comment:
THis is kindof an of my previous comment. This could instead be:
```scala
state.SDE(
"Attempting to read variable %s which is marked as parseOnly during
unparsing",
varQName
)
```
So instead of formatting the string, we passing in a format string and args
to the SDE function and let the SDE build the message when needed.
Note that in this particular case it probably doesn't really matter because
SDE's are fatal and are always going to be displayed. But in non fatal cases
(e.g. exceptions that could cause parse errors or unparse suspensions), these
.format() calls are potentially wasted effort.
##########
daffodil-core/src/main/scala/org/apache/daffodil/unparsers/runtime1/ElementUnparser.scala:
##########
@@ -298,7 +296,8 @@ class ElementSpecifiedLengthUnparser(
with RegularElementUnparserStartEndStrategy
with ElementSpecifiedLengthMixin {
- override def runtimeDependencies = maybeTargetLengthEv.toList.toVector
+ override def runtimeDependencies =
Review Comment:
Should we consider changing all of our `runtimeDependencies` to a val
instead of def?
I think the original reasoning for these being a def is that
`runtimeDependencies` were originally just used in the `ensureCompiled`
function when we compile schemas. And since it was only ever used at
compilation (and it should have only been called once per parser) it wasn't
worth storing in a val.
But at some point we started accessing it in
`captureRuntimeValuedExpressionValues`, which happens at runtime when an
unparser suspends. Which explains why this was showing up in profiling--we're
probably calling rutimeDependencies alot due to a bunch of suspensions and
allocating a bunch of Vectors.
So if we change this to a val, it means we can create and store the Vector
during compilation and avoid Vector allocation overhead at runtime. It also
means we could stick with the cleaner toList.toVetor because that's a onetime
hit taken at compile time. Technically it's a waste, but it's probably not that
big of a deal during schema compilation. A temporary one item list is probably
not a big deal.
Additionally, should we avoid Vector entirely? Vector is useful as an
immtuable data structure that allows cheapish update/prepend/append since they
don't have to copy the whole thing, but we never modified runtimeDependencies.
We only ever iterate over them. Feels like the best data structure for
something that we never modify and want to iterate over as quickly as possible
would be an array.
So maybe we change the runtimeDependencies definitino to
```scala
val runtimeDependencies: Array[Evaluatable[AnyRef]]
```
Which forces all processors to implement it as val so we can store it and
avoid runtime allocations, and then this could be something like
```scala
override val runtimeDependencies = maybeTargetLengthEv.toList.toArray
```
Note, if we wanted we could also add a `toArray` function to Maybe and avoid
the .toList.toArray, but I'm not sure that's a huge deal.
Not sure if any of this would make a difference, but it does feel like it
would avoid unnecessary Vector allocations.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]