Re: Thoughts on on demand copying of parser state

Steve Lawrence Tue, 09 Jan 2024 14:34:19 -0800

And here's where we do some logic and a more detailed comment about it:


https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PState.scala#L346-L362

So I think we do already do copy-on-write for variables when parsing.


On 2024-01-09 05:28 PM, Steve Lawrence wrote:

There's actually a comment in the PState captureFrom() method used to capture state during PoUs:


// Note that this is intentionally a shallow copy. This normally would
// not work because the variable map is mutable so other state changes
// could mutate this snapshot. This is avoided by carefully changing the
// PState variable map to a deep copy of this variable map right before a
// change is made. This essentially makes the PState variable map behave
// as copy-on-write.
this.variableMap = ps.variableMap

Assuming that is all true and done correctly, we might actually already do what you suggest, at least for variables. But there might be other parts of PState that would improve performance if we changed them to copy-on-write. We may want to do some profiling on formats with lots of PoUs to see if anything shows up.
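
For anyone skimming the thread, here is a minimal sketch of the copy-on-write idea that comment describes. It is not the actual Daffodil code: VarMap, ParserState, snapshot(), setVariable() and restore() are hypothetical names made up for illustration, and the real PState/VariableMap logic is more involved.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a mutable variable map; not Daffodil's VariableMap API.
final class VarMap(private val vars: mutable.Map[String, String]) {
  def get(name: String): Option[String] = vars.get(name)
  def set(name: String, value: String): Unit = vars.update(name, value)
  // Deep copy: a fresh underlying map with the same entries.
  def deepCopy(): VarMap = new VarMap(mutable.Map.empty[String, String] ++= vars)
}

// Hypothetical parser state whose variable map behaves as copy-on-write.
final class ParserState(private var variableMap: VarMap) {
  // True while a snapshot may still reference the same VarMap instance.
  private var shared = false

  // Capturing state at a point of uncertainty is a shallow copy: nothing is cloned.
  def snapshot(): VarMap = {
    shared = true
    variableMap
  }

  def readVariable(name: String): Option[String] = variableMap.get(name)

  // The extra test lives here: deep copy only if a snapshot shares the map.
  def setVariable(name: String, value: String): Unit = {
    if (shared) {
      variableMap = variableMap.deepCopy()
      shared = false
    }
    variableMap.set(name, value)
  }

  // Backtracking reinstates the snapshot, which is then shared again.
  def restore(snap: VarMap): Unit = {
    variableMap = snap
    shared = true
  }
}
```

In this sketch a point of uncertainty that never sets a variable costs one boolean assignment, and the deep copy is paid only on the first write after a snapshot.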



On 2024-01-09 04:03 PM, Mike Beckerle wrote:

Actually, I haven't measured it, but there are 4 built in variables, so
even if a schema introduces no new variables of its own there is overhead
to deal with copying the state of 4 variables just in case you need to
backtrack them, and this overhead occurs for every point of uncertainty.

Also more and more schemas are using variables. We're finding them very
very useful.

Nevertheless I think the vast bulk of points of uncertainty will come and
go with no variables being touched. They tend to get used for specific
things, but not all over the place.

For example, several schemas have a feature to capture bad data into a
hexBinary Blob element so as to be able to keep parsing a large file,
instead of failing on the first bad data item.
Whether they do this or just fail is controlled by a variable. But that
variable is not touched unless legal parsing fails. So one would hope the
vast bulk of the data processing would never touch that variable, yet every
single record in the data file is a point of uncertainty.
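
To put rough numbers on that, here is a toy model, not a measurement. It assumes one point of uncertainty per record and that the variable is touched exactly once when a record fails legal parsing; CopyCounts and CopyCountModel are hypothetical names used only for this illustration.

```scala
// Hypothetical tally of variable-state copies under the two strategies.
final case class CopyCounts(eager: Int, onDemand: Int)

object CopyCountModel {
  // recordsOk(i) is true when record i parses legally (variable never touched).
  def countCopies(recordsOk: Seq[Boolean]): CopyCounts = {
    // Eager: one copy per point of uncertainty, i.e. one per record.
    val eager = recordsOk.length
    // On demand: a copy only when the variable is touched, i.e. on a failed record.
    val onDemand = recordsOk.count(ok => !ok)
    CopyCounts(eager, onDemand)
  }
}
```

Under those assumptions, a file where almost every record is good pays for a copy on every record with eager copying, but only on the handful of bad records with on-demand copying.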

On Tue, Jan 9, 2024 at 1:49 PM Larry Barber <larry.bar...@nteligen.com>
wrote:

Seems like the benefit would only be significant if you were dealing with
lots of variables.

-----Original Message-----
From: Mike Beckerle <mbecke...@apache.org>
Sent: Tuesday, January 9, 2024 1:39 PM
To: dev@daffodil.apache.org
Subject: Thoughts on on demand copying of parser state

Right now we copy the state of the parser as every point of uncertainty is
reached.

I am speculating that we could copy on demand. So, for example, if no
variable modifying operation occurs then there would be no overhead to
copy the variable state.

This comes at the cost of each variable modifying operation doing an
additional test of whether the variable state needs to be copied first.

Thoughts?
