Actually, I haven't measured it, but there are 4 built-in variables, so even
if a schema introduces no new variables of its own, there is overhead from
copying the state of those 4 variables just in case you need to backtrack
them, and that overhead is incurred at every point of uncertainty.
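
To make that concrete, here is roughly what the current behavior amounts to,
heavily simplified and with made-up names (this is not our actual
parser-state code):

import scala.collection.mutable

// Simplified illustration only, not Daffodil's real classes: every point of
// uncertainty (PoU) snapshots the variable bindings up front, whether or not
// anything in that region of the schema ever touches a variable.
final class EagerVariableState(initial: Map[String, String]) {
  private var bindings = mutable.Map(initial.toSeq: _*)
  private var marks: List[mutable.Map[String, String]] = Nil

  def mark(): Unit = marks = bindings.clone() :: marks // a copy at every PoU

  def resetToMark(): Unit = { bindings = marks.head; marks = marks.tail }

  def discardMark(): Unit = marks = marks.tail

  def set(name: String, value: String): Unit = bindings(name) = value
  def get(name: String): Option[String] = bindings.get(name)
}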

Also, more and more schemas are using variables. We're finding them very,
very useful.

Nevertheless, I think the vast bulk of points of uncertainty will come and
go without any variables being touched. Variables tend to get used for
specific things, not all over the place.

For example, several schemas have a feature that captures bad data into a
hexBinary Blob element so that parsing of a large file can continue instead
of failing on the first bad data item. Whether they capture the bad data or
just fail is controlled by a variable, but that variable is not touched
unless parsing of the legal format fails. So one would hope the vast bulk of
the data processing never touches that variable, yet every single record in
the data file is a point of uncertainty.
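
The on-demand version I was getting at in the message quoted below might look
roughly like this (again made-up names, not our actual variable-state code):
a mark becomes a cheap push, each variable write pays a small test, and the
copy happens at most once per mark, and only if a variable is actually
written:

import scala.collection.mutable

// Hypothetical sketch of copy-on-demand. A PoU pushes a "no snapshot yet"
// marker; the real copy is taken lazily, on the first variable write after
// the mark. PoUs that never touch a variable never pay for a copy.
final class LazyVariableState(initial: Map[String, String]) {
  private var bindings = mutable.Map(initial.toSeq: _*)
  // None means: bindings are still exactly as when this mark was created.
  private var marks: List[Option[Map[String, String]]] = Nil

  def mark(): Unit = marks = None :: marks // O(1), no copying at the PoU

  def set(name: String, value: String): Unit = {
    // The extra per-write test the proposal trades for. Marks start as None
    // and are only promoted here, so every None mark has seen no writes
    // since it was created; one snapshot serves all of them.
    if (marks.nonEmpty && marks.head.isEmpty) {
      val snapshot = Some(bindings.toMap) // the one, lazy, copy
      marks = marks.map(m => if (m.isEmpty) snapshot else m)
    }
    bindings(name) = value
  }

  def get(name: String): Option[String] = bindings.get(name)

  def resetToMark(): Unit = {
    // If nothing was written since the mark, nothing needs restoring.
    marks.head.foreach(saved => bindings = mutable.Map(saved.toSeq: _*))
    marks = marks.tail
  }

  def discardMark(): Unit = marks = marks.tail
}

In the bad-data-capture example above, mark() would run once per record, but
set() essentially never, so the clone almost never happens.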

On Tue, Jan 9, 2024 at 1:49 PM Larry Barber <larry.bar...@nteligen.com>
wrote:

> Seems like the benefit would only be significant if you were dealing with
> lots of variables.
>
> -----Original Message-----
> From: Mike Beckerle <mbecke...@apache.org>
> Sent: Tuesday, January 9, 2024 1:39 PM
> To: dev@daffodil.apache.org
> Subject: Thoughts on on-demand copying of parser state
>
> Right now we copy the state of the parser as every point of uncertainty is
> reached.
>
> I am speculating that we could copy on demand. So, for example, if no
> variable-modifying operation occurs, then there would be no overhead to
> copy the variable state.
>
> This comes at the cost of each variable operation doing an additional test
> of whether the variable state needs to be copied first.
>
> Thoughts?
>
