Here's my concrete proposal to fix DataProcesor's API.

We add a clone() method. It copies the object sharing the big/expensive stuff 
like the compiled schema but gives you a separate copy of all the state that is 
settable via setValidationMode, setExternalDFDLVariable, and setTunable.

So you get a DataProcessor either from a ProcessorFactory, or from 
Compiler.reload().

You set state.
You can call parse/unparse on various threads

If you want to change state settings for another parse call, you must call 
clone() to get another instance of the DataProcessor. This will share the 
expensive/big stuff like the compiled schema, but give you a separate copy of 
all the state that is settable via setValidationMode, setExternalDFDLVariable, 
and setTunable.

Then you set state
You can call parse/unparse on various threads.

I hate to get into locking and such, but once you call parse/unparse, that 
should lock the object state, and any further setter calls should fail/throw. 
You should have to clone it to "change" the settings.

This makes the use of the setters a java-ish way of achieving the same 
reentrancy one would get if there were no setters and simply more arguments to 
the parse/unparse calls.

________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Wednesday, March 25, 2020 11:29 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

To respond to my own thread here, I think given that we allow multiple 
simultaneous calls to parse/unparse from different threads, we must say that 
the DataProcessor object is immutable once parse or unparse are called.

I suppose we could say that it is mutable, but behavior is undetermined if any 
parse or unparse calls are active on any thread.

But this is just asking for trouble IMHO.

I think we started out with a stateful, non-thread capable API. The idea is 
that one thread would be invoking a data processor at a time. A data procesor 
was the state-block of an execution.

The need to share compiled processor reloads, because the compile schemas were 
expensive to create, tempted us to allow multiple parse/unparse calls on 
different threads.

Fact is, I think we should have said no to this, provided a 
DataProcessor.clone() to create instances that shared the reloaded compiled 
schema binary, but otherwise had separate state, and said that parse/unparse 
were synchronized methods on their DP instance.

Instead we're in a 1/2 way world where we don't have a thread-reasonable API 
due to mutable state in what turns out to be a cross-thread shared object.

________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Wednesday, March 25, 2020 10:55 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Compiler.setExternalDFDLVariable(s) considered challenged

Why does the API for Daffodil have

Compiler.setExternalDFDLVariable(...)
and
Compiler.setExternalDFDLVariables(...)

on it.

I believe we should deprecate this.

Compilers are parameterized by some of the tunables I understand.

But the external DFDL variables? These cannot affect compilation. The schema 
compiler needs to know statically the information about variables found in the 
schema itself in the dfdl:defineVariable statement annotations.

But the compiler doesn't need external variable bindings. In fact if it did 
know and use them, it would be building assumptions into the compiled schema 
that it shouldn't be building in.

Setting external var bindings on the Compiler just causes problems when that 
compiler instance is reused in a context where those settings aren't 
appropriate. (JIRA DAFFODIL-2302 is one such problem)

I believe setExternalDFDLVariable(s) methods should be deprecated, and external 
variables bindings should be an optional argument to the parse/unparse methods.

The setters cause thread safety issues because the DP is stateful, even though 
we want multiple calls to parse/unparse to be executable on different threads.

Consider: if we allow ordinary setExternalDFDLVariables and add a 
resetExternalDFDLVariables to clear them, then imagine one wants to make two 
parse calls on separate threads with different external variables bindings:

so on main thread.....
     dp.setExternalVariables(...bindings 1...)
     spawn the thread 1
on the thread 1
     dp.parse(....)
back on main thread
     dp.resetExternalVariables() // race condition. Did the parse call read the 
external variables before this reset or not?
     dp.setExternalVariables(...bindings 2....)
    .....

However, if we make the external variable bindings an argument to parse, we 
avoid all of this.

Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit 
multiple calls on the same DataProcessor object simultaneously. We can provide 
a clone() method that preserves the loaded/reloaded processor, but constructs 
another DataProcessor object, thereby allowing separate external variables 
state per DataProcessor instance.


Comments?





Mike Beckerle | Principal Engineer
[Owl Cyber Defense]<http://owlcyberdefense.com>
[cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of 
Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
P +1-781-330-0412
W owlcyberdefense.com<http://www.owlcyberdefense.com>

Reply via email to