My one minor concern about the locking after parse/unparse is that it
changes behavior, and could potentially break things (I assume use after
lock is an assertion of some kind?).

It's definitely possible to use the DataProcessor in a safe manner while
mutating the state currently if someone is careful, so to not allow
mutating it after calling parse/unparse could potentially break someones
existing use of Daffodil.

What are your thoughts on @deprecating those functions, the deprecation
message can be something like "This is not thread-safe, be very careful
if you do not use it". And rather than locking and asserting, we could
just create a warning and say "You're using this in a potentially
dangerous manner, you should use the new API"?

- Steve

On 3/25/20 3:35 PM, Beckerle, Mike wrote:
> A fully functional API is probably good to have, and I'd like to define that 
> also and cut over to using that within Daffodil.
> 
> We would need a dataProcessor.reset() method that creates a new dp which has 
> had all external vars, tunables, basically any of those formerly settable 
> things, forgotten so you can start clean.
> 
> (Would clean() be a better name than reset() ?)
> 
> This reset() subsumes my clone() method idea.
> 
> I want to deprecate (literally with "@deprecated" annotations) the setters 
> generally, but my compromise is once you parse/unparse the object becomes 
> immutable permanently. So the setters are just a different way to 
> parameterize the object.
> 
> The other thing that is impacted by this change it the 
> OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL 
> with them.  That's ok for now. I can retrofit that later. It will still work, 
> just get deprecation warnings.
> 
> ________________________________
> From: Steve Lawrence <stephen.d.lawre...@gmail.com>
> Sent: Wednesday, March 25, 2020 12:05 PM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> This feels like the right approach. Thoughts on instead of having to
> manually clone() and having some sort of locking mechanism when parse is
> called, we just make a DataProcessor completely immutable, and setting a
> variable will implicitly create a copy with modified state, something like:
> 
> val dp = dataProcessor
>            .withTunable("tunableFoo", "1")
>            .withVariable("variableFoo", 2")
>            .withVariable("variableBar", 2")
> 
> It's not as java-y, but feels safer. I can imagine a case where someone
> sets a bunch of state, calls parse and locks things, and then way down
> the line they get a "data processor is locked" error because they tried
> to change some state.
> 
> 
> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>> Here's my concrete proposal to fix DataProcesor's API.
>>
>> We add a clone() method. It copies the object sharing the big/expensive 
>> stuff like the compiled schema but gives you a separate copy of all the 
>> state that is settable via setValidationMode, setExternalDFDLVariable, and 
>> setTunable.
>>
>> So you get a DataProcessor either from a ProcessorFactory, or from 
>> Compiler.reload().
>>
>> You set state.
>> You can call parse/unparse on various threads
>>
>> If you want to change state settings for another parse call, you must call 
>> clone() to get another instance of the DataProcessor. This will share the 
>> expensive/big stuff like the compiled schema, but give you a separate copy 
>> of all the state that is settable via setValidationMode, 
>> setExternalDFDLVariable, and setTunable.
>>
>> Then you set state
>> You can call parse/unparse on various threads.
>>
>> I hate to get into locking and such, but once you call parse/unparse, that 
>> should lock the object state, and any further setter calls should 
>> fail/throw. You should have to clone it to "change" the settings.
>>
>> This makes the use of the setters a java-ish way of achieving the same 
>> reentrancy one would get if there were no setters and simply more arguments 
>> to the parse/unparse calls.
>>
>> ________________________________
>> From: Beckerle, Mike <mbecke...@tresys.com>
>> Sent: Wednesday, March 25, 2020 11:29 AM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> To respond to my own thread here, I think given that we allow multiple 
>> simultaneous calls to parse/unparse from different threads, we must say that 
>> the DataProcessor object is immutable once parse or unparse are called.
>>
>> I suppose we could say that it is mutable, but behavior is undetermined if 
>> any parse or unparse calls are active on any thread.
>>
>> But this is just asking for trouble IMHO.
>>
>> I think we started out with a stateful, non-thread capable API. The idea is 
>> that one thread would be invoking a data processor at a time. A data 
>> procesor was the state-block of an execution.
>>
>> The need to share compiled processor reloads, because the compile schemas 
>> were expensive to create, tempted us to allow multiple parse/unparse calls 
>> on different threads.
>>
>> Fact is, I think we should have said no to this, provided a 
>> DataProcessor.clone() to create instances that shared the reloaded compiled 
>> schema binary, but otherwise had separate state, and said that parse/unparse 
>> were synchronized methods on their DP instance.
>>
>> Instead we're in a 1/2 way world where we don't have a thread-reasonable API 
>> due to mutable state in what turns out to be a cross-thread shared object.
>>
>> ________________________________
>> From: Beckerle, Mike <mbecke...@tresys.com>
>> Sent: Wednesday, March 25, 2020 10:55 AM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> Why does the API for Daffodil have
>>
>> Compiler.setExternalDFDLVariable(...)
>> and
>> Compiler.setExternalDFDLVariables(...)
>>
>> on it.
>>
>> I believe we should deprecate this.
>>
>> Compilers are parameterized by some of the tunables I understand.
>>
>> But the external DFDL variables? These cannot affect compilation. The schema 
>> compiler needs to know statically the information about variables found in 
>> the schema itself in the dfdl:defineVariable statement annotations.
>>
>> But the compiler doesn't need external variable bindings. In fact if it did 
>> know and use them, it would be building assumptions into the compiled schema 
>> that it shouldn't be building in.
>>
>> Setting external var bindings on the Compiler just causes problems when that 
>> compiler instance is reused in a context where those settings aren't 
>> appropriate. (JIRA DAFFODIL-2302 is one such problem)
>>
>> I believe setExternalDFDLVariable(s) methods should be deprecated, and 
>> external variables bindings should be an optional argument to the 
>> parse/unparse methods.
>>
>> The setters cause thread safety issues because the DP is stateful, even 
>> though we want multiple calls to parse/unparse to be executable on different 
>> threads.
>>
>> Consider: if we allow ordinary setExternalDFDLVariables and add a 
>> resetExternalDFDLVariables to clear them, then imagine one wants to make two 
>> parse calls on separate threads with different external variables bindings:
>>
>> so on main thread.....
>>      dp.setExternalVariables(...bindings 1...)
>>      spawn the thread 1
>> on the thread 1
>>      dp.parse(....)
>> back on main thread
>>      dp.resetExternalVariables() // race condition. Did the parse call read 
>> the external variables before this reset or not?
>>      dp.setExternalVariables(...bindings 2....)
>>     .....
>>
>> However, if we make the external variable bindings an argument to parse, we 
>> avoid all of this.
>>
>> Alternatively, since DataProcessor has setExternalDFDLVariable, we can 
>> prohibit multiple calls on the same DataProcessor object simultaneously. We 
>> can provide a clone() method that preserves the loaded/reloaded processor, 
>> but constructs another DataProcessor object, thereby allowing separate 
>> external variables state per DataProcessor instance.
>>
>>
>> Comments?
>>
>>
>>
>>
>>
>> Mike Beckerle | Principal Engineer
>> [Owl Cyber Defense]<http://owlcyberdefense.com>
>> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of 
>> Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
>> P +1-781-330-0412
>> W owlcyberdefense.com<http://www.owlcyberdefense.com>
>>
> 
> 

Reply via email to