Sounds good to me. +1

On 3/26/20 11:52 AM, Beckerle, Mike wrote:
> You definitely cannot share a dp across threads this way.
> 
> I propose just deprecating the setExternalVariable API (along with the other 
> setters) and that pattern generally, and a new more functional style API  for 
> establishing tunables, external variables, and validation modes, that will 
> keep us out of trouble.
> 
> 
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, March 26, 2020 7:39 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> Even if we did add the locking, it's still possible to use these
> incorrectly, right? For example, say we had two threads like this:
> 
>   Thread {
>     dp.setExternalVariable("foo", 1)
>     dp.parse(foo1)
>   }
> 
>   Thread {
>     dp.setExternalVariable("foo", 2)
>     dp.parse(foo2)
>   }
> 
> The order of evaluation could be:
> 
>   dp.setExternalVariable("foo", 1)
>   dp.setExternalVariable("foo", 2)
>   dp.parse(foo1)
>   dp.parse(foo2)
> 
> So foo1 is parsed with foo=2. Even if we add locking to Daffodil, the
> use of setExternalVariable within threads can cause unexpected behavior.
> Locking needs to happen outside of Daffodil's control to make it work as
> expected. And if someone adds their own locking around everything, then
> our internal locking isn't necessary.
> 
> It feels to me like setExternalVariable is just fundamentally broken,
> and so trying to fix it is just going to add complication that still
> doesn't solve the fundamental problems.
> 
> 
> On 3/25/20 3:50 PM, Beckerle, Mike wrote:
>> It is possible to modify in a way that is 100% compatible with existing 
>> "correct" practice, which is to lock and unlock the object. Lock the state 
>> when any parse/unparse call is in flight, unlock when none are in flight.
>>
>> setters can then block waiting for the lock to be freed.
>>
>> That's guaranteed to be compatible with any correct current practice.
>>
>> I just hate to put all that complexity in there.
>> ________________________________
>> From: Steve Lawrence <stephen.d.lawre...@gmail.com>
>> Sent: Wednesday, March 25, 2020 3:44 PM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> My one minor concern about the locking after parse/unparse is that it
>> changes behavior, and could potentially break things (I assume use after
>> lock is an assertion of some kind?).
>>
>> It's definitely possible to use the DataProcessor in a safe manner while
>> mutating the state currently if someone is careful, so to not allow
>> mutating it after calling parse/unparse could potentially break someones
>> existing use of Daffodil.
>>
>> What are your thoughts on @deprecating those functions, the deprecation
>> message can be something like "This is not thread-safe, be very careful
>> if you do not use it". And rather than locking and asserting, we could
>> just create a warning and say "You're using this in a potentially
>> dangerous manner, you should use the new API"?
>>
>> - Steve
>>
>> On 3/25/20 3:35 PM, Beckerle, Mike wrote:
>>> A fully functional API is probably good to have, and I'd like to define 
>>> that also and cut over to using that within Daffodil.
>>>
>>> We would need a dataProcessor.reset() method that creates a new dp which 
>>> has had all external vars, tunables, basically any of those formerly 
>>> settable things, forgotten so you can start clean.
>>>
>>> (Would clean() be a better name than reset() ?)
>>>
>>> This reset() subsumes my clone() method idea.
>>>
>>> I want to deprecate (literally with "@deprecated" annotations) the setters 
>>> generally, but my compromise is once you parse/unparse the object becomes 
>>> immutable permanently. So the setters are just a different way to 
>>> parameterize the object.
>>>
>>> The other thing that is impacted by this change it the 
>>> OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL 
>>> with them.  That's ok for now. I can retrofit that later. It will still 
>>> work, just get deprecation warnings.
>>>
>>> ________________________________
>>> From: Steve Lawrence <stephen.d.lawre...@gmail.com>
>>> Sent: Wednesday, March 25, 2020 12:05 PM
>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>>
>>> This feels like the right approach. Thoughts on instead of having to
>>> manually clone() and having some sort of locking mechanism when parse is
>>> called, we just make a DataProcessor completely immutable, and setting a
>>> variable will implicitly create a copy with modified state, something like:
>>>
>>> val dp = dataProcessor
>>>            .withTunable("tunableFoo", "1")
>>>            .withVariable("variableFoo", 2")
>>>            .withVariable("variableBar", 2")
>>>
>>> It's not as java-y, but feels safer. I can imagine a case where someone
>>> sets a bunch of state, calls parse and locks things, and then way down
>>> the line they get a "data processor is locked" error because they tried
>>> to change some state.
>>>
>>>
>>> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>>>> Here's my concrete proposal to fix DataProcesor's API.
>>>>
>>>> We add a clone() method. It copies the object sharing the big/expensive 
>>>> stuff like the compiled schema but gives you a separate copy of all the 
>>>> state that is settable via setValidationMode, setExternalDFDLVariable, and 
>>>> setTunable.
>>>>
>>>> So you get a DataProcessor either from a ProcessorFactory, or from 
>>>> Compiler.reload().
>>>>
>>>> You set state.
>>>> You can call parse/unparse on various threads
>>>>
>>>> If you want to change state settings for another parse call, you must call 
>>>> clone() to get another instance of the DataProcessor. This will share the 
>>>> expensive/big stuff like the compiled schema, but give you a separate copy 
>>>> of all the state that is settable via setValidationMode, 
>>>> setExternalDFDLVariable, and setTunable.
>>>>
>>>> Then you set state
>>>> You can call parse/unparse on various threads.
>>>>
>>>> I hate to get into locking and such, but once you call parse/unparse, that 
>>>> should lock the object state, and any further setter calls should 
>>>> fail/throw. You should have to clone it to "change" the settings.
>>>>
>>>> This makes the use of the setters a java-ish way of achieving the same 
>>>> reentrancy one would get if there were no setters and simply more 
>>>> arguments to the parse/unparse calls.
>>>>
>>>> ________________________________
>>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>>> Sent: Wednesday, March 25, 2020 11:29 AM
>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>>>
>>>> To respond to my own thread here, I think given that we allow multiple 
>>>> simultaneous calls to parse/unparse from different threads, we must say 
>>>> that the DataProcessor object is immutable once parse or unparse are 
>>>> called.
>>>>
>>>> I suppose we could say that it is mutable, but behavior is undetermined if 
>>>> any parse or unparse calls are active on any thread.
>>>>
>>>> But this is just asking for trouble IMHO.
>>>>
>>>> I think we started out with a stateful, non-thread capable API. The idea 
>>>> is that one thread would be invoking a data processor at a time. A data 
>>>> procesor was the state-block of an execution.
>>>>
>>>> The need to share compiled processor reloads, because the compile schemas 
>>>> were expensive to create, tempted us to allow multiple parse/unparse calls 
>>>> on different threads.
>>>>
>>>> Fact is, I think we should have said no to this, provided a 
>>>> DataProcessor.clone() to create instances that shared the reloaded 
>>>> compiled schema binary, but otherwise had separate state, and said that 
>>>> parse/unparse were synchronized methods on their DP instance.
>>>>
>>>> Instead we're in a 1/2 way world where we don't have a thread-reasonable 
>>>> API due to mutable state in what turns out to be a cross-thread shared 
>>>> object.
>>>>
>>>> ________________________________
>>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>>> Sent: Wednesday, March 25, 2020 10:55 AM
>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
>>>>
>>>> Why does the API for Daffodil have
>>>>
>>>> Compiler.setExternalDFDLVariable(...)
>>>> and
>>>> Compiler.setExternalDFDLVariables(...)
>>>>
>>>> on it.
>>>>
>>>> I believe we should deprecate this.
>>>>
>>>> Compilers are parameterized by some of the tunables I understand.
>>>>
>>>> But the external DFDL variables? These cannot affect compilation. The 
>>>> schema compiler needs to know statically the information about variables 
>>>> found in the schema itself in the dfdl:defineVariable statement 
>>>> annotations.
>>>>
>>>> But the compiler doesn't need external variable bindings. In fact if it 
>>>> did know and use them, it would be building assumptions into the compiled 
>>>> schema that it shouldn't be building in.
>>>>
>>>> Setting external var bindings on the Compiler just causes problems when 
>>>> that compiler instance is reused in a context where those settings aren't 
>>>> appropriate. (JIRA DAFFODIL-2302 is one such problem)
>>>>
>>>> I believe setExternalDFDLVariable(s) methods should be deprecated, and 
>>>> external variables bindings should be an optional argument to the 
>>>> parse/unparse methods.
>>>>
>>>> The setters cause thread safety issues because the DP is stateful, even 
>>>> though we want multiple calls to parse/unparse to be executable on 
>>>> different threads.
>>>>
>>>> Consider: if we allow ordinary setExternalDFDLVariables and add a 
>>>> resetExternalDFDLVariables to clear them, then imagine one wants to make 
>>>> two parse calls on separate threads with different external variables 
>>>> bindings:
>>>>
>>>> so on main thread.....
>>>>      dp.setExternalVariables(...bindings 1...)
>>>>      spawn the thread 1
>>>> on the thread 1
>>>>      dp.parse(....)
>>>> back on main thread
>>>>      dp.resetExternalVariables() // race condition. Did the parse call 
>>>> read the external variables before this reset or not?
>>>>      dp.setExternalVariables(...bindings 2....)
>>>>     .....
>>>>
>>>> However, if we make the external variable bindings an argument to parse, 
>>>> we avoid all of this.
>>>>
>>>> Alternatively, since DataProcessor has setExternalDFDLVariable, we can 
>>>> prohibit multiple calls on the same DataProcessor object simultaneously. 
>>>> We can provide a clone() method that preserves the loaded/reloaded 
>>>> processor, but constructs another DataProcessor object, thereby allowing 
>>>> separate external variables state per DataProcessor instance.
>>>>
>>>>
>>>> Comments?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Mike Beckerle | Principal Engineer
>>>> [Owl Cyber Defense]<http://owlcyberdefense.com>
>>>> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of 
>>>> Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
>>>> P +1-781-330-0412
>>>> W owlcyberdefense.com<http://www.owlcyberdefense.com>
>>>>
>>>
>>>
>>
>>
> 
> 

Reply via email to