My one minor concern about the locking after parse/unparse is that it changes behavior, and could potentially break things (I assume use after lock is an assertion of some kind?).
It's definitely possible to use the DataProcessor in a safe manner while mutating the state currently if someone is careful, so to not allow mutating it after calling parse/unparse could potentially break someones existing use of Daffodil. What are your thoughts on @deprecating those functions, the deprecation message can be something like "This is not thread-safe, be very careful if you do not use it". And rather than locking and asserting, we could just create a warning and say "You're using this in a potentially dangerous manner, you should use the new API"? - Steve On 3/25/20 3:35 PM, Beckerle, Mike wrote: > A fully functional API is probably good to have, and I'd like to define that > also and cut over to using that within Daffodil. > > We would need a dataProcessor.reset() method that creates a new dp which has > had all external vars, tunables, basically any of those formerly settable > things, forgotten so you can start clean. > > (Would clean() be a better name than reset() ?) > > This reset() subsumes my clone() method idea. > > I want to deprecate (literally with "@deprecated" annotations) the setters > generally, but my compromise is once you parse/unparse the object becomes > immutable permanently. So the setters are just a different way to > parameterize the object. > > The other thing that is impacted by this change it the > OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL > with them. That's ok for now. I can retrofit that later. It will still work, > just get deprecation warnings. > > ________________________________ > From: Steve Lawrence <stephen.d.lawre...@gmail.com> > Sent: Wednesday, March 25, 2020 12:05 PM > To: dev@daffodil.apache.org <dev@daffodil.apache.org> > Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged > > This feels like the right approach. Thoughts on instead of having to > manually clone() and having some sort of locking mechanism when parse is > called, we just make a DataProcessor completely immutable, and setting a > variable will implicitly create a copy with modified state, something like: > > val dp = dataProcessor > .withTunable("tunableFoo", "1") > .withVariable("variableFoo", 2") > .withVariable("variableBar", 2") > > It's not as java-y, but feels safer. I can imagine a case where someone > sets a bunch of state, calls parse and locks things, and then way down > the line they get a "data processor is locked" error because they tried > to change some state. > > > On 3/25/20 11:50 AM, Beckerle, Mike wrote: >> Here's my concrete proposal to fix DataProcesor's API. >> >> We add a clone() method. It copies the object sharing the big/expensive >> stuff like the compiled schema but gives you a separate copy of all the >> state that is settable via setValidationMode, setExternalDFDLVariable, and >> setTunable. >> >> So you get a DataProcessor either from a ProcessorFactory, or from >> Compiler.reload(). >> >> You set state. >> You can call parse/unparse on various threads >> >> If you want to change state settings for another parse call, you must call >> clone() to get another instance of the DataProcessor. This will share the >> expensive/big stuff like the compiled schema, but give you a separate copy >> of all the state that is settable via setValidationMode, >> setExternalDFDLVariable, and setTunable. >> >> Then you set state >> You can call parse/unparse on various threads. >> >> I hate to get into locking and such, but once you call parse/unparse, that >> should lock the object state, and any further setter calls should >> fail/throw. You should have to clone it to "change" the settings. >> >> This makes the use of the setters a java-ish way of achieving the same >> reentrancy one would get if there were no setters and simply more arguments >> to the parse/unparse calls. >> >> ________________________________ >> From: Beckerle, Mike <mbecke...@tresys.com> >> Sent: Wednesday, March 25, 2020 11:29 AM >> To: dev@daffodil.apache.org <dev@daffodil.apache.org> >> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged >> >> To respond to my own thread here, I think given that we allow multiple >> simultaneous calls to parse/unparse from different threads, we must say that >> the DataProcessor object is immutable once parse or unparse are called. >> >> I suppose we could say that it is mutable, but behavior is undetermined if >> any parse or unparse calls are active on any thread. >> >> But this is just asking for trouble IMHO. >> >> I think we started out with a stateful, non-thread capable API. The idea is >> that one thread would be invoking a data processor at a time. A data >> procesor was the state-block of an execution. >> >> The need to share compiled processor reloads, because the compile schemas >> were expensive to create, tempted us to allow multiple parse/unparse calls >> on different threads. >> >> Fact is, I think we should have said no to this, provided a >> DataProcessor.clone() to create instances that shared the reloaded compiled >> schema binary, but otherwise had separate state, and said that parse/unparse >> were synchronized methods on their DP instance. >> >> Instead we're in a 1/2 way world where we don't have a thread-reasonable API >> due to mutable state in what turns out to be a cross-thread shared object. >> >> ________________________________ >> From: Beckerle, Mike <mbecke...@tresys.com> >> Sent: Wednesday, March 25, 2020 10:55 AM >> To: dev@daffodil.apache.org <dev@daffodil.apache.org> >> Subject: Compiler.setExternalDFDLVariable(s) considered challenged >> >> Why does the API for Daffodil have >> >> Compiler.setExternalDFDLVariable(...) >> and >> Compiler.setExternalDFDLVariables(...) >> >> on it. >> >> I believe we should deprecate this. >> >> Compilers are parameterized by some of the tunables I understand. >> >> But the external DFDL variables? These cannot affect compilation. The schema >> compiler needs to know statically the information about variables found in >> the schema itself in the dfdl:defineVariable statement annotations. >> >> But the compiler doesn't need external variable bindings. In fact if it did >> know and use them, it would be building assumptions into the compiled schema >> that it shouldn't be building in. >> >> Setting external var bindings on the Compiler just causes problems when that >> compiler instance is reused in a context where those settings aren't >> appropriate. (JIRA DAFFODIL-2302 is one such problem) >> >> I believe setExternalDFDLVariable(s) methods should be deprecated, and >> external variables bindings should be an optional argument to the >> parse/unparse methods. >> >> The setters cause thread safety issues because the DP is stateful, even >> though we want multiple calls to parse/unparse to be executable on different >> threads. >> >> Consider: if we allow ordinary setExternalDFDLVariables and add a >> resetExternalDFDLVariables to clear them, then imagine one wants to make two >> parse calls on separate threads with different external variables bindings: >> >> so on main thread..... >> dp.setExternalVariables(...bindings 1...) >> spawn the thread 1 >> on the thread 1 >> dp.parse(....) >> back on main thread >> dp.resetExternalVariables() // race condition. Did the parse call read >> the external variables before this reset or not? >> dp.setExternalVariables(...bindings 2....) >> ..... >> >> However, if we make the external variable bindings an argument to parse, we >> avoid all of this. >> >> Alternatively, since DataProcessor has setExternalDFDLVariable, we can >> prohibit multiple calls on the same DataProcessor object simultaneously. We >> can provide a clone() method that preserves the loaded/reloaded processor, >> but constructs another DataProcessor object, thereby allowing separate >> external variables state per DataProcessor instance. >> >> >> Comments? >> >> >> >> >> >> Mike Beckerle | Principal Engineer >> [Owl Cyber Defense]<http://owlcyberdefense.com> >> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of >> Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/> >> P +1-781-330-0412 >> W owlcyberdefense.com<http://www.owlcyberdefense.com> >> > >