To respond to my own thread here: given that we allow multiple simultaneous calls to parse/unparse from different threads, I think we must say that the DataProcessor object is immutable once parse or unparse has been called.
I suppose we could say that it is mutable, but that behavior is undefined if any parse or unparse calls are active on any thread. But this is just asking for trouble IMHO.

I think we started out with a stateful, non-thread-capable API. The idea was that only one thread would be invoking a data processor at a time. A data processor was the state block of an execution.

The need to share reloaded compiled processors, because the compiled schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads. Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.

Instead we're in a halfway world where we don't have a thread-reasonable API, due to mutable state in what turns out to be a cross-thread shared object.

________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Wednesday, March 25, 2020 10:55 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Compiler.setExternalDFDLVariable(s) considered challenged

Why does the API for Daffodil have Compiler.setExternalDFDLVariable(...) and Compiler.setExternalDFDLVariables(...) on it? I believe we should deprecate this.

Compilers are parameterized by some of the tunables, I understand. But the external DFDL variables? These cannot affect compilation.

The schema compiler needs to know statically the information about variables found in the schema itself, in the dfdl:defineVariable annotations. But the compiler doesn't need external variable bindings. In fact, if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.

Setting external variable bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate.
(JIRA DAFFODIL-2302 is one such problem.)

I believe the setExternalDFDLVariable(s) methods should be deprecated, and external variable bindings should be an optional argument to the parse/unparse methods.

The setters cause thread-safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.

Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variable bindings:

    so on main thread.....
        dp.setExternalVariables(...bindings 1...)
        spawn thread 1
    on thread 1
        dp.parse(....)
    back on main thread
        dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
        dp.setExternalVariables(...bindings 2....)
        .....

However, if we make the external variable bindings an argument to parse, we avoid all of this.

Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple simultaneous calls on the same DataProcessor object. We can provide a clone() method that preserves the loaded/reloaded processor but constructs another DataProcessor object, thereby allowing separate external variable state per DataProcessor instance.

Comments?

Mike Beckerle | Principal Engineer
Owl Cyber Defense <http://owlcyberdefense.com>
P +1-781-330-0412
W owlcyberdefense.com <http://www.owlcyberdefense.com>
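
For illustration, the "bindings as an argument to parse" alternative from the message above could look roughly like the Java sketch below. This is only a sketch under stated assumptions, not the Daffodil API: SketchProcessor, its parse(input, externalVars) signature, the "mode" variable, and the string standing in for a compiled schema are all hypothetical. The point is that the shared object holds no per-call state, so two threads can parse with different bindings and there is no set/reset window to race on.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerCallBindingsSketch {

    // Hypothetical stand-in for a processor built from a compiled schema.
    // It is NOT the real Daffodil DataProcessor.
    static final class SketchProcessor {
        // Expensive to create, shared across threads, never mutated after construction.
        private final String compiledSchema;

        SketchProcessor(String compiledSchema) {
            this.compiledSchema = compiledSchema;
        }

        // The bindings arrive with the call, so no other thread can
        // set or reset them while this parse is running.
        String parse(String input, Map<String, String> externalVars) {
            Map<String, String> vars = Map.copyOf(externalVars); // immutable snapshot
            return compiledSchema + ":" + input + ":" + vars.get("mode");
        }
    }

    // Runs two parses on separate threads against ONE shared processor,
    // each with its own external variable bindings.
    static String[] runTwoParses() throws Exception {
        SketchProcessor dp = new SketchProcessor("schema");
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<String> task1 = () -> dp.parse("data1", Map.of("mode", "A"));
            Callable<String> task2 = () -> dp.parse("data2", Map.of("mode", "B"));
            Future<String> r1 = pool.submit(task1);
            Future<String> r2 = pool.submit(task2);
            return new String[] { r1.get(), r2.get() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (String result : runTwoParses()) {
            System.out.println(result);
        }
    }
}
```

The clone() alternative would keep the setters but give each thread its own processor instance sharing the same compiled schema; the sketch above takes the other route and removes the mutable state from the shared object altogether.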