We have developed a NiFi processor that uses XMLCalabash [1] to add support for XProc [2] processing. XProc is an XML transformation language that defines an XML pipeline, allowing for complex validation, transformation, and routing of XML data within the pipeline, using existing XML technologies such as RelaxNG, Schematron, XML Schema (XSD), XQuery, XSLT, XPath and custom XProc transformations.
This new processor is mostly straightforward, but we have some questions regarding the specific implementation and the handling of non-thread-safe code. The code is available for viewing here: https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/nifi-xproc/browse

In this processor, a property is created to provide an XProc file, which defines the pipeline input and output "ports". XML goes into an input port, passes through the pipeline, and one or more XML documents exit at the specified output ports. The processor maps each output port to a dynamic NiFi relationship. It does this mapping in the onPropertyModified method when the XProc file property is changed. This method also stores the XMLCalabash XRuntime and XPipeline objects (which do all the pipeline work) in volatile member variables to be used later in onTrigger. The members are saved here to avoid recreating them in each call to onTrigger.

Is this an acceptable place to do that? It seems this normally happens in an @OnScheduled method or in the first call to onTrigger. However, the objects must be created in onPropertyModified to get the output ports, so saving them there avoids recreating the same objects multiple times. Also note that the same objects are created in the XML_PIPELINE_VALIDATOR but are not saved because the validator is static, so there is already some duplication. Is there a standard way to avoid this duplication, or is this an acceptable way to handle it?

The other concern we have is that the XPipeline and XRuntime objects created by XMLCalabash are not thread safe. To resolve this issue, the processor is annotated with @TriggerSerially. Is this the correct solution, or is there some other preferred method? Perhaps a ThreadLocal or a thread-safe pool of XPipeline objects is preferred?

Lastly, is this something the devs would be interested in pulling into NiFi, and if not, what could be changed to achieve this?
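To make the pool alternative to @TriggerSerially mentioned above more concrete, here is a minimal sketch of what we have in mind. It is not taken from the linked repository: PipelinePool and its use method are hypothetical names, the class is deliberately generic so it does not depend on any XMLCalabash or NiFi API, and it assumes that separately created XPipeline instances share no mutable state, which we have not verified.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of NiFi or XMLCalabash): a fixed-size pool
 * that hands each non-thread-safe instance to at most one thread at a time.
 */
final class PipelinePool<T> {
    private final BlockingQueue<T> idle;

    /** Eagerly creates 'size' instances using the supplied factory. */
    PipelinePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /**
     * Borrows an instance (blocking until one is idle), runs the work on it,
     * and always returns the instance to the pool afterwards.
     */
    <R> R use(Function<T, R> work) {
        final T instance;
        try {
            instance = idle.take();
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for an idle instance", e);
        }
        try {
            return work.apply(instance);
        } finally {
            // Cannot overflow: exactly one slot was freed by take() above.
            idle.add(instance);
        }
    }
}
```

With something like this, onTrigger would wrap its pipeline work in a call to use, with the pool sized to the maximum number of concurrent tasks, instead of relying on @TriggerSerially. The trade-off is that each pooled instance has to be built from the XProc file, so a property change would mean rebuilding the whole pool.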
The code is licensed as Apache v2 and we would be happy to contribute the code to NiFi if deemed acceptable.

Thanks,
- Steve

[1] http://xmlcalabash.com/
[2] https://www.w3.org/TR/xproc/
