Hi Andy,

Sorry for the delayed response.
I like the idea of moving the capabilities out of core so that we can decouple components, for reasons that boil down to minimizing the work needed to break components apart in the future. I'm certainly not against augmenting core to facilitate features like EL, but in my opinion we should be able to do so without directly coupling the core to the concept of an expression language. My hope would be that an injected component (this may necessitate a more complex framework like Google Fruit) can supply an expression-language-specific lambda. I think this is mostly in line with your idea of plugging in an arbitrary evaluation mechanism, but instead I'm saying we should inject a function (in a controlled way, through a custom configuration) to execute the evaluation.

Adding a FlowFile argument to getProperty indicates that the processor developer wants to get the property in relation to the FlowFile. This shouldn't require a separate property type, but rather a lambda, provided by the extension, that evaluates the property against the FlowFile's attributes and/or content. If the expression language is disabled, replaced by another, or the property is not an expression, this does nothing other than return the property value. In my mind you would never have a FlowFile in onSchedule, and if a developer passes one to getProperty, they naturally want evaluation to occur if that property is an expression. If they don't ever want a property to evaluate, they needn't pass the FlowFile pointer.

The cons you mention vary wildly with compiler version, processor capabilities, etc. Branch prediction, code cache misses, virtual function inlining, and caches are topics better served by a discussion on optimization, not design.
The code in [1] produces assembly output in which the virtual function is inlined without optimization (-O0) [2], even with a portable arch setting in GCC; however, the same code produces different assembly on gcc 4.5.3 with optimizations enabled [3]. If I run the optimized code, the vtable version is approximately two nanoseconds faster than the inlined code, because it results in smaller code and therefore better branch prediction and instruction cache hits. perf stat shows this across 1M runs, and in a more illustrative example we can see how jumps aren't always slower than inlining; a great example of this exists in last year's CppCon talk from Chandler Carruth. My point is not to argue, but rather to suggest that cons such as these have such wildly varying characteristics that it's akin to suggesting anyone really knows what the Java garbage collector is going to do at any given point [4].

Nonetheless, I'm suggesting we only inject and cache a function that's run to perform the evaluation. This certainly may require an extended FlowConfiguration and changes to Property, ProcessorNode, and ProcessContext, but the changes should minimize EL-specific code in the core so that if/when we move certain parts to C for 'librarification' we can do so more easily. Additionally, if there is ever a need to completely disable EL in production, we can do so by changing the injected classes within the binaries via C2. Of course, these are only suggestions. I would eventually like to see someone who is using libminifi as a library be able to inject this in the same way, but not be required to include it if they only need to do a very small task.

For the same reason I've been slow to respond, I unfortunately responded to this very quickly, so if I didn't make any sense please let me know [5].
[1] https://gist.github.com/phrocker/100a0806fbc188731358b659c7cf63ab

[2]
    leaq    -48(%rbp), %rax
    addq    $8, %rax
    movl    $.LC0, %esi
    movq    %rax, %rdi
    call    std::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(char const*)
    leaq    -48(%rbp), %rax
    addq    $8, %rax
    movq    %rax, %rdx
    leaq    -64(%rbp), %rax
    movq    %rdx, %rsi
    movq    %rax, %rdi
    call    std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    leaq    -64(%rbp), %rax
    movq    %rax, %rdi
    call    std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()

[3]
    subq    $64, %rsp
    leaq    16(%rsp), %rbx
    movq    vtable for Property2+16, 16(%rsp)

[4] haha.

[5] I usually don't.

On Thu, Nov 9, 2017 at 11:06 AM, Andy Christianson <aichr...@protonmail.com> wrote:

> MiNiFi - C++ Devs,
>
> I am currently working on MINIFICPP-49, the expression language feature.
> While expression compilation and evaluation is fairly self-contained, at
> the very least the API to access expression evaluation will touch core
> components.
>
> Here is how NiFi is currently exposing expression evaluation to processors:
>
> ...
> try {
>   // read the url property from the context
>   final String urlstr = trimToEmpty(context.getProperty(PROP_URL).evaluateAttributeExpressions(requestFlowFile).getValue());
>   final URL url = new URL(urlstr);
> ...
>
> While we have the opportunity now to improve this, we have a couple design
> constraints: the expression code comes from properties, and dynamic
> evaluation of it requires a flow file as input.
>
> Because expressions are defined as processor properties, it is natural to
> expose expression evaluation via the ProcessContext API.
> The current minifi-cpp API to get static properties is as follows:
>
> bool getProperty(const std::string &name, std::string &value) {
>   return processor_node_->getProperty(name, value);
> }
>
> If we do not wish to introduce a Property type with its own
> evaluateAttributeExpressions method, we could simply introduce another
> ProcessContext method for evaluating dynamic properties:
>
> bool evaluateProperty(const std::string &name, const core::FlowFile &flow_file, std::string &value) {
>   ...
> }
>
> The implementation of this would compile the expression (the raw value as
> returned by getProperty(...)) if it has not yet been compiled, then
> evaluate the compiled expression against the provided FlowFile. The end
> result is an API similar to, but simpler than, the NiFi interface. The
> alternative is to provide the expression primitives to processors and allow
> them to manage compilation/evaluation on their own. This would increase
> complexity across all processors which support expression properties, which
> will likely be most processors.
>
> The next important question which impacts core minifi is whether or not
> expression language should be an extension. Whether or not it is an
> extension, some kind of standard interface to expressions will need to be
> made available to all processors.
> Here are the pros/cons of putting it in an extension, as far as I can tell:
>
> Pros:
>
> - Reduce compiled size of minifi somewhat (the lexer/parser is currently
>   ~4300 lines of C++ with no additional library or runtime dependencies)
>   when the feature is disabled
> - Allow for alternate expression language implementations in the future
>
> Cons:
>
> - Additional complexity by needing to add Expression primitives, a
>   standard Expression compiler API, dynamic object loading, and an empty
>   (NoOp) implementation if the extension is not included
> - Additional vtable lookups on an operation which will be invoked very
>   frequently (every property lookup on every flow file which supports
>   expressions)
> - Makes it harder for gcc/clang/etc. to inline/optimize expression
>   language functions
> - Core processors (e.g. GetFile/PutFile, where expression language will
>   almost certainly be desired for file paths and other properties) will
>   depend on an optional extension
>
> I would like to hear feedback from the dev community on these two
> important topics (the interface to the expression language and whether or
> not the implementation should be an extension) before writing the code that
> touches core components. The API question is ultimately more important
> because it touches all current and future processor authors. The decision
> of whether it is an extension or not is more reversible.
>
> Regards,
>
> Andy I.C.