Personally, I suspect that temporary variables are a separate issue, as is the assignment PR. They might be useful for intermediate steps in a parser, but then we’re potentially getting more complex than a parser wants to be. I am warming to the idea of temporary variables, though.
In terms of the removal, I like the idea of the COMPLETE transformation to express a projection. That makes the output interface of the Metron object more explicit in a parser, which makes governance much easier.

Do we think this is a good consensus? Shall I ticket it (I might even code it!) in the transformation form proposed?

Simon

> On 4 Dec 2017, at 17:21, Casey Stella <ceste...@gmail.com> wrote:
>
> So, just chiming in here. It seems to me that we have a problem with
> extraneous fields in a couple of different ways:
>
> * Temporary Variables
>
> I think that the problem of temporary variables is one beyond just the
> parser. What I'd like to see is the Stellar field transformations operate
> similar to the enrichment field transformations in that they are no longer
> a map (this is useful beyond this case for having multiple assignments for
> a variable) and having a special assignment indicator which would indicate
> a temporary variable (e.g. ^= instead of :=). This would clean up some of
> the use cases in enrichments as well. Combine this with the assumption that
> all non-temporary fields are included in output for the field
> transformation if it is not specified and I think we have something that is
> sensible and somewhat backwards compatible. To wit:
> {
>   "fieldTransformations": [
>     {
>       "transformation": "STELLAR",
>       "config": [
>         "ipSrc ^= TRIM(raw_ip_src)",
>         "ip_src_addr := ipSrc"
>       ]
>     }
>   ]
> }
>
> * Extraneous Fields from the Parser
>
> For these, we do currently have a REMOVE field transformation, but I'd be
> ok with a PROJECT or COMPLETE field transformation to provide a whitelist.
> That might look like:
> {
>   "fieldTransformations": [
>     {
>       "transformation": "STELLAR",
>       "config": [
>         "ipSrc ^= TRIM(raw_ip_src)",
>         "ip_src_addr := ipSrc"
>       ]
>     },
>     {
>       "transformation": "COMPLETE",
>       "output": ["ip_src_addr", "ip_dst_addr", "message"]
>     }
>   ]
> }
>
> I think having these two treated separately makes sense because sometimes
> you will want COMPLETE and sometimes not. Also, this fits within the core
> abstraction that we already have.
>
> On Thu, Nov 30, 2017 at 8:21 PM, Simon Elliston Ball <si...@simonellistonball.com> wrote:
>
>> Hmmm… Actually, I kinda like that.
>>
>> May want a little refactoring in the back for clarity.
>>
>> My question about whether we could ever imagine this ‘cleanup policy’
>> applying to other transforms would sway me to the field rather than
>> transformation name approach though.
>>
>> Simon
>>
>>> On 1 Dec 2017, at 01:17, Otto Fowler <ottobackwa...@gmail.com> wrote:
>>>
>>> Or, we can create new transformation types
>>> STELLAR_COMPLETE, which may be more in line with the original design.
>>>
>>> On November 30, 2017 at 20:14:46, Otto Fowler (ottobackwa...@gmail.com) wrote:
>>>
>>>> I would suggest that instead of explicitly having "complete", we have "operation": "complete"
>>>>
>>>> Such that we can have multiple transformations, each with a different "operation".
>>>> No operation would be the status quo ante, if we can do it so that we don’t get errors with old configs and keep the same behavior.
>>>>
>>>> {
>>>>   "fieldTransformations": [
>>>>     {
>>>>       "transformation": "STELLAR",
>>>>       "operation": "complete",
>>>>       "output": ["ip_src_addr", "ip_dst_addr"],
>>>>       "config": {
>>>>         "ip_src_addr": "ipSrc",
>>>>         "ip_dest_addr": "ipDst"
>>>>       }
>>>>     },
>>>>     {
>>>>       "transformation": "STELLAR",
>>>>       "operation": "SomeOtherThing",
>>>>       "output": ["foo", "bar"],
>>>>       "config": {
>>>>         "foo": "TO_UPPER(foo)",
>>>>         "bar": "TO_LOWER(bar)"
>>>>       }
>>>>     }
>>>>   ]
>>>> }
>>>>
>>>> Sorry for the junk examples, but hopefully it makes sense.
>>>>
>>>> On November 30, 2017 at 20:00:06, Simon Elliston Ball (si...@simonellistonball.com) wrote:
>>>>
>>>>> I’m looking at the way parser config works, and transformation of fields from their native names in, for example, the ASA or CEF parsers, into a standard data model.
>>>>>
>>>>> At the moment I would do something like this:
>>>>>
>>>>> assuming I have fields [ipSrc, ipDst, pointlessExtraStuff, message] I might have:
>>>>>
>>>>> {
>>>>>   "fieldTransformations": [
>>>>>     {
>>>>>       "transformation": "STELLAR",
>>>>>       "output": ["ip_src_addr", "ip_dst_addr", "message"],
>>>>>       "config": {
>>>>>         "ip_src_addr": "ipSrc",
>>>>>         "ip_dest_addr": "ipDst"
>>>>>       }
>>>>>     }
>>>>>   ]
>>>>> }
>>>>>
>>>>> which leaves me with the field set:
>>>>> [ipSrc, ipDst, pointlessExtraStuff, message, ip_src_addr, ip_dest_addr]
>>>>>
>>>>> unless I go with:
>>>>>
>>>>> {
>>>>>   "fieldTransformations": [
>>>>>     {
>>>>>       "transformation": "STELLAR",
>>>>>       "output": ["ip_src_addr", "ip_dst_addr", "message"],
>>>>>       "config": {
>>>>>         "ip_src_addr": "ipSrc",
>>>>>         "ip_dest_addr": "ipDst",
>>>>>         "pointlessExtraStuff": null,
>>>>>         "ipSrc": null,
>>>>>         "ipDst": null
>>>>>       }
>>>>>     }
>>>>>   ]
>>>>> }
>>>>>
>>>>> which seems a little over-verbose.
>>>>>
>>>>> Do you think it would be valuable to add a switch of some sort on the transformation to make it "complete", i.e. to only preserve fields which are explicitly set?
>>>>>
>>>>> To my mind, this breaks a principle of mutability, but gives us a much, much cleaner mapping of data.
>>>>>
>>>>> I would propose something like:
>>>>>
>>>>> {
>>>>>   "fieldTransformations": [
>>>>>     {
>>>>>       "transformation": "STELLAR",
>>>>>       "complete": true,
>>>>>       "output": ["ip_src_addr", "ip_dst_addr", "message"],
>>>>>       "config": {
>>>>>         "ip_src_addr": "ipSrc",
>>>>>         "ip_dest_addr": "ipDst"
>>>>>       }
>>>>>     }
>>>>>   ]
>>>>> }
>>>>>
>>>>> which would give me the set ["ip_src_addr", "ip_dst_addr", "message"], effectively making the nulling in my previous example implicit.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Also, in the second scenario, if 'output' were to be empty would we assume that the output field set should be ["ip_src_addr", "ip_dst_addr"]?
>>>>>
>>>>> Simon
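
For readers following the thread, the "complete" (projection) behaviour being discussed can be sketched in a few lines of Python. This is only an illustration of the idea, not Metron's implementation: the function name apply_complete and the sample field values are made up, the config values are treated as plain field renames rather than evaluated Stellar expressions, and falling back to the assigned fields when 'output' is empty is just one possible answer to the question raised at the end of the original message.

```python
def apply_complete(message, assignments, output=None):
    """Hypothetical sketch of a 'complete' field transformation.

    message:     dict of fields produced by the parser
    assignments: dict of {new_field: source_field} (simple renames only;
                 real Stellar expressions are NOT evaluated here)
    output:      whitelist of fields to keep; if empty or None, assume
                 only the assigned fields are kept (an assumption)
    """
    # Work on a copy so the caller's message is left untouched.
    result = dict(message)
    for target, source in assignments.items():
        result[target] = message.get(source)

    # Project: keep only whitelisted fields, dropping everything else.
    keep = list(output) if output else list(assignments)
    return {name: result[name] for name in keep if name in result}


parsed = {
    "ipSrc": "10.0.0.1",
    "ipDst": "10.0.0.2",
    "pointlessExtraStuff": "x",
    "message": "hello",
}
renames = {"ip_src_addr": "ipSrc", "ip_dst_addr": "ipDst"}

print(apply_complete(parsed, renames, ["ip_src_addr", "ip_dst_addr", "message"]))
print(apply_complete(parsed, renames))  # empty 'output': only assigned fields
```

With the whitelist given, the native fields (ipSrc, ipDst, pointlessExtraStuff) drop out without having to be nulled one by one, which is the verbosity the original message was trying to avoid.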