Personally I suspect that temporary variables are a separate concern, as is the 
assignment PR. They might be useful for intermediate steps in a parser, but then 
we’re potentially getting more complex than a parser wants to be. I am warming 
to the idea of temporary variables though. 
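
For what it's worth, here is roughly how I picture the temporary-variable semantics from Casey's mail below. This is only a toy sketch in Python, not Metron code: apply_assignments is a made-up name, plain Python callables stand in for Stellar expressions such as TRIM(raw_ip_src), and the rule that a later := promotes a temporary to a real field is purely my assumption.

```python
def apply_assignments(message, assignments):
    """Apply an ordered list of (name, operator, fn) assignments.

    operator is ":=" for a normal field or "^=" for a temporary variable.
    fn is a plain Python callable standing in for a Stellar expression
    (an assumption for this sketch); it receives the current scope as a dict.
    """
    scope = dict(message)  # everything visible to later expressions
    temporaries = set()
    for name, operator, fn in assignments:
        if operator not in (":=", "^="):
            raise ValueError("unknown assignment operator: " + operator)
        scope[name] = fn(scope)
        if operator == "^=":
            temporaries.add(name)
        else:
            # Assumption: a later := on the same name makes it a real field.
            temporaries.discard(name)
    # Temporaries are dropped; every other field passes through untouched.
    return {k: v for k, v in scope.items() if k not in temporaries}


result = apply_assignments(
    {"raw_ip_src": " 10.0.0.1 ", "message": "hello"},
    [
        ("ipSrc", "^=", lambda s: s["raw_ip_src"].strip()),  # TRIM(raw_ip_src)
        ("ip_src_addr", ":=", lambda s: s["ipSrc"]),
    ],
)
print(result)
```

The ordered list (rather than a map) is what lets the second assignment see ipSrc, and it would also allow several assignments to the same name.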

In terms of the removal, I like the idea of the COMPLETE transformation to 
express a projection. That makes the output interface of the metron object more 
explicit in a parser, which makes governance much easier. 
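
To make the projection idea concrete, a minimal sketch in Python (again not Metron code; the function name complete and the choice to silently skip whitelisted fields that are absent are both assumptions on my part):

```python
def complete(message, output):
    """Keep only the whitelisted fields; skip any that are absent."""
    return {field: message[field] for field in output if field in message}


message = {
    "ipSrc": "10.0.0.1",
    "ipDst": "10.0.0.2",
    "pointlessExtraStuff": "x",
    "message": "hello",
    "ip_src_addr": "10.0.0.1",
    "ip_dst_addr": "10.0.0.2",
}

# Everything not named in the whitelist is dropped from the output.
projected = complete(message, ["ip_src_addr", "ip_dst_addr", "message"])
print(sorted(projected))
```

Whether a missing whitelisted field should be skipped, nulled, or treated as an error is an open question; the sketch just skips it.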

Do we think this is a good consensus? Shall I ticket it (I might even code it!) 
in the transformation form proposed? 

Simon

> On 4 Dec 2017, at 17:21, Casey Stella <ceste...@gmail.com> wrote:
> 
> So, just chiming in here.  It seems to me that we have a problem with
> extraneous fields in a couple of different ways:
> 
> * Temporary Variables
> 
> I think that the problem of temporary variables is one beyond just the
> parser.  What I'd like to see is the Stellar field transformations operate
> similarly to the enrichment field transformations, in that they are no longer
> a map (this is useful beyond this case for having multiple assignments for
> a variable) and have a special assignment indicator which would indicate
> a temporary variable (e.g. ^= instead of :=).  This would clean up some of
> the use cases in enrichments as well.  Combine this with the assumption that
> all non-temporary fields are included in the output of the field
> transformation if it is not specified and I think we have something that is
> sensible and somewhat backwards compatible.  To wit:
> {
>  "fieldTransformations": [
>    {
>      "transformation": "STELLAR",
>      "config": [
>        "ipSrc ^= TRIM(raw_ip_src)",
>        "ip_src_addr := ipSrc"
>      ]
>    }
>  ]
> }
> 
> * Extraneous Fields from the Parser
> 
> For these, we do currently have a REMOVE field transformation, but I'd be
> ok with a PROJECT or COMPLETE field transformation to provide a whitelist.
> That might look like:
> {
>  "fieldTransformations": [
>    {
>      "transformation": "STELLAR",
>      "config": [
>        "ipSrc ^= TRIM(raw_ip_src)",
>        "ip_src_addr := ipSrc"
>      ]
>    },
>    {
>      "transformation": "COMPLETE",
>      "output": ["ip_src_addr", "ip_dst_addr", "message"]
>    }
>  ]
> }
> 
> I think having these two treated separately makes sense because sometimes
> you will want COMPLETE and sometimes not.  Also, this fits within the core
> abstraction that we already have.
> 
> On Thu, Nov 30, 2017 at 8:21 PM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
> 
>> Hmmm… Actually, I kinda like that.
>> 
>> May want a little refactoring in the back for clarity.
>> 
>> My question about whether we could ever imagine this ‘cleanup policy’
>> applying to other transforms would sway me to the field rather than
>> transformation name approach though.
>> 
>> Simon
>> 
>>> On 1 Dec 2017, at 01:17, Otto Fowler <ottobackwa...@gmail.com> wrote:
>>> 
>>> Or, we can create a new transformation type,
>>> STELLAR_COMPLETE, which may be more in line with the original design.
>>> 
>>> 
>>> 
>>> On November 30, 2017 at 20:14:46, Otto Fowler (ottobackwa...@gmail.com) wrote:
>>> 
>>>> I would suggest that instead of explicitly having "complete", we have "operation": "complete".
>>>> 
>>>> Such that we can have multiple transformations, each with a different "operation".
>>>> No operation would be the status quo ante, if we can do it so that we don’t get errors with old configs and keep the same behavior.
>>>> 
>>>> {
>>>> "fieldTransformations": [
>>>> {
>>>> "transformation": "STELLAR",
>>>> "operation": "complete",
>>>> "output": ["ip_src_addr", "ip_dst_addr"],
>>>> "config": {
>>>> "ip_src_addr": "ipSrc",
>>>> "ip_dest_addr": "ipDst"
>>>> }
>>>> },
>>>> {
>>>> "transformation": "STELLAR",
>>>> "operation": "SomeOtherThing",
>>>> "output": ["foo", "bar"],
>>>> "config": {
>>>> "foo": "TO_UPPER(foo)",
>>>> "bar": "TO_LOWER(bar)"
>>>> }
>>>> }
>>>> ]
>>>> }
>>>> 
>>>> 
>>>> Sorry for the junk examples, but hopefully it makes sense.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On November 30, 2017 at 20:00:06, Simon Elliston Ball (si...@simonellistonball.com) wrote:
>>>> 
>>>>> I’m looking at the way parser config works, and the transformation of fields from their native names in, for example, the ASA or CEF parsers, into a standard data model.
>>>>> 
>>>>> At the moment I would do something like this:
>>>>> 
>>>>> assuming I have fields [ipSrc, ipDst, pointlessExtraStuff, message] I might have:
>>>>> 
>>>>> {
>>>>> "fieldTransformations": [
>>>>> {
>>>>> "transformation": "STELLAR",
>>>>> "output": ["ip_src_addr", "ip_dst_addr", "message"],
>>>>> "config": {
>>>>> "ip_src_addr": "ipSrc",
>>>>> "ip_dest_addr": "ipDst"
>>>>> }
>>>>> }
>>>>> ]
>>>>> }
>>>>> 
>>>>> which leaves me with the field set:
>>>>> [ipSrc, ipDst, pointlessExtraStuff, message, ip_src_addr, ip_dest_addr]
>>>>> 
>>>>> unless I go with:-
>>>>> 
>>>>> {
>>>>> "fieldTransformations": [
>>>>> {
>>>>> "transformation": "STELLAR",
>>>>> "output": ["ip_src_addr", "ip_dst_addr", "message"],
>>>>> "config": {
>>>>> "ip_src_addr": "ipSrc",
>>>>> "ip_dest_addr": "ipDst",
>>>>> "pointlessExtraStuff": null,
>>>>> "ipSrc": null,
>>>>> "ipDst": null
>>>>> }
>>>>> }
>>>>> ]
>>>>> }
>>>>> 
>>>>> which seems a little over verbose.
>>>>> 
>>>>> Do you think it would be valuable to add a switch of some sort on the transformation to make it "complete", i.e. to only preserve fields which are explicitly set?
>>>>> 
>>>>> To my mind, this breaks a principle of mutability, but gives us a much, much cleaner mapping of data.
>>>>> 
>>>>> I would propose something like:
>>>>> 
>>>>> {
>>>>> "fieldTransformations": [
>>>>> {
>>>>> "transformation": "STELLAR",
>>>>> "complete": true,
>>>>> "output": ["ip_src_addr", "ip_dst_addr", "message"],
>>>>> "config": {
>>>>> "ip_src_addr": "ipSrc",
>>>>> "ip_dest_addr": "ipDst"
>>>>> }
>>>>> }
>>>>> ]
>>>>> }
>>>>> 
>>>>> which would give me the set ["ip_src_addr", "ip_dst_addr", "message"], effectively making the nulling in my previous example implicit.
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Also, in the second scenario, if 'output' were to be empty, would we assume that the output field set should be ["ip_src_addr", "ip_dst_addr"]?
>>>>> 
>>>>> Simon
