Would https://github.com/apache/metron/pull/687 play some role in this?
Or could it be made to?


On December 4, 2017 at 12:21:40, Casey Stella (ceste...@gmail.com) wrote:

So, just chiming in here.  It seems to me that we have a problem with
extraneous fields in a couple of different ways:

* Temporary Variables

I think that the problem of temporary variables goes beyond just the
parser.  What I'd like to see is the Stellar field transformations operate
similarly to the enrichment field transformations, in that they are no longer
a map (this is useful beyond this case for having multiple assignments for
a variable) and have a special assignment indicator which would indicate
a temporary variable (e.g. ^= instead of :=).  This would clean up some of
the use cases in enrichments as well.  Combine this with the assumption that
all non-temporary fields are included in the output of the field
transformation if no output is specified, and I think we have something that is
sensible and somewhat backwards compatible.  To wit:
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "config": [
        "ipSrc ^= TRIM(raw_ip_src)",
        "ip_src_addr := ipSrc"
      ]
    }
  ]
}
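
To make the intended semantics concrete, here is a rough sketch in Python
rather than Metron's Java. Everything in it is an assumption for
illustration: the function names are hypothetical, the ^= operator is only
the proposal above, and the tiny evaluator understands just bare field
references and TRIM(x), standing in for real Stellar.

```python
import re

# Hypothetical statement grammar: "<target> ^= <expr>" or "<target> := <expr>".
ASSIGNMENT = re.compile(r"^\s*(\w+)\s*(\^=|:=)\s*(.+?)\s*$")
# Stand-in for Stellar: only TRIM(field) and bare field references are handled.
TRIM_CALL = re.compile(r"^TRIM\((\w+)\)$")


def evaluate(expression, scope):
    """Evaluate a tiny subset of expressions against the current scope."""
    match = TRIM_CALL.match(expression)
    if match:
        value = scope.get(match.group(1))
        return value.strip() if isinstance(value, str) else value
    return scope.get(expression)


def apply_assignments(message, statements):
    """Apply ordered assignments; targets of '^=' never reach the output."""
    output = dict(message)  # non-temporary fields pass through by default
    temporaries = {}
    for statement in statements:
        match = ASSIGNMENT.match(statement)
        if match is None:
            raise ValueError(f"cannot parse assignment: {statement!r}")
        target, operator, expression = match.groups()
        # Temporaries are visible to later statements in the same list.
        scope = {**output, **temporaries}
        value = evaluate(expression, scope)
        if operator == "^=":
            temporaries[target] = value
        else:
            output[target] = value
    return output


result = apply_assignments(
    {"raw_ip_src": " 10.0.0.1 "},
    ["ipSrc ^= TRIM(raw_ip_src)", "ip_src_addr := ipSrc"],
)
```

Here result keeps raw_ip_src and gains ip_src_addr, while ipSrc is dropped
because it was only ever assigned with ^=.  Using an ordered list instead of
a map is what lets a later statement depend on an earlier one.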

* Extraneous Fields from the Parser

For these, we do currently have a REMOVE field transformation, but I'd be
ok with a PROJECT or COMPLETE field transformation to provide a whitelist.
That might look like:
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "config": [
        "ipSrc ^= TRIM(raw_ip_src)",
        "ip_src_addr := ipSrc"
      ]
    },
    {
      "transformation": "COMPLETE",
      "output" : [ "ip_src_addr", "ip_dst_addr", "message"]
    }
  ]
}

I think having these two treated separately makes sense because sometimes
you will want COMPLETE and sometimes not.  Also, this fits within the core
abstraction that we already have.
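
As a sketch of what such a whitelist step might do (again Python for
brevity, with a hypothetical function name; COMPLETE is only a name proposed
in this thread, not an existing transformation), assuming that whitelisted
fields missing from the message are simply skipped:

```python
def complete(message, output_fields):
    """Keep only whitelisted fields; fields absent from the message are skipped."""
    return {name: message[name] for name in output_fields if name in message}


# Hypothetical parser output after the STELLAR step has already run.
parsed = {
    "ipSrc": "10.0.0.1",
    "pointlessExtraStuff": "x",
    "message": "hello",
    "ip_src_addr": "10.0.0.1",
    "ip_dst_addr": "10.0.0.2",
}
projected = complete(parsed, ["ip_src_addr", "ip_dst_addr", "message"])
```

Because the projection is its own step, it can be appended to any chain of
transformations or left out entirely, which is the separation argued for
above.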

On Thu, Nov 30, 2017 at 8:21 PM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Hmmm… Actually, I kinda like that.
>
> May want a little refactoring in the back for clarity.
>
> My question about whether we could ever imagine this ‘cleanup policy’
> applying to other transforms would sway me to the field rather than
> transformation name approach though.
>
> Simon
>
> > On 1 Dec 2017, at 01:17, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >
> > Or, we can create new transformation types
> > STELLAR_COMPLETE, which may be more in line with the original design.
> >
> >
> >
> > On November 30, 2017 at 20:14:46, Otto Fowler (ottobackwa...@gmail.com) wrote:
> >
> >> I would suggest that instead of explicitly having “complete”, we have
> >> “operation”: “complete”
> >>
> >> Such that we can have multiple transformations, each with a different
> >> “operation”.
> >> No operation would be the status quo ante, if we can do it so that we
> >> don’t get errors with old configs and keep the same behavior.
> >>
> >> {
> >> "fieldTransformations": [
> >> {
> >> "transformation": "STELLAR",
> >> "operation": "complete",
> >> "output": ["ip_src_addr", "ip_dst_addr"],
> >> "config": {
> >> "ip_src_addr": "ipSrc",
> >> "ip_dest_addr": "ipDst"
> >> }
> >> },
> >> {
> >> "transformation": "STELLAR",
> >> "operation": "SomeOtherThing",
> >> "output": ["foo", "bar"],
> >> "config": {
> >> "foo": "TO_UPPER(foo)",
> >> "bar": "TO_LOWER(bar)"
> >> }
> >> }
> >> ]
> >> }
> >>
> >>
> >> Sorry for the junk examples, but hopefully it makes sense.
> >>
> >>
> >>
> >>
> >>
> >> On November 30, 2017 at 20:00:06, Simon Elliston Ball (si...@simonellistonball.com) wrote:
> >>
> >>> I’m looking at the way parser config works, and the transformation of
> >>> fields from their native names in, for example, the ASA or CEF parsers,
> >>> into a standard data model.
> >>>
> >>> At the moment I would do something like this:
> >>>
> >>> Assuming I have fields [ipSrc, ipDst, pointlessExtraStuff, message], I
> >>> might have:
> >>>
> >>> {
> >>> "fieldTransformations": [
> >>> {
> >>> "transformation": "STELLAR",
> >>> "output": ["ip_src_addr", "ip_dst_addr", "message"],
> >>> "config": {
> >>> "ip_src_addr": "ipSrc",
> >>> "ip_dest_addr": "ipDst"
> >>> }
> >>> }
> >>> ]
> >>> }
> >>>
> >>> which leaves me with the field set:
> >>> [ipSrc, ipDst, pointlessExtraStuff, message, ip_src_addr, ip_dest_addr]
> >>>
> >>> unless I go with:-
> >>>
> >>> {
> >>> "fieldTransformations": [
> >>> {
> >>> "transformation": "STELLAR",
> >>> "output": ["ip_src_addr", "ip_dst_addr", "message"],
> >>> "config": {
> >>> "ip_src_addr": "ipSrc",
> >>> "ip_dest_addr": "ipDst",
> >>> "pointlessExtraStuff": null,
> >>> "ipSrc": null,
> >>> "ipDst": null
> >>> }
> >>> }
> >>> ]
> >>> }
> >>>
> >>> which seems a little overly verbose.
> >>>
> >>> Do you think it would be valuable to add a switch of some sort on the
> >>> transformation to make it “complete”, i.e. to only preserve fields which
> >>> are explicitly set?
> >>>
> >>> To my mind, this breaks a principle of mutability, but gives us a much,
> >>> much cleaner mapping of data.
> >>>
> >>> I would propose something like:
> >>>
> >>> {
> >>> "fieldTransformations": [
> >>> {
> >>> "transformation": "STELLAR",
> >>> "complete": true,
> >>> "output": ["ip_src_addr", "ip_dst_addr", "message"],
> >>> "config": {
> >>> "ip_src_addr": "ipSrc",
> >>> "ip_dest_addr": "ipDst"
> >>> }
> >>> }
> >>> ]
> >>> }
> >>>
> >>> which would give me the set ["ip_src_addr", "ip_dst_addr", "message"],
> >>> effectively making the nulling in my previous example implicit.
> >>>
> >>> Thoughts?
> >>>
> >>> Also, in the second scenario, if ‘output’ were to be empty, would we
> >>> assume that the output field set should be ["ip_src_addr", "ip_dst_addr"]?
> >>>
> >>> Simon
>
>
