[ 
https://issues.apache.org/jira/browse/METRON-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042941#comment-16042941
 ] 

ASF GitHub Bot commented on METRON-987:
---------------------------------------

Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/609#discussion_r120932266
  
    --- Diff: metron-platform/metron-enrichment/README.md ---
    @@ -71,40 +73,94 @@ The `fieldMap`contents are of interest because they 
contain the routing and conf
           ]
           }
     ```
    -Based on this sample config, both ip_src_addr and ip_dst_addr will go to 
the `geo`, `host`, and `hbaseEnrichment` adapter bolts. For the `geo`, `host` 
and `hbaseEnrichment`, this is sufficient.  However, more complex enrichments 
may contain their own configuration.  Currently, the `stellar` enrichment 
requires a more complex configuration, such as:
    +Based on this sample config, both `ip_src_addr` and `ip_dst_addr` will go 
to the `geo`, `host`, and 
    +`hbaseEnrichment` adapter bolts. 
    + 
    +#### Stellar Enrichment Configuration
    +For the `geo`, `host` and `hbaseEnrichment`, this is sufficient. However, 
more complex enrichments 
    +may contain their own configuration.  Currently, the `stellar` enrichment 
is more adaptable and thus
    +requires a more nuanced configuration.
    +
    +At its most basic, we want to take a message and apply a couple of 
enrichments, such as converting the
    +`hostname` field to lowercase. We do this by specifying the transformation 
inside the 
    +`config` for the `stellar` fieldMap.  Two syntaxes are supported.  The 
first specifies the transformations
    +as a map with the field as the key and the stellar expression as the value:
     ```
         "fieldMap": {
            ...
           "stellar" : {
             "config" : {
    -          "numeric" : {
    -                      "foo": "1 + 1"
    -                      }
    -          ,"ALL_CAPS" : "TO_UPPER(source.type)"
    +          "hostname" : "TO_LOWER(hostname)"
             }
           }
         }
     ```
     
    -Whereas the simpler enrichments just need a set of fields explicitly 
stated so they can be separated from the message and sent to the enrichment 
adapter bolt for enrichment and ultimately joined back in the join bolt, the 
stellar enrichment has its set of required fields implicitly stated through 
usage.  For instance, if your stellar statement references a field, it should 
be included and if not, then it should not be included.  We did not want to 
require users to make explicit the implicit.
    +Another approach is to make the transformations as a list with the same 
`var := expr` syntax as is used
    +in the Stellar REPL, such as:
    +```
    +    "fieldMap": {
    +       ...
    +      "stellar" : {
    +        "config" : [
    +          "hostname := TO_LOWER(hostname)"
    +        ]
    +      }
    +    }
    +```
    +
    +Sometimes arbitrary stellar enrichments may take enough time that you 
would prefer to split some of them
    +into groups and execute the groups of stellar enrichments in parallel.  
Take, for instance, if you wanted
    +to do an HBase enrichment and a profiler call which were independent of 
one another.  This use case is 
    +supported by splitting the enrichments up into groups.
     
    -The other way in which the stellar enrichment is somewhat more complex is 
in how the statements are executed.  In the general purpose case for a list of 
fields, those fields are used to create a message to send to the enrichment 
adapter bolt and that bolt's worker will handle the fields one by one in serial 
for a given message.  For stellar enrichment, we wanted to have a more complex 
design so that users could specify the groups of stellar statements sent to the 
same worker in the same message (and thus executed sequentially).  Consider the 
following configuration:
    +Consider the following example:
     ```
         "fieldMap": {
    +       ...
           "stellar" : {
             "config" : {
    -          "numeric" : {
    -                      "foo": "1 + 1"
    -                      "bar" : TO_LOWER(source.type)"
    -                      }
    -         ,"text" : {
    -                   "ALL_CAPS" : "TO_UPPER(source.type)"
    -                   }
    +          "malicious_domain_enrichment" : {
    +            "is_bad_domain" : "ENRICHMENT_EXISTS('malicious_domains', 
ip_dst_addr, 'enrichments', 'cf')"
    +          },
    +          "login_profile" : [
    +            "profile_window := PROFILE_WINDOW('from 6 months ago')", 
    +            "global_login_profile := 
PROFILE_GET('distinct_login_attempts', 'global', profile_window)",
    +            "stats := STATS_MERGE(global_login_profile)",
    +            "auth_attempts_median := STATS_PERCENTILE(stats, 0.5)", 
    +            "auth_attempts_sd := STATS_SD(stats)",
    +            "profile_window := null", 
    +            "global_login_profile := null", 
    +            "stats := null"
    +          ]
             }
           }
         }
     ```
    -We have a group called `numeric` whose stellar statements will be executed 
sequentially.  In parallel to that, we have the group of stellar statements 
under the group `text` executing.  The intent here is to allow you to not force 
higher latency operations to be done sequentially. You can use any name for 
your groupings you like. Be aware that the configuration is a map and duplicate 
configuration keys' values are not combined, so the duplicate configuration 
value will be overwritten.
    +
    +Here we want to perform two enrichments that hit HBase and we would rather 
not run in sequence.  These
    +enrichments are entirely independent of one another (i.e. neither relies 
on the output of the other).  In
    +this case, we've created a group called `malicious_domain_enrichment` to 
inquire about whether the destination
    +address exists in the HBase enrichment table in the `malicious_domains` 
enrichment type.  This is a simple
    +enrichment, so we can express the enrichment group as a map with the new 
field `is_bad_domain` being a key
    +and the stellar expression associated with that operation being the 
associated value.
    +
    +In contrast, the stellar enrichment group `login_profile` is interacting 
with the profiler, has multiple temporary
    +expressions (i.e. `profile_window`, `global_login_profile`, and `stats`) 
that are useful only within the context
    +of this group of stellar expressions.  In this case, we would need to 
ensure that we use the list construct
    +when specifying the group and remember to set the temporary variables to 
`null` so they are not passed along.
    +
    +In general, things to note from this section are as follows:
    +* The stellar enrichments for the `stellar` enrichment adapter are 
specified in the `config` for the `stellar` enrichment
    +adapter in the `fieldMap`
    +* Groups of independent enrichments (i.e. no expression in any group depends on the 
output of an expression from another group) may be executed in parallel
    +* If you have the need to use temporary variables, you may use the list 
construct.  Ensure that you assign the variables to `null` before the end of 
the group.
    +* **Ensure that you do not assign a field to a stellar expression which 
returns an object which JSON cannot represent.**
    --- End diff --
    
    I agree, this was just filling in documentation that didn't exist.  There 
should be a follow-on ticket here for a solution.
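
The caveat in the last bullet of the diff (do not assign a field to a value that JSON cannot represent) can be illustrated by analogy. The sketch below is plain Python, not Metron code: `Stats` is a hypothetical stand-in for an opaque intermediate object, and `is_json_representable` is an illustrative helper, not part of any Metron or Stellar API.

```python
import json


class Stats:
    """Hypothetical stand-in for an opaque intermediate object."""


def is_json_representable(value):
    """Return True if Python's json module can serialize the value."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference
        return False


# Plain numbers, strings, lists, maps and null serialize without trouble.
simple_ok = is_json_representable({"auth_attempts_median": 3.5, "tags": ["a", None]})

# An arbitrary object does not, which is why such a value would need to be
# set to null (or never assigned to a field) before the message is emitted.
opaque_ok = is_json_representable(Stats())
```

Whether a given Stellar function returns such an object depends on the function; the check above only shows the general shape of the problem.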


> Allow stellar enrichments to be specified by a list as well as a map
> --------------------------------------------------------------------
>
>                 Key: METRON-987
>                 URL: https://issues.apache.org/jira/browse/METRON-987
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>
> Currently, stellar enrichments are specified by a map associating fields with 
> the stellar expressions associated with the fields.  There is a significant 
> downside here in that you cannot update previously assigned fields.  For 
> instance, the following cannot be represented currently:
> {code}
>  "fieldMap": {
>        ...
>       "stellar" : {
>         "config" : {
>           "hostname" : "if ENDS_WITH(hostname, '.') then CHOP(hostname) else 
> hostname",
>           "hostname" : "TO_LOWER(hostname)"
>         }
>       }
>     }
> {code}
> A consequence of this deficiency is that we also cannot use temporary 
> variables and unset them after their use inside an enrichment group.
> The proposed change is to allow users to use lists of strings representing 
> stellar expression assignments with the same syntax as the Stellar REPL.  
> This would be an alternative to maps, but the map syntax would remain 
> supported for backward compatibility.
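
The limitation described in the issue can be seen with any JSON parser that keeps only one value per key. The following is a minimal sketch in Python rather than Metron's own Java code; `split_assignment` is a hypothetical helper for illustration and is not Metron's parser. It assumes only the `var := expr` form quoted in the issue.

```python
import json

# Map syntax: a repeated key cannot hold two assignments. Python's json
# module keeps the last value for a duplicated name, so the first
# "hostname" expression is silently lost.
map_config = json.loads("""
{
  "hostname": "if ENDS_WITH(hostname, '.') then CHOP(hostname) else hostname",
  "hostname": "TO_LOWER(hostname)"
}
""")

# List syntax: each assignment is its own array element, so both order
# and repeated targets survive parsing.
list_config = json.loads("""
[
  "hostname := if ENDS_WITH(hostname, '.') then CHOP(hostname) else hostname",
  "hostname := TO_LOWER(hostname)"
]
""")


def split_assignment(statement):
    """Split 'var := expr' into (var, expr). Hypothetical helper."""
    var, expr = statement.split(":=", 1)
    return var.strip(), expr.strip()


assignments = [split_assignment(s) for s in list_config]
```

With the map form only one `hostname` entry remains after parsing, while the list form yields two ordered assignments to the same field, which is what makes updating a previously assigned field expressible.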



