[ 
https://issues.apache.org/jira/browse/METRON-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877300#comment-15877300
 ] 

ASF GitHub Bot commented on METRON-733:
---------------------------------------

GitHub user justinleet opened a pull request:

    https://github.com/apache/incubator-metron/pull/461

    METRON-733: Remove Geo database from ParserBolt

    To reproduce the original problem, start a parser and make sure there's 
no geo data on HDFS (by default in /apps/metron/geo).
    
    This PR removes geo from metron-parsers entirely. Parsers should not need 
it at all; it should only be pulled in when a Stellar function uses it.
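
The idea can be sketched as lazy initialization. This is only an illustration 
in Python, not Metron's actual Java code: `GeoDatabase`, `SketchParserBolt`, 
`missing_file` and the `use_geo` flag are hypothetical names. The assumption is 
that the geo data is loaded on first lookup rather than when the parser is 
prepared.

```python
class GeoDatabase:
    """Hypothetical stand-in for a geo lookup backed by a file on HDFS."""

    def __init__(self, load_file):
        self._load_file = load_file  # callable that reads the DB; may raise
        self._data = None            # nothing is loaded at construction time

    def get(self, ip):
        # Load on first use only, so a missing file affects only callers
        # that actually ask for geo data.
        if self._data is None:
            self._data = self._load_file()
        return self._data.get(ip)


class SketchParserBolt:
    """Hypothetical parser component; prepare() does not touch the geo DB."""

    def __init__(self, geo, use_geo):
        self.geo = geo
        self.use_geo = use_geo  # stands in for "the config calls GEO_GET"

    def prepare(self):
        # Intentionally no geo initialization here.
        return "ready"

    def parse(self, message):
        if self.use_geo:
            message["geo_test"] = self.geo.get(message["ip_dst_addr"])
        return message


def missing_file():
    # Simulates the geo file being absent.
    raise FileNotFoundError("no geo data on HDFS")
```

With `use_geo=False`, `prepare()` and `parse()` both succeed even though 
`missing_file` would raise; the error only surfaces when a lookup is actually 
requested.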
    
    ### Testing
    To validate this, I ran through the squid demo with the geo file 
missing, which worked as expected (no exception thrown).  In addition, 
ParserBoltTest was updated to no longer reference the test Geo DB data (and 
runs fine without it).
    
    To ensure that Stellar GEO_GET works as expected in a parser, quick-dev was 
spun up.  The steps for squid were followed, but with a custom parser config:
    ```
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "parserConfig": {
        "grokPath": "/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations": [
        {
          "transformation": "STELLAR",
          "output": [ "geo_test" ],
          "config": {
            "geo_test": "GEO_GET(ip_dst_addr)"
          }
        }
      ]
    }
    ```
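
Before pushing a config like this, a quick sanity check can catch a mismatch 
between `output` and `config`. The following is a minimal sketch, not part of 
Metron: `missing_stellar_expressions` is a hypothetical helper, and it assumes 
`output` is a list of field names, as in the config above.

```python
import json


def missing_stellar_expressions(parser_config_text):
    """Return STELLAR output fields that have no expression in "config"."""
    parser_config = json.loads(parser_config_text)
    missing = []
    for transformation in parser_config.get("fieldTransformations", []):
        if transformation.get("transformation") != "STELLAR":
            continue  # only STELLAR transformations are checked here
        expressions = transformation.get("config", {})
        for field in transformation.get("output", []):
            if field not in expressions:
                missing.append(field)
    return missing


# The parser config from this PR description.
EXAMPLE_CONFIG = """
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "parserConfig": {
    "grokPath": "/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": [ "geo_test" ],
      "config": {
        "geo_test": "GEO_GET(ip_dst_addr)"
      }
    }
  ]
}
"""

if __name__ == "__main__":
    print(missing_stellar_expressions(EXAMPLE_CONFIG))
```

An empty list means every declared output field has a matching Stellar 
expression.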
    
    Either update global.json with a valid geo.hdfs.file or run 
`/usr/metron/0.3.1/bin/geo_enrichment_load.sh -z node1:2181 -r 
/apps/metron/geo/default/` to place the file in the default location (instead 
of a timestamped one). This is necessary to ensure that the push doesn't 
clobber geo configs.
    
    The resulting data in the index includes:
    ```
    {
    ...
                   "geo_test": {
                      "country": "US",
                      "dmaCode": "807",
                      "city": "San Francisco",
                      "postalCode": "94107",
                      "latitude": "37.7697",
                      "location_point": "37.7697,-122.3933",
                      "locID": "5391959",
                      "longitude": "-122.3933"
                   },
    ...
                   "ip_dst_addr": "151.101.192.73",
    ...
    }
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/justinleet/incubator-metron geo_profiler_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/461.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #461
    
----


> Remove Geo database from ParserBolt
> -----------------------------------
>
>                 Key: METRON-733
>                 URL: https://issues.apache.org/jira/browse/METRON-733
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Justin Leet
>            Assignee: Justin Leet
>
> The ParserBolt inits the Geo DB in its prepare() method.  Parsers, unlike 
> enrichments, do not use geo as a base capability.  They should be able to run 
> without the DB file existing in HDFS.  This init causes issues if the file is 
> missing, even if GEO_GET is unused in the parser definition.
> This change should preserve the ability of Parsers to employ Stellar's 
> GEO_GET.  The Stellar function already handles that init, so it shouldn't be 
> an issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
