[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636167#comment-16636167
 ] 

ASF GitHub Bot commented on METRON-1801:
----------------------------------------

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/1218

    METRON-1801 Allow Customization of Elasticsearch Document ID

    Currently, the Metron GUID is always used as the Elasticsearch document ID. 
As documented in 
[METRON-1677](https://issues.apache.org/jira/browse/METRON-1677), using a 
randomized UUID like Java's `UUID.randomUUID()` can negatively impact 
Elasticsearch performance.  This change allows a user to customize the 
identifier that is used by Elasticsearch when indexing documents.
    
    We do this by allowing a user to specify the name of the message field 
whose value is set as the document ID.  The user can customize this by defining 
a global variable called `es.document.id`.  There are three usage scenarios 
that I see.
    
      * By default, Metron's GUID field is used as the source of the 
document ID, which preserves backwards-compatible behavior.  This applies when 
the global variable is set as below or is not set at all.
        ```
        es.document.id = guid
        ```
    
      * If a user wants Elasticsearch to generate its own document ID, then 
`es.document.id` should be set to a blank value or empty string.  In this case, 
the client does not set the document ID and Elasticsearch generates its 
own.
        ```
        es.document.id = 
        ```
    
      * If a user wants to set their own custom document ID, they should create 
an enrichment that defines a new message field like `my_document_id`.  They 
should then use this new field to set the Elasticsearch document ID.
        ```
        es.document.id = my_document_id
        ```
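
As a rough sketch of the three scenarios above (written in Python for brevity; this is not the actual `ElasticsearchWriter` code), the lookup could work as follows. The names `resolve_document_id`, `DOC_ID_SETTING`, and `DEFAULT_SOURCE_FIELD` are hypothetical. Treating a null setting as blank, and raising when the configured field is missing from the message, are assumptions; the description above does not say how either case is handled.

```python
from typing import Any, Mapping, Optional

DOC_ID_SETTING = "es.document.id"   # global setting described in this PR
DEFAULT_SOURCE_FIELD = "guid"       # Metron GUID field, the backwards-compatible default


def resolve_document_id(message: Mapping[str, Any],
                        global_config: Mapping[str, Any]) -> Optional[str]:
    """Return the document ID the client should send to Elasticsearch,
    or None when Elasticsearch should generate its own."""
    # Setting not defined at all: fall back to the Metron GUID.
    source_field = global_config.get(DOC_ID_SETTING, DEFAULT_SOURCE_FIELD)

    # Blank or empty setting (assumption: null counts as blank too):
    # the client does not set an ID.
    if source_field is None or not str(source_field).strip():
        return None

    field_name = str(source_field).strip()
    value = message.get(field_name)
    if value is None:
        # Assumption: the PR does not say what happens here; this sketch fails loudly.
        raise KeyError(f"message has no field '{field_name}' to use as document ID")
    return str(value)
```

With this sketch, a message `{"guid": "abc", "my_document_id": "x1"}` resolves to `"abc"` when the setting is absent or `guid`, to `None` when it is blank, and to `"x1"` when it is `my_document_id`.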
    
    ## TODO
    
    I have a few more loose ends to tie up, but wanted to get a start on the 
test plan and description in case the community has early feedback to offer.
    
    - [ ] Allow user to set the `es.document.id` value in the Mpack.
    - [ ] Document this global setting and its usage scenarios in a README.
    - [ ] More unit/integration tests might be needed.  Trying to determine 
where those need to go.
    - [ ] Run the UI e2e tests to ensure they remain happy.
    - [ ] Fix issue with Solr integration tests.
    
    ## Changes
    
      * The `ElasticsearchWriter` was updated to allow the document ID to be 
configurable.
    
      * A 'search by GUID' in the REST layer was implicitly using the document 
ID, whereas it should be using the Metron GUID.
    
      * Search results should use the Metron GUID as the ID returned to the UI.  
All IDs visible to the user should always be the Metron GUID, not the document 
ID.
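
To illustrate the last two changes, here is a hedged Python sketch, not the code in this PR. `guid_query` builds a search body that matches on the stored `guid` field rather than on the document `_id`, and `to_ui_results` reports the GUID as the ID for each hit. Both function names are hypothetical. The sketch assumes `guid` is indexed as an exact-match (keyword) field so that a `term` query works, and it relies on the standard `hits.hits[]._source` layout of an Elasticsearch search response.

```python
from typing import Any, Dict, List, Mapping

GUID_FIELD = "guid"  # Metron GUID stored inside each document


def guid_query(guid: str) -> Dict[str, Any]:
    """Search body that finds a document by its Metron GUID field,
    independent of whatever `_id` Elasticsearch holds for it."""
    return {"query": {"term": {GUID_FIELD: guid}}}


def to_ui_results(response: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Map search hits to results whose `id` is the Metron GUID, not `_id`."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        results.append({"id": source.get(GUID_FIELD), "source": source})
    return results
```

The point of the design is that `_id` never leaks to the user: even when Elasticsearch generates the document ID, the UI continues to see and search by the GUID.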
    
    ## Testing
    
    1. Spin up a development environment.  You may need to stop the PCAP and/or 
Profiler topology to free up slots to allow indexing to occur.
    
        ```
        cd metron-deployment/development/centos6
        vagrant up
        ```
    
    1. Ensure that alerts are visible in the Alerts UI.
    
    1. Stop the indexing topologies using Ambari.
    
    1. Login to the VM.
    
        ```
        vagrant ssh
        sudo su -
        ```
    
    1. Delete the existing indices in Elasticsearch.
    
        ```
        curl -XDELETE http://node1:9200/bro*
        curl -XDELETE http://node1:9200/snort*
        ```
    
    1. Launch the REPL.
    
        ```
        source /etc/default/metron
        cd $METRON_HOME
        bin/stellar -z $ZOOKEEPER
        ```
    
    1. Change the configuration so that Elasticsearch generates its own unique 
document ID. Define `es.document.id` to be an empty or blank value in the 
global settings.
    
        ```
        [Stellar]>>> g := CONFIG_GET("GLOBAL")
        ...
        [Stellar]>>> g := SHELL_EDIT(g)
        {
          "es.clustername" : "metron",
          "es.ip" : "node1:9300",
          "es.date.format" : "yyyy.MM.dd.HH",
          "es.document.id": " ",
          "parser.error.topic" : "indexing",
          "update.hbase.table" : "metron_update",
          "update.hbase.cf" : "t",
          "es.client.settings" : {
            "client.transport.ping_timeout" : "500s"
          },
          "profiler.client.period.duration" : "15",
          "profiler.client.period.duration.units" : "MINUTES",
          "user.settings.hbase.table" : "user_settings",
          "user.settings.hbase.cf" : "cf",
          "bootstrap.servers" : "node1:6667",
          "source.type.field" : "source:type",
          "threat.triage.score.field" : "threat:triage:score",
          "enrichment.writer.batchSize" : "15",
          "enrichment.writer.batchTimeout" : "0",
          "profiler.writer.batchSize" : "15",
          "profiler.writer.batchTimeout" : "0",
          "geo.hdfs.file" : "/apps/metron/geo/default/GeoLite2-City.mmdb.gz"
        }
        ...
        [Stellar]>>> CONFIG_PUT("GLOBAL", g)
        ```
    
    1. Restart the indexing topology.
    
        ```
        bin/start_elasticsearch_topology.sh
        ```
    
    1. Open the Alerts UI and ensure that alerts are visible.  Notice that the 
ID listed in the table has not changed.  This will always display the Metron 
GUID, no matter what ID is used for the document.
    
        ![screen shot 2018-10-02 at 5 43 54 
pm](https://user-images.githubusercontent.com/2475409/46378802-c5abbe00-c66a-11e8-9a65-5ce2ad84dd59.png)
    
    1. Click on a GUID in the table to search for a single alert.
    
        ![screen shot 2018-10-02 at 12 42 54 
pm](https://user-images.githubusercontent.com/2475409/46378629-3b635a00-c66a-11e8-9b13-6c0f54e48dfa.png)
    
    1. Create a meta-alert and ensure that alerts tied to the meta-alert are 
still discoverable by GUID.
    
        ![screen shot 2018-10-02 at 5 45 18 
pm](https://user-images.githubusercontent.com/2475409/46378860-f3910280-c66a-11e8-8612-e18ab2e4ae03.png)
    
    1. Open Kibana and verify that Elasticsearch is indeed generating its own 
document IDs.  You will notice an `_id` field which has been generated by 
Elasticsearch.  This will be different from the UUID generated by Metron and 
stored as part of the document as `guid`.
    
        ![screen shot 2018-10-02 at 4 47 52 
pm](https://user-images.githubusercontent.com/2475409/46378868-fc81d400-c66a-11e8-9981-3e62033e44ec.png)
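
The final check can also be done without Kibana. The Python sketch below is a hypothetical helper, not part of this PR. It takes an already-parsed Elasticsearch search response (for example, the JSON returned by a `_search` request against the `bro*` indices) and lists any hits whose `_id` still equals the stored `guid`. An empty list means every returned document has an ID distinct from its GUID.

```python
from typing import Any, List, Mapping


def hits_reusing_guid(response: Mapping[str, Any]) -> List[str]:
    """Return the `_id` of every hit whose document ID equals its stored `guid`."""
    reused = []
    for hit in response.get("hits", {}).get("hits", []):
        doc_id = hit.get("_id")
        guid = hit.get("_source", {}).get("guid")
        # Only count hits that actually carry an ID matching the GUID.
        if doc_id is not None and doc_id == guid:
            reused.append(doc_id)
    return reused
```

After the configuration change in this test plan, newly indexed documents should not appear in the returned list. Documents indexed under the default setting would appear, since their `_id` is the GUID.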
    
    ## Pull Request Checklist
    
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
    - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
    - [ ] Have you written or updated unit tests and/or integration tests to 
verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1801

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1218
    
----
commit 23a96ccc0786d31dfe2190fe470b0afe2b52c936
Author: Nick Allen <nick@...>
Date:   2018-10-01T22:39:03Z

    Can change source field used for document ID. Unable to /findOne in Alerts 
UI

commit 2d0f478327981403eb2463b7efefe3d871441db0
Author: Nick Allen <nick@...>
Date:   2018-10-02T15:09:02Z

    Cannot assume that ES doc ID == Metron GUID

commit d66c839eda55d5efd22f56f8f5920937e2703d14
Author: Nick Allen <nick@...>
Date:   2018-10-02T15:10:13Z

    Removed unnecessary dependencies

commit 36c59211181480db571619b811456105b80d7b0d
Author: Nick Allen <nick@...>
Date:   2018-10-02T19:33:04Z

    Search results need to use Metron GUID as ID, not the doc ID

commit 4ebb8000522540b6ff2a30e2d5ffeffcf6016bb4
Author: Nick Allen <nick@...>
Date:   2018-10-02T21:31:58Z

    Small rename

commit 13d698df8cb46a3cf8ef747f7f62acc298ef8e4b
Author: Nick Allen <nick@...>
Date:   2018-10-02T21:38:00Z

    Removed unncessary part of error msg

----


> Allow Customization of Elasticsearch Document ID
> ------------------------------------------------
>
>                 Key: METRON-1801
>                 URL: https://issues.apache.org/jira/browse/METRON-1801
>             Project: Metron
>          Issue Type: Sub-task
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The user should be able to customize the document ID that is set by the 
> client when indexing documents into Elasticsearch.  The user should be able 
> to use the Metron GUID, define their own custom document ID, or choose to not 
> have the document ID set by the client.
>  
> This task covers Elasticsearch only.  An additional task should be created to 
> cover Solr.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
