[ 
https://issues.apache.org/jira/browse/METRON-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449843#comment-16449843
 ] 

Nick Allen edited comment on METRON-1538 at 4/24/18 1:34 PM:
-------------------------------------------------------------

The idea of a uniqueness check for a GUID/UUID is foreign to me.  Elasticsearch 
likely performs this check because in many cases a user-provided ID will not be 
a GUID.  A user might choose another domain-specific value where the likelihood 
of collision is much higher.  In our specific scenario, we are using a 
GUID/UUID, so there is no need to check for uniqueness.

We could fairly easily make this configurable so that the end user can make the 
right decision for their environment.  The ElasticsearchWriter would accept a 
parameter which defines the name of the field to use as the document ID.  If 
the field is defined, the writer extracts that value from the message and uses 
that as the document ID.  If the field is left undefined, empty or null, the 
ElasticsearchWriter sets no document ID, which allows ES to auto-generate 
one.  The indexed document would still contain Metron's GUID for 
cross-correlation.
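
As a rough illustration of the behaviour described above, here is a minimal 
sketch of the ID selection logic.  It is not the actual ElasticsearchWriter 
code.  The class name DocumentIdResolver, the docIdField parameter and the 
"guid" field name used in main are hypothetical, and treating a configured 
field that is missing from the message as "let ES generate the ID" is an 
assumption the proposal does not spell out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Metron: decides which document ID, if any,
// the writer should pass to Elasticsearch for a given message.
final class DocumentIdResolver {

    // Hypothetical configuration value: the name of the message field to use
    // as the document ID. Null or blank means "do not set a document ID".
    private final String docIdField;

    DocumentIdResolver(String docIdField) {
        this.docIdField = docIdField;
    }

    // Returns the document ID to use, or Optional.empty() when the writer
    // should leave the ID unset so that Elasticsearch auto-generates one.
    Optional<String> resolve(Map<String, Object> message) {
        if (docIdField == null || docIdField.trim().isEmpty()) {
            return Optional.empty();
        }
        Object value = message.get(docIdField);
        // Assumption: a missing or blank value falls back to auto-generation.
        if (value == null || value.toString().trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.toString());
    }

    public static void main(String[] args) {
        Map<String, Object> message = new HashMap<>();
        message.put("guid", "123e4567-e89b-12d3-a456-426614174000");

        // Field configured: the GUID becomes the document ID.
        System.out.println(new DocumentIdResolver("guid").resolve(message));

        // Field not configured: no ID is set, but the GUID stays in the
        // message and is therefore still indexed for cross-correlation.
        System.out.println(new DocumentIdResolver(null).resolve(message));
    }
}
```

With something like this, the writer would set an explicit ID on the index 
request only when resolve() returns a value, and would otherwise leave it 
unset.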

[Currently, the Metron GUID is always used as the document ID, if it 
exists.|https://github.com/apache/metron/blob/a8b555dcc9f548d7b91789a46d9435b4d8b17581/metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/writer/ElasticsearchWriter.java#L73-L76]

 



> Don't use GUIDS for Elastic document id, but autogenerated ID's for 
> performance
> -------------------------------------------------------------------------------
>
>                 Key: METRON-1538
>                 URL: https://issues.apache.org/jira/browse/METRON-1538
>             Project: Metron
>          Issue Type: Improvement
>    Affects Versions: 0.4.3
>            Reporter: Ward Bekker
>            Priority: Major
>              Labels: performance
>
> Metron currently uses GUIDs for ES document IDs, which goes against the best 
> practice:
> "When indexing a document that has an explicit id, Elasticsearch needs to 
> check whether a document with the same id already exists within the same 
> shard, which is a costly operation and gets even more costly as the index 
> grows. By using auto-generated ids, Elasticsearch can skip this check, which 
> makes indexing faster."
> https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html#_use_auto_generated_ids



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
