[ 
https://issues.apache.org/jira/browse/METRON-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449783#comment-16449783
 ] 

Simon Elliston Ball commented on METRON-1538:
---------------------------------------------

This is not necessarily a good idea. We use guids for message amendments and 
meta alerting. The guids therefore need to be generated in the Metron pipeline; 
otherwise the guids in the ES and HDFS indices will not match, leading to a 
total data mismatch between the two. 

What we could do is have ES create an auto-generated id, otherwise unused, in 
addition to the guid. This would add storage overhead and hurt performance on 
meta alerts and search lookups, but that cost may be less significant than the 
ingest impact of non-auto-generated keys. Arguably this scenario risks data 
corruption in ES unless we perform the same uniqueness checks anyway, but that 
may be something that can be resolved, or accepted as a small front-end event 
duplication risk in short-term indices.
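
To make the two options concrete, here is a minimal sketch of the bulk index 
actions involved, assuming the standard Elasticsearch bulk format (one action 
line followed by one source line). The helper names, the index name and the 
message fields are made up for illustration and are not Metron's writer code; 
older ES versions would also need a `_type` in the action line.

```python
import json
import uuid


def bulk_lines_explicit_id(index, message):
    """Index action that reuses the pipeline guid as the ES document _id."""
    action = {"index": {"_index": index, "_id": message["guid"]}}
    return [json.dumps(action), json.dumps(message)]


def bulk_lines_auto_id(index, message):
    """Index action with no _id: ES generates one, the guid stays a plain field."""
    action = {"index": {"_index": index}}
    return [json.dumps(action), json.dumps(message)]


# Hypothetical message. The guid is assigned once, upstream of every writer,
# so the ES and HDFS copies carry the same value.
message = {
    "guid": str(uuid.uuid4()),
    "source.type": "bro",
    "ip_src_addr": "10.0.0.1",
}

explicit = bulk_lines_explicit_id("bro_index_example", message)
auto = bulk_lines_auto_id("bro_index_example", message)

print("\n".join(explicit))
print("\n".join(auto))
```

With the second form, amending a message or building a meta alert would have 
to search on the guid field first instead of fetching the document by id, 
which is the lookup cost mentioned above. A retried bulk request could also 
index the same guid twice, which is the duplication risk.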

> Don't use GUIDS for Elastic document id, but autogenerated ID's for 
> performance
> -------------------------------------------------------------------------------
>
>                 Key: METRON-1538
>                 URL: https://issues.apache.org/jira/browse/METRON-1538
>             Project: Metron
>          Issue Type: Improvement
>    Affects Versions: 0.4.3
>            Reporter: Ward Bekker
>            Priority: Major
>              Labels: performance
>
> Metron currently uses GUIDs for ES document ids, which goes against the best 
> practice:
> "When indexing a document that has an explicit id, Elasticsearch needs to 
> check whether a document with the same id already exists within the same 
> shard, which is a costly operation and gets even more costly as the index 
> grows. By using auto-generated ids, Elasticsearch can skip this check, which 
> makes indexing faster."
> https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html#_use_auto_generated_ids



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
