[ 
https://issues.apache.org/jira/browse/METRON-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593092#comment-16593092
 ] 

Ali Nazemian commented on METRON-1677:
--------------------------------------

[~simonellistonball] What if we keep the GUID as an extra field in ES/Solr, 
but don't pass it as the document ID and instead let ES/Solr decide what to 
use? We would still need to give Metron users who want deduplication the 
ability to set the document ID explicitly at index time. No matter how 
Lucene-friendly the document ID is, supplying it from the indexing client is 
always slower, because it enables the deduplication pipeline and turns each 
index operation into an upsert. 
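
As a rough illustration of the option above, the sketch below builds Elasticsearch bulk API lines in plain Java. The class name, the useGuidAsDocId switch, and the sample index and field names are all hypothetical and are not Metron config or code. The only fact it relies on is that `_id` is optional in a bulk `index` action, in which case Elasticsearch generates the ID itself. The naive string building assumes values that need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch, not Metron code: shows the two indexing modes discussed above.
class BulkActionSketch {

    // Builds the action line of a bulk "index" operation.
    // useGuidAsDocId is an assumed switch: true sends the GUID as _id
    // (deduplication/upsert path), false omits _id so Elasticsearch assigns one.
    public static String actionLine(String index, String guid, boolean useGuidAsDocId) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"index\":{\"_index\":\"").append(index).append('"');
        if (useGuidAsDocId) {
            sb.append(",\"_id\":\"").append(guid).append('"');
        }
        sb.append("}}");
        return sb.toString();
    }

    // Builds the document body. The GUID is always kept as an ordinary field,
    // regardless of whether it is also used as the document ID.
    public static String sourceLine(Map<String, String> fields, String guid) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"guid\":\"").append(guid).append('"');
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(",\"").append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        sb.append('}');
        return sb.toString();
    }

    public static void main(String[] args) {
        String guid = UUID.randomUUID().toString();
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("source.type", "bro"); // sample field, assumed for illustration

        // Mode 1: no _id in the action line, Elasticsearch picks the document ID.
        System.out.println(actionLine("bro_index", guid, false));
        System.out.println(sourceLine(fields, guid));

        // Mode 2: GUID sent as _id, so re-sending the same GUID overwrites the document.
        System.out.println(actionLine("bro_index", guid, true));
        System.out.println(sourceLine(fields, guid));
    }
}
```

With the switch off, the GUID still travels in the document body and stays searchable. With it on, re-sending the same GUID overwrites the earlier document, which is the deduplication case.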


> UUIDv4 GUID is not Lucene friendly
> ----------------------------------
>
>                 Key: METRON-1677
>                 URL: https://issues.apache.org/jira/browse/METRON-1677
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Ali Nazemian
>            Priority: Major
>
> Generating the GUID as a UUIDv4 via UUID.randomUUID() in Java is not Lucene 
> friendly: it hurts Elasticsearch and Solr indexing/search performance and 
> sometimes makes it unpredictable.
> http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
> Moreover, specifying the doc ID on the client side reduces indexing 
> throughput, because it enables the Elasticsearch deduplication policy and 
> turns each insert into an upsert. Hence, indexing throughput can be increased 
> by providing a way to disable ID generation on the client side. Currently, 
> the way the ID is generated can be overridden at the config level by 
> replacing the Metron default GUID via Stellar, but it is not possible to 
> disable it completely and let Elasticsearch decide which ID to use for the 
> corresponding document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
