[ 
https://issues.apache.org/jira/browse/METRON-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547609#comment-16547609
 ] 

Simon Elliston Ball commented on METRON-1677:
---------------------------------------------

This is an excellent point on the performance side. GUIDs still need to be 
generated by the Metron topologies to ensure consistency with HDFS-based storage 
(e.g. reindexing scenarios and consistency of the HBase-based update log for 
mutation). However, UUIDv1 may make more sense. The interesting element will be 
how we encode it. Binary encoding would be optimal, but we will need to 
consider the implications for JSON-friendly encoding of the binary UUIDv1 and 
other touchpoints for UUID use. This could be a pretty broad PR touching the 
DAO, REST and UI layers as well as the ingest pipeline.
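
To make the encoding trade-off concrete, here is a minimal sketch (not Metron 
code) of packing a UUID into its 16-byte binary form and then into a 
JSON-safe string. The class and method names are hypothetical, the sample 
UUID is just an illustrative version 1 value, and URL-safe Base64 is only one 
possible text encoding, assumed here for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper, not part of Metron: shows binary vs. JSON-safe UUID encodings.
class GuidEncoding {

    // Pack the 128 bits of a UUID into 16 bytes (big-endian, MSB first).
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from its 16-byte form.
    static UUID fromBytes(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    // Assumed text form: unpadded URL-safe Base64, which is a plain JSON string.
    static String toJsonSafe(UUID id) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(toBytes(id));
    }

    static UUID fromJsonSafe(String text) {
        return fromBytes(Base64.getUrlDecoder().decode(text));
    }

    public static void main(String[] args) {
        // Sample version 1 UUID, used only for illustration.
        UUID id = UUID.fromString("c232ab00-9414-11ec-b3c8-9f6bdeced846");
        String encoded = toJsonSafe(id);
        System.out.println("canonical length: " + id.toString().length());
        System.out.println("binary length:    " + toBytes(id).length);
        System.out.println("base64 length:    " + encoded.length());
        System.out.println("round trip ok:    " + id.equals(fromJsonSafe(encoded)));
    }
}
```

The canonical string form is 36 characters, the binary form is 16 bytes, and 
the unpadded Base64 form is 22 characters. Whichever form is chosen, every 
layer that reads or writes the GUID would need to agree on it.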

> UUIDv4 GUID is not Lucene friendly
> ----------------------------------
>
>                 Key: METRON-1677
>                 URL: https://issues.apache.org/jira/browse/METRON-1677
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Ali Nazemian
>            Priority: Major
>
> Generating UUIDv4 GUIDs with UUID.randomUUID() in Java is not Lucene friendly: 
> it impacts Elasticsearch and Solr indexing/search performance and sometimes 
> makes it unpredictable.
> http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
> Moreover, specifying the doc ID on the client side impacts indexing 
> throughput, because it enables the Elasticsearch deduplication policy and 
> changes inserts to upserts. Hence, indexing throughput could be increased by 
> providing the ability to disable ID generation on the client side. Currently, 
> the way the ID is generated can be overridden at the config level by replacing 
> the Metron default GUID via Stellar, but it is not possible to disable it 
> completely and let Elasticsearch decide which ID to use for the document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
