Hi all, 

Is there a best practice for generating document IDs in Elasticsearch? 
Let's say we want to distribute the data evenly across the cluster and be 
able to update documents quickly. 
 
Let's say my document holds user information in the following JSON format, 
and I index all the fields:
{"user_id": someLongValue, "name": someStringValue}
for example:
{"user_id": 123, "name": "arinto"}

Based on the simple requirements above, I've found two possible approaches so far:

   1. This article (
   http://exploringelasticsearch.com/book/advanced-techniques/routing.html) 
   mentions that the document ID should be either a UUID or monotonically 
   increasing so that the data is evenly distributed across the cluster's 
   shards. That means I need to generate a UUID when indexing new data. But 
   if I later want to retrieve the document and update it with a new field 
   or new data, I cannot use the 'get' API, because the UUID is generated 
   independently of any document field. Hence I need to use the 'search' 
   API, which *I assume* does not perform as well as the 'get' API (please 
   correct me if I'm wrong). If all the fields are indexed, can I bring 
   'search' API performance close to 'get' API performance? 
   2. If I use the "user_id" as the document ID, I can easily use the 'get' 
   API to retrieve the document, but I'm afraid the documents will not be 
   evenly distributed, because the "user_id" values are sparse: they are 
   neither UUIDs nor monotonically increasing. 
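
To make approach 2 concrete, here is a rough Python sketch of what using 
"user_id" as the document ID could look like. It rests on assumptions that 
are not confirmed anywhere in this post: that a shard is chosen as 
hash(id) % number_of_shards, that MD5 is an acceptable stand-in for 
whatever hash Elasticsearch really uses, and that the 'get' path has the 
/{index}/_doc/{id} form of recent versions (older versions use a type name 
in place of _doc). NUM_SHARDS, the helper names, and the sample IDs are 
made up for illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 5  # hypothetical number of primary shards


def doc_id_for(user_id: int) -> str:
    """Approach 2: derive the document ID directly from user_id."""
    return str(user_id)


def get_path(index: str, user_id: int) -> str:
    """Path for a direct lookup by ID, so no search is needed."""
    return f"/{index}/_doc/{doc_id_for(user_id)}"


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Assumed routing: hash(id) % num_shards.

    MD5 is only a stand-in here; it is NOT the hash Elasticsearch uses.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Sparse, non-sequential user IDs (made-up values for illustration).
sparse_user_ids = [i * 7919 + 13 for i in range(10_000)]

# Count how many documents would land on each shard.
shard_counts = Counter(shard_for(doc_id_for(u)) for u in sparse_user_ids)

print(get_path("users", 123))
for shard in sorted(shard_counts):
    print(shard, shard_counts[shard])
```

If the per-shard counts come out roughly equal, then under the hash-based 
assumption above sparse "user_id" values would not by themselves cause an 
uneven distribution. Whether that assumption holds for a real cluster is 
exactly what I would like to have confirmed.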

Thank you and best regards, 

Arinto
