Hi all,
Is there a best practice for generating document IDs in Elasticsearch?
Let's say we want to distribute the data evenly across the cluster and be able
to update documents quickly.
Let's say my document holds user information in the following JSON format, and I
index all the fields:
{"user_id": someLongValue, "name": someStringValue}
For example: {"user_id": 123, "name": "arinto"}
Based on the simple requirements above, I've found two possible approaches so far:
1. This article (
http://exploringelasticsearch.com/book/advanced-techniques/routing.html)
mentions that the document ID should be either a UUID or monotonically
increasing in order to distribute the data evenly across the cluster's
shards. That means I need to generate a UUID when indexing new data. But if
I later want to retrieve the document and update it with a new field or new
data, I cannot use the 'get' API, because the UUID is generated
independently of any document field. Hence I need to use the 'search' API,
which *I assume* does not perform as well as the 'get' API. (Please correct
me if I'm wrong.) If all the fields are indexed, can I improve the 'search'
API performance so that it is close to the 'get' API performance?
2. If I use the "user_id" as the document ID, I can easily use the 'get' API
to retrieve the document, but I'm afraid the documents will not be
distributed evenly, because the "user_id" values are neither UUIDs nor
monotonically increasing; they are sparse.
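
To make the concern in approach 2 concrete, here is a rough Python sketch of how one could check the spread of sparse IDs offline. It assumes the shard is picked as hash(id) mod number_of_shards, which is my reading of the routing article. The hash (MD5), the shard count, and the made-up ID values are all stand-ins for illustration, not what Elasticsearch actually uses.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 5  # hypothetical number of primary shards


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard as hash(id) mod num_shards.

    MD5 is only a stand-in hash here; it is not Elasticsearch's routing hash.
    """
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(doc_ids, num_shards=NUM_SHARDS):
    """Return a list with the number of ids that land on each shard."""
    counts = Counter(shard_for(i, num_shards) for i in doc_ids)
    return [counts.get(s, 0) for s in range(num_shards)]


# Sparse, non-sequential user_id values (made up for illustration).
sparse_user_ids = [n * n + 17 for n in range(1, 10001)]

# Print the per-shard document counts for the sparse ids.
print(distribution(sparse_user_ids))
```

If the per-shard counts come out roughly equal, then under the hashing assumption above the sparseness of "user_id" by itself would not skew the distribution. Whether the same holds on a real cluster is exactly what I would like to confirm.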
Thank you and best regards,
Arinto