> On 22 Nov 2017, at 18:39, Geoffrey Cox <redge...@gmail.com> wrote:
>
> Hi Mike, this sounds like a pretty cool enhancement. Just to clarify,
> you're also proposing modifying the PUT/POST doc, etc... so that you can
> specify a shard key per doc so that the doc can be stored on a specific
> shard?

Yes, sort of. A document create request specifies a shard key as part of the document ID. The guarantee with respect to document placement is then: "All documents with the same shard key are stored in the same shard".

By contrast, this *isn't* a way of saying "Put this document on specific shard X". I don't find that ability very compelling for a user (why would they care that their doc was in range 000000000-abababab or whatever?), but I think introducing this grouping mechanism as a higher-level abstraction over things that are meaningful within a data model does offer substantial benefit.

To elaborate on why this is useful, a couple of use cases might help.

The first example is using a user ID as the shard key. All documents for that user then end up on the same shard. A query can then be scoped by user ID (as it's the shard key), which means that queries for a single user's data can be served efficiently from a single shard rather than by asking all shards. This would significantly improve the performance of an application from the point of view of that user. Or, in an IoT use case, you might use the device ID as the shard key, enabling fast retrieval of measurements from a single device.

It's important to note too that a shard may store documents from many different shard keys, so long as the above guarantee holds. In addition, the shard key needs to have high cardinality and to spread requests effectively over the shards. An example that doesn't work is using the date as the shard key for the IoT case: while this has high cardinality over time, at any given time all new documents share the same key, so only a single shard will be in the write path.

Mike.
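
To make the placement guarantee above concrete, here is a minimal sketch of how a shard could be chosen from the shard key alone. Everything in it is an assumption for illustration rather than the proposed implementation: the ":" separator inside the document ID, the CRC32 hash, the fixed shard count of 8, and the helper names shard_key and shard_for are all hypothetical.

```python
import zlib

NUM_SHARDS = 8  # hypothetical shard count, chosen only for this sketch


def shard_key(doc_id: str) -> str:
    """Return the shard-key part of a document ID.

    Assumes IDs look like "<shard key>:<document key>". The ":" separator
    is an illustrative choice. IDs without a separator are treated as
    their own shard key.
    """
    key, sep, _rest = doc_id.partition(":")
    return key if sep else doc_id


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard by hashing only the shard key, not the whole ID.

    Hashing just the key is what gives the guarantee that documents with
    the same shard key land on the same shard.
    """
    return zlib.crc32(shard_key(doc_id).encode("utf-8")) % num_shards


if __name__ == "__main__":
    # Same user ID as shard key: both documents map to one shard.
    print(shard_for("user42:profile"), shard_for("user42:order-1"))

    # Date as shard key: every write made on that day hits the same shard.
    todays_writes = {shard_for(f"2017-11-22:reading-{i}") for i in range(100)}
    print(todays_writes)

    # Device ID as shard key: writes from many devices spread over shards.
    device_writes = {shard_for(f"device-{i}:reading-0") for i in range(1000)}
    print(sorted(device_writes))
```

The date example shows the hot-spot problem: the set of shards receiving that day's writes has exactly one member, while the device ID example spreads writes across shards and still keeps each device's measurements together.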