> On 22 Nov 2017, at 18:39, Geoffrey Cox <redge...@gmail.com> wrote:
> 
> Hi Mike, this sounds like a pretty cool enhancement. Just to clarify,
> you're also proposing modifying the PUT/POST doc, etc... so that you can
> specify a shard key per doc so that the doc can be stored on a specific
> shard?

Yes, sort of. A document create request specifies a shard key as part of the 
document ID. The guarantee with respect to document placement then is:

"All documents with the same shard key are stored in the same shard".
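
To make that guarantee concrete, here's a rough Python sketch of placement. It isn't how CouchDB places documents today, and every detail is an illustrative assumption: the shard key is taken to be the part of the document ID before the first ':', it's hashed with CRC32, and each of a hypothetical 8 shards owns an equal slice of the 32-bit hash space. shard_key and shard_for are made-up names.

```python
import zlib

NUM_SHARDS = 8                     # hypothetical shard count
RANGE_SIZE = 2**32 // NUM_SHARDS   # each shard owns an equal slice of the 32-bit hash space

def shard_key(doc_id: str) -> str:
    """Hypothetical convention: the shard key is the ID prefix before the first ':'."""
    key, sep, _ = doc_id.partition(":")
    if not sep:
        raise ValueError(f"document ID {doc_id!r} has no shard key prefix")
    return key

def shard_for(doc_id: str) -> int:
    """Hash only the shard key, so every doc sharing a key lands on the same shard."""
    h = zlib.crc32(shard_key(doc_id).encode("utf-8"))  # unsigned 32-bit value
    return h // RANGE_SIZE

# Two documents for the same user share a shard, whatever the rest of the ID is.
print(shard_for("user123:profile"), shard_for("user123:order-0001"))
```

Because only the key is hashed, a query scoped to user123 needs only the one shard that shard_for returns for any "user123:..." ID, while many different keys can still map to the same shard.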

By way of contrast, this *isn't* a way of saying "Put document on specific 
shard X". I don't find that ability very compelling for a user (why would they 
care that their doc was in range 000000000-abababab or whatever?), but I think 
introducing this grouping mechanism, as a higher-level abstraction over things 
that are meaningful within a data model, does offer substantial benefit.

To elaborate on why this is useful, a couple of use cases might help.

The first example is along the lines of using a user ID as the shard key. All 
documents for that user then end up on the same shard. A query can then be 
scoped by user ID (as it's the shard key), which means that queries for a single 
user's data can be served efficiently from a single shard rather than by asking 
all shards. This would significantly improve the performance of an application 
from the point of view of that user.

Or, in an IoT use case, you might use the device ID as the shard key, enabling 
fast retrieval of measurements from a single device.

It's important to note, too, that a shard may store documents from many 
different shard keys, so long as the above guarantee holds. In addition, the 
shard key needs to have high cardinality and to spread requests effectively 
over the shards.

An example that doesn't work is using the date as the shard key in the IoT 
case: while the date has high cardinality over time, at any given moment every 
device is writing measurements for the current date, so only a single shard 
will be in the write path.
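
A quick simulation shows the hot spot. Again, this is only a sketch under illustrative assumptions (CRC32 over the key, a hypothetical 8 equal-sized shard ranges, 1000 made-up device IDs, one made-up date); it counts how many shards a single day's writes touch under each choice of key.

```python
import zlib
from collections import Counter

NUM_SHARDS = 8                     # hypothetical shard count
RANGE_SIZE = 2**32 // NUM_SHARDS   # equal slices of the 32-bit hash space

def shard_for_key(key: str) -> int:
    """Map a shard key to a shard index by hashing it (illustrative CRC32 choice)."""
    return zlib.crc32(key.encode("utf-8")) // RANGE_SIZE

def write_distribution(keys) -> Counter:
    """Count how many writes land on each shard."""
    return Counter(shard_for_key(k) for k in keys)

devices = [f"device-{i}" for i in range(1000)]  # made-up device IDs
today = "2017-11-22"                            # every device writes today's readings

# Keyed by device ID: one day's writes spread across the shards.
by_device = write_distribution(devices)
# Keyed by date: every write for today carries the same key, so one shard takes them all.
by_date = write_distribution(today for _ in devices)

print(len(by_device), "shard(s) in the write path when keyed by device")
print(len(by_date), "shard(s) in the write path when keyed by date")
```

The date key does take many distinct values over time, but that cardinality is only spread across days, not across the writes happening right now, which is what matters for the write path.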

Mike.

