Hi Mike and Geoff,

forgive me if I am asking a really stupid question, but wouldn't restricting certain data to specific shards defy the very concept and core benefits of a clustered database?

br
Johs

> On 23 Nov 2017, at 22:49, Geoffrey Cox <redge...@gmail.com> wrote:
>
> Ah, yeah, this makes sense to me. I think this has great potential!
>
> On Thu, Nov 23, 2017 at 4:56 AM Mike Rhodes <mrho...@linux.vnet.ibm.com>
> wrote:
>
>>
>>> On 22 Nov 2017, at 18:39, Geoffrey Cox <redge...@gmail.com> wrote:
>>>
>>> Hi Mike, this sounds like a pretty cool enhancement. Just to clarify,
>>> you're also proposing modifying the PUT/POST doc, etc... so that you can
>>> specify a shard key per doc so that the doc can be stored on a specific
>>> shard?
>>
>> Yes, sort of. A document create request specifies a shard key as part of
>> the document ID. The guarantee with respect to document placement then is:
>>
>> "All documents with the same shard key are stored in the same shard".
>>
>> By means of contrast, this *isn't* a way of saying "Put document on
>> specific shard X". I don't find that ability very compelling for a user
>> (why would they care that their doc was in range 000000000-abababab or
>> whatever?), but introducing this grouping mechanism as a higher level
>> abstraction on things meaningful within a data model I think does offer
>> substantial benefit.
>>
>> To elaborate on why this is useful a couple of use cases might help.
>>
>> The first example is along the lines of using a user ID as a shard key.
>> All documents for that user then end up on the same shard. A query can then
>> be scoped by user ID (as it's the shard key), which means that queries for a
>> single user's data can be efficiently served from a single shard rather
>> than asking all shards. This would significantly improve performance of an
>> application from the point of view of that user.
>>
>> Or, in an IoT use case, you might use the device ID as the shard key
>> enabling fast retrieval of measurements from a single device.
>>
>> It's important to note too that a shard may store documents from many
>> different shard keys, so long as the above guarantee holds. In addition,
>> the shard key needs to have high cardinality and to effectively spread
>> requests over the shards.
>>
>> An example that doesn't work is using the date as the shard key for the
>> IoT case: while this has a high cardinality, at any given time, only a
>> single shard will be in the write path.
>>
>> Mike.
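
For reference, the placement guarantee quoted above ("all documents with the same shard key are stored in the same shard") could be sketched roughly as below. This is only an illustration and not the proposed implementation: the `shardkey:dockey` ID layout, the `:` separator, the use of CRC32, the shard count of 8, and the name `shard_for` are all assumptions made for the example.

```python
import zlib

NUM_SHARDS = 8  # assumed shard count, for illustration only


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from the shard-key part of a document ID.

    Assumes a hypothetical "shardkey:dockey" layout. Only the part before
    the first ":" is hashed, so every document sharing a shard key lands
    on the same shard. An ID without ":" is hashed as a whole.
    """
    shard_key = doc_id.partition(":")[0]
    return zlib.crc32(shard_key.encode("utf-8")) % num_shards


# User ID as shard key: one user's documents share a shard, while the
# population of users still spreads across the cluster.
same_user = {shard_for(f"user42:doc{i}") for i in range(100)}
many_users = {shard_for(f"user{u}:doc0") for u in range(1000)}

# Date as shard key: every write made on a given day hits a single shard.
one_day = {shard_for(f"2017-11-23:reading{i}") for i in range(100)}

print(len(same_user), len(many_users), len(one_day))
```

The point of the sketch is that a single shard can hold many shard keys, so data is still distributed over the cluster as long as there are many keys and requests are spread across them. The date example shows the failure mode Mike describes: the key has many distinct values over time, but all writes at any given moment share one value and therefore one shard.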