The way I understand the proposal, you could satisfy at most one of those requests (probably the *username* one) with a local query. The other one would have to be a global query, but the proposal does allow for a mix of local and global queries against the same dataset.
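A toy sketch of that mix, purely for illustration. It assumes the shard key is the part of the doc ID before the first ":", as discussed in the quoted thread. `ToyCluster`, `local_query` and `global_query` are hypothetical names for this example, not CouchDB APIs.

```python
import zlib


def shard_key(doc_id):
    """Return the text before the first ':' in a document ID.

    Assumed convention for this sketch: IDs look like '<shard key>:<rest>'.
    """
    key, sep, _rest = doc_id.partition(":")
    if not sep:
        raise ValueError(f"document ID {doc_id!r} has no shard key")
    return key


class ToyCluster:
    """In-memory stand-in for a sharded database (hypothetical, not CouchDB code)."""

    def __init__(self, num_shards=4):
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Stable hash so a given shard key always maps to the same shard.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, doc):
        self._shard_for(shard_key(doc["_id"]))[doc["_id"]] = doc

    def local_query(self, key, predicate=lambda doc: True):
        """Read one shard only, and drop co-located docs with other shard keys."""
        shard = self._shard_for(key)
        return sorted(
            doc_id
            for doc_id, doc in shard.items()
            if shard_key(doc_id) == key and predicate(doc)
        )

    def global_query(self, predicate):
        """Scan every shard and merge the results."""
        return sorted(
            doc_id
            for shard in self.shards
            for doc_id, doc in shard.items()
            if predicate(doc)
        )


cluster = ToyCluster(num_shards=2)
for doc_id, class_id in [
    ("alice:hw1", "math101"),
    ("alice:essay", "eng200"),
    ("bob:hw1", "math101"),
]:
    cluster.put({"_id": doc_id, "classId": class_id})

# Student's view: the username is the shard key, so one shard is enough.
student_work = cluster.local_query("alice")

# Teacher's view: classId is not the shard key, so every shard is consulted.
class_work = cluster.global_query(lambda doc: doc["classId"] == "math101")
```

The student's query touches a single shard, while the teacher's query has to fan out to all of them. The filter inside `local_query` is what keeps co-located documents with a different shard key out of the result.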
Adam

> On Jan 22, 2018, at 9:27 AM, Geoffrey Cox <redge...@gmail.com> wrote:
>
> Hey Mike,
>
> I've been thinking more about your proposal above and when it is combined with the new access-per-db enhancement it should greatly reduce the need for db-per-user. One thing that I'm left wondering though is whether there is consideration for different shard keys per doc. From what I gather in your notes above, each doc would only have a single shard key and I think implementing this alone will take significant work. However, if there was a way to have multiple shard keys per doc then you could avoid having duplicated data.
>
> For example, assume a database of student work:
>
> 1. Each doc has a `username` that corresponds with the owner of the doc
> 2. Each doc has a `classId` that corresponds with the class for which the assignment was submitted
>
> Ideally, you'd be able to issue a query with a shard key specific to the `username` to get a student's work and yet another query with a shard key specific to the `classId` to get the work from a teacher's perspective. Would your proposal allow for something like this?
>
> If not, I think you'd have to do something like duplicate the data, e.g. add another doc that has the username of the teacher so that you could query from the teacher's perspective. This of course could get pretty messy when you consider more complicated scenarios as you could easily end up with a lot of duplicated data.
>
> Thanks!
>
> Geoff
>
> On Tue, Nov 28, 2017 at 5:35 AM Mike Rhodes <mrho...@linux.vnet.ibm.com> wrote:
>
>>> On 25 Nov 2017, at 15:45, Adam Kocoloski <kocol...@apache.org> wrote:
>>>
>>> Yes indeed Jan :) Thanks Mike for writing this up! A couple of comments on the proposal:
>>>
>>>> • For databases where this is enabled, every document needs a shard key.
>>>
>>> What would happen if this constraint were relaxed, and documents without a “:” in their ID simply used the full ID as the shard key as is done now?
>>
>> I think that practically it's not that awful. Documents without shard keys end up spread reasonably, albeit uncontrollably, across shards.
>>
>> But I think from a usability perspective, forcing this to be all or nothing for a database makes sense. It makes sure that every document in the database behaves the same way rather than having a bunch of stuff that behaves one way and a bunch of stuff that behaves a different way (i.e., you can find some documents via shard local queries, whereas others are only visible at a global level).
>>
>> I think that if people want documents to behave that differently, enforcing different databases is helpful. It reinforces the point that these databases work well for use-cases where partitioning data using the shard key makes sense, which is a different method of data modelling than having one huge undifferentiated pool. Perhaps there are heretofore unthought of optimisations that only make sense if we can make this assumption too :)
>>
>>>> • Query results are restricted to documents with the shard key specified. Which makes things harder but leaves the door open for future things like shard-splitting without changing result sets. And it seems like what one would expect!
>>>
>>> I agree this is important. It took me a minute to remember the point here, which is that a query specifying a shard key needs to filter out results from different shard keys that happen to be colocated on the same shard.
>>>
>>> Does the current query functionality still work as it did before in a database without shard keys? That is, can I still issue a query without specifying a shard key and have it collate a response from the full dataset? I think this is worth addressing explicitly. My assumption is that it does, although I’m worried that there may be a problematic interaction if one tried to use the same physical index to satisfy both a “global” query and a query specifying a shard key.
>>
>> I think this is an interesting question.
>>
>> To start with, I guess the basic thing is that to efficiently use an index you'd imagine that you'd prefix the index's columns with the shard key -- at least that's the thing I've been thinking, which likely means cleverer options are available :)
>>
>> My first thought is that the naive approach to filtering documents not matching a shard key is just that -- a node hosting a replica of a shard does a query on an index as normal and then there's some extra code that filters based on ID. Not actually super-awful -- we don't have to actually read the document itself for example -- but for any use-case where there are many shard keys associated with a given shard it feels like one can do better. But as long as the node querying the index is doing it, it feels pretty fast.
>>
>> I would wonder whether some more generally useful work on Mango could help reduce the amount of special case code going on:
>>
>> - Push index selection down to each shard.
>> - Allow Mango to use multiple indexes to satisfy a query (even if this is simply for AND relationships).
>>
>> Then for any database with the shard key bit set true, the shards also create a JSON index based on the shard key, and we can append an `AND shardkey=foo` to the users' Mango selector. As our shard keys are in the doc ID, I don't think this is any faster at all. It would be if the shard key was more complicated, say a field in the doc, so we didn't have it to hand all the time. But it would certainly make the alteration for the shard local path much more contained and have very wide utility beyond this case.
>>
>> For views, I'm less sure there's anything smart you can do that doesn't add tonnes of overhead -- like making two indexes per view, one that's prefixed with the shard key and one which is not. This approach has all sorts of nasty interactions with things like reverse=true I imagine, however.
>>
>> Mike.
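
For reference, the idea in Mike's message of appending `AND shardkey=foo` to a user's Mango selector could look roughly like the sketch below. It builds the combined selector as a plain dict using Mango's `$and` and `$eq` operators. The field name `shardkey` is taken from the example in the thread and the helper `restrict_to_shard_key` is hypothetical. Nothing here reflects how CouchDB actually implements this.

```python
import copy


def restrict_to_shard_key(selector, key, field="shardkey"):
    """Return a new Mango selector that also requires `field` to equal `key`.

    `field` is an assumed name; the thread only writes `AND shardkey=foo`.
    """
    # Copy the caller's selector so the original dict is never mutated.
    return {"$and": [copy.deepcopy(selector), {field: {"$eq": key}}]}


user_selector = {"classId": "math101"}
local_selector = restrict_to_shard_key(user_selector, "foo")
```

Wrapping the user's selector in `$and` rather than merging keys into it means the extra condition cannot collide with a field the user already filters on.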