I would think that if we have to put environ-uuid into the _id field, then we wouldn't need yet another field to shard on (at least if we put the UUID at the beginning of the _id).
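As a rough sketch of the prefix idea (Python purely for illustration; the ":" separator and the helper names below are assumptions, not anything that exists in juju):

```python
# Hypothetical helpers: compose and split an _id of the form
# "<env-uuid>:<local-id>". The ":" separator is an assumption; it is
# chosen because UUIDs do not contain it.
ENV_SEP = ":"


def make_doc_id(env_uuid, local_id):
    """Return an _id string with the environment UUID as its prefix."""
    if not env_uuid or ENV_SEP in env_uuid:
        raise ValueError("invalid environment UUID: %r" % (env_uuid,))
    return env_uuid + ENV_SEP + local_id


def split_doc_id(doc_id):
    """Recover (env_uuid, local_id) from a prefixed _id."""
    env_uuid, sep, local_id = doc_id.partition(ENV_SEP)
    if not sep or not env_uuid:
        raise ValueError("_id has no environment prefix: %r" % (doc_id,))
    return env_uuid, local_id


def ids_for_env(doc_ids, env_uuid):
    """Select the _ids that belong to one environment by prefix match."""
    prefix = env_uuid + ENV_SEP
    return [d for d in doc_ids if d.startswith(prefix)]
```

With this shape the same local name (say "mysql/0") can exist in two environments without colliding, and the environment can be read back out of the _id when needed. Whether a prefix like this is enough for sharding on its own is exactly the open question.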

John
=:->

On Fri, Jul 4, 2014 at 2:24 PM, William Reade <william.re...@canonical.com> wrote:

> My expectation is that:
>
> 1) We certainly need the environment UUID as a separate field for the
> shard key.
> 2) We *also* need the environment UUID as an _id prefix to keep our
> watchers sane.
> 2a) If we had separate collections per environment, we wouldn't; but AIUI,
> scaling mongo by adding collections tends to end badly (I don't have direct
> experience here myself; but it does indeed seem that we'd start consuming
> namespaces at a pretty terrifying rate, and I'm inclined to trust those who
> have done this and failed.)
> 2b) I'd ordinarily dislike the duplication across the _id and uuid fields,
> but there's a clear reason for doing so here, so I'm not going to complain.
> I *will* continue to complain about documents that duplicate info across
> fields in order to save a few runtime microseconds here and there ;).
>
> If someone with direct experience can chip in reassuringly I *might* be
> prepared to back off on the N-collections-per-environment thing, but I'm
> certainly not willing to take it so far as to separate the txn logs and
> thus discard consistency across environments: I think there will certainly
> be references between individual hosted environments and the initial
> environment.
>
> So, in short, I think Tim's (1) is the way to go. But *please* don't
> duplicate data that doesn't have to be -- the UUID is fine, the name is
> not. If we really end up spending a lot of time extracting names from _id
> fields we can cache them in the state documents -- but we don't need
> redundant copies in the DB, and we *really* don't need to make our lives
> harder by giving our data unnecessary opportunities for inconsistency.
>
> Cheers
> William
>
> On Fri, Jul 4, 2014 at 6:42 AM, John Meinel <j...@arbash-meinel.com>
> wrote:
>
>> According to the mongo docs:
>> http://docs.mongodb.org/manual/core/document/#record-documents
>> The field name _id is reserved for use as a primary key; its value must
>> be unique in the collection, is immutable, and may be of any type other
>> than an array.
>>
>> That makes it sound like we *could* use an object for the _id field and
>> do _id = {env_uuid:, name:}
>>
>> Though I thought the purpose of doing something like that is to allow
>> efficient sharding in a multi-environment world.
>>
>> Looking here: http://docs.mongodb.org/manual/core/sharding-shard-key/
>> The shard key must be indexed (which is just fine for us w/ the primary
>> _id field or with any other field on the documents), and "The index on the
>> shard key *cannot* be a multikey index
>> <http://docs.mongodb.org/manual/core/index-multikey/#index-type-multikey>".
>> I don't really know what that means in the case of wanting to shard based
>> on an object instead of a simple string, but it does sound like it might be
>> a problem.
>> Anyway, for purposes of being *unique* we may need to put environ uuid in
>> there, but for the purposes of sharding we could just put it on another
>> field and index that field.
>>
>> John
>> =:->
>>
>> On Fri, Jul 4, 2014 at 5:01 AM, Tim Penhey <tim.pen...@canonical.com>
>> wrote:
>>
>>> Hi folks,
>>>
>>> Very shortly we are going to start on the work to be able to store
>>> multiple environments within a single mongo database.
>>>
>>> Most of our current entities are stored in the database with their name
>>> or id fields serialized to bson as the _id field.
>>>
>>> As far as I know (and I may be wrong), if you are adding a document to
>>> the mongo collection, and you do not specify an _id field, mongo will
>>> create a unique value for you.
>>>
>>> In our new world, things that used to be unique, like machines,
>>> services, units etc, are now only unique when paired with the
>>> environment id.
>>>
>>> It seems we have a number of options here.
>>>
>>> 1. change the _id field to be a "composed" field where it is the
>>> concatenation of the environment id and the existing id or name field.
>>> If we do take this approach, I strongly recommend having the fields that
>>> make up the key be available by themselves elsewhere in the document
>>> structure.
>>>
>>> 2. let mongo create the _id field, and we ensure uniqueness over the
>>> pair of values with a unique index. One thing I am unsure about with
>>> this approach is how we currently do our insertion checks, where we do a
>>> "document does not exist" check. We wouldn't be able to do this as a
>>> transaction assertion as it can only check for _id values. How fast are
>>> the indices updated? Can having a unique index for a document work for
>>> us? I'm hoping it can if this is the way to go.
>>>
>>> 3. use a composite _id field such that the document may start like this:
>>> { _id: { env_uuid: "blah", name: "foo"}, ...
>>> This gives the benefit of existence checks, and real names for the _id
>>> parts.
>>>
>>> Thoughts? Opinions? Recommendations?
>>>
>>> BTW, I think that if we can make 3 work, then it is the best approach.
>>>
>>> Tim
>>>
>>> --
>>> Juju-dev mailing list
>>> Juju-dev@lists.ubuntu.com
>>> Modify settings or unsubscribe at:
>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev