I'm very keen on this. Thanks Menno (and Tim); unless anyone comes up with substantial objections, let's go with this.
Cheers
William

On Wed, Oct 1, 2014 at 6:25 AM, Menno Smits <[email protected]> wrote:

> Team Onyx has been busy preparing for multi-environment state server
> support. One piece of this is updating almost all of Juju's collections to
> include the environment UUID in document identifiers so that data for
> multiple environments can co-exist in the same collection even when they
> otherwise have the same identifier (machine id, service name, unit name etc.).
>
> Based on discussions on juju-dev a while back[1], we have started doing
> this by prepending the environment UUID to the _id field and adding extra
> fields which provide the environment UUID and old _id value separately for
> easier querying and handling.
>
> So far, services and units have been migrated. Where previously a service
> document looked like this:
>
>     type serviceDoc struct {
>         Name   string `bson:"_id"`
>         Series string
>         ...
>     }
>
> it now looks like this:
>
>     type serviceDoc struct {
>         DocID   string `bson:"_id"`      // "<env uuid>:wordpress"
>         Name    string `bson:"name"`     // "wordpress"
>         EnvUUID string `bson:"env-uuid"` // "<env uuid>"
>         Series  string
>         ...
>     }
>
> Unit documents have undergone a similar transformation.
>
> This approach works but has a few downsides:
>
> - it's possible for the local id ("Name" in this case) and EnvUUID fields to
>   become out of sync with the corresponding values that make up the _id. If
>   that ever happens, very bad things could occur.
> - it somewhat unnecessarily increases the document size, requiring that we
>   effectively store some values twice
> - it requires slightly awkward transformations between UUID-prefixed and
>   unprefixed IDs throughout the code
>
> MongoDB allows the _id field to be a subdocument so Tim asked me to
> experiment with this to see if it might be a cleaner way to approach the
> multi-environment conversion before we update any more collections.
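To make the first and third downsides concrete, here is a minimal Go sketch of the prefixed-_id layout. It is not Juju's code: `docID`, `splitDocID` and `consistent` are hypothetical helpers, bson tags are omitted so it runs with the standard library alone, the UUID is made up, and it assumes a UUID never contains a colon. It shows the string juggling needed to move between prefixed and unprefixed IDs, and that nothing in the data itself stops the redundant Name/EnvUUID fields from drifting away from the _id.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceDoc mirrors the prefixed-_id layout: the environment UUID and
// the local name are each stored twice, once inside DocID and once in
// their own fields.
type serviceDoc struct {
	DocID   string // "<env uuid>:wordpress"
	Name    string // "wordpress"
	EnvUUID string // "<env uuid>"
}

// docID builds a prefixed _id value (hypothetical helper).
func docID(envUUID, localID string) string {
	return envUUID + ":" + localID
}

// splitDocID reverses docID by splitting on the first ':' only.
// Assumption: the UUID part never contains ':'.
func splitDocID(id string) (envUUID, localID string, ok bool) {
	parts := strings.SplitN(id, ":", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

// consistent reports whether the redundant fields still agree with the
// _id. The database does not enforce this; every writer must.
func consistent(d serviceDoc) bool {
	env, name, ok := splitDocID(d.DocID)
	return ok && env == d.EnvUUID && name == d.Name
}

func main() {
	const env = "f47ac10b-58cc-4372-a567-0e02b2c3d479" // made-up example UUID
	good := serviceDoc{DocID: docID(env, "wordpress"), Name: "wordpress", EnvUUID: env}
	bad := good
	bad.Name = "mysql" // e.g. a buggy update touched only one copy
	fmt.Println(consistent(good), consistent(bad))
}
```

Running it prints `true false`: the second document still carries a valid-looking _id, but its Name no longer matches it.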
>
> The code for these experiments can be found here:
> https://gist.github.com/mjs/2959bb3e90a8d4e7db50 (I've included the output
> as a comment on the gist).
>
> What I've found suggests that using a subdocument for the _id is a better
> way forward. This approach means that each field value is only stored once,
> so there's no chance of the document key being out of sync with other fields
> and there's no unnecessary redundancy in the amount of data being stored.
> The fields in the _id subdocument are easy to access individually and can be
> queried separately if required. It is also possible to create indexes on
> specific fields in the _id subdocument if necessary for performance reasons.
>
> Using this approach, a service document would end up looking something like
> this:
>
>     type serviceDoc struct {
>         ID     serviceId `bson:"_id"`
>         Series string
>         ...
>     }
>
>     type serviceId struct {
>         EnvUUID string `bson:"env-uuid"`
>         Name    string
>     }
>
> There was some concern in the original email thread about whether
> subdocument-style _id fields would work with sharding. My research and
> experiments suggest that there is no issue here. There are a few types of
> indexes that can't be used with sharding, primarily "multikey" indexes, but
> I can't see us using these for _id values. A multikey index is used by
> MongoDB when a field used as part of an index is an array; it's highly
> unlikely that we're going to use arrays in _id fields.
>
> Hashed indexes are a good basis for well-balanced shards according to the
> MongoDB docs, so I wanted to be sure that it's OK to create a hashed index
> for subdocument-style fields. It turns out there's no issue here (see
> TestHashedIndex in the gist).
>
> Using subdocuments for _id fields is not going to prevent us from using
> MongoDB's sharding features in the future if we need to.
>
> Apart from having to rework the changes already made to the services and
> units collections[2], I don't see any downsides to this approach.
>
> Can anyone think of something I might be overlooking?
>
> - Menno
>
> [1] - subject was "RFC: mongo "_id" fields in the multi-environment juju
> server world"
>
> [2] - this work will have to be done before 1.21 has a stable release
> because the units and services changes have already landed.
>
> --
> Juju-dev mailing list
> [email protected]
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
