On Dec 13, 2012, at 6:41 PM, Robert Newson <[email protected]> wrote:
> Two databases can have a different number of shards, different numbers > of replicas of each shard, different locations for those shards, we > might move shards over time as we add or remove nodes from the > cluster. I don't see how you can do any of that without a document > describing the layout. > > B. > > On 13 December 2012 17:30, Benoit Chesneau <[email protected]> wrote: >> On Thu, Dec 13, 2012 at 5:46 PM, Robert Newson <[email protected]> wrote: >> >>> Views are also sharded. >>> >>> It's common for a node to host multiple shards of the same database, >>> so we already have this 'concurrent writes' notion, if I've >>> interpreted it correctly. >>> >>> >> Well my question are more related why using a mapping ? And how to keep the >> backup of one database easy for a user. Possibly without relying on a >> mapping stored aside. >> >> Other question is why bigcouch choose that design vs the one I propose. >> >> @Hans since db are shared and views are done / db ,views indexations are >> also concurrent. >> >> - benoît >> >> >> >>> On 13 December 2012 16:36, Hans J Schroeder <[email protected]> wrote: >>>> >>>> On Dec 13, 2012, at 5:06 PM, Benoit Chesneau <[email protected]> >>> wrote: >>>> >>>>> Hi all, >>>>> >>>>> >>>>> This morning I was back reading a lot of fundamentals about databases >>> and >>>>> such and was asking myself how we could increase the number of >>> concurrent >>>>> writes. >>>>> >>>>> These days the theory is that it will be solved by sharding the >>> databases >>>>> in multiples database files and merging results of the queries. Since >>> the >>>>> databases will be shareded then the writes on the same db will be >>>>> concurrents. A map of the shards willl be kept aside. All of this >>> thanks to >>>>> the introduction of bigcouch. >>>>> >>>>> The question I have is why don't we already do that? Ie balancing datas >>> on >>>>> different files on one db? for example the db folder could be >>>>> >>>>> database/XY.couch >>>>> >>>>> where XY are the first letters of an id or content hash or any >>> consistent >>>>> hashing method. >>>>> >>>>> I am currently asking myself such question because I am wondering how >>> will >>>>> the backup works when couchdb will be used as a single node. How to >>> backup >>>>> only one db without having to query for the mapping and such? How to >>> keep >>>>> it it simple. >>>>> >>>>> Related to that why did bigcouch used that design? Why mapping shards >>> in a >>>>> db database instead of having some kind of natural balancing on the fs >>> and >>>>> having a consistent hashing algorithm used to balance on different >>>>> machines/vms as well ? >>>>> >>>>> >>>>> - benoît >>>> >>>> >>>> Hi, >>>> >>>> That's like "horizontal partitioning" in conventional databases and I >>> think its a great idea. Having a writer process for each partition will >>> make it scale. >>>> >>>> Does Bigcouch have anything for the view files too or are they just >>> sharding the backing files? >>>> >>>> - Hans >>> Thanks for the clarification about the views. Its all about what we want to have. For concurrent writes, a simple shuffling like Benoit has described, would be an efficient solution. For configurable clusters, a mapping store of some kind is needed. - Hans
