Re: database design question: concurrent writes

Hans J Schroeder Thu, 13 Dec 2012 10:38:21 -0800

On Dec 13, 2012, at 6:41 PM, Robert Newson <[email protected]> wrote:


> Two databases can have different numbers of shards, different numbers
> of replicas of each shard, and different locations for those shards. We
> might also move shards over time as we add or remove nodes from the
> cluster. I don't see how you can do any of that without a document
> describing the layout.
> 
> B.
> 
> On 13 December 2012 17:30, Benoit Chesneau <[email protected]> wrote:
>> On Thu, Dec 13, 2012 at 5:46 PM, Robert Newson <[email protected]> wrote:
>> 
>>> Views are also sharded.
>>> 
>>> It's common for a node to host multiple shards of the same database,
>>> so we already have this 'concurrent writes' notion, if I've
>>> interpreted it correctly.
>>> 
>>> 
>> Well, my questions are more about why a mapping is used, and how to keep the
>> backup of one database easy for a user, possibly without relying on a
>> mapping stored aside.
>> 
>> The other question is why BigCouch chose that design over the one I propose.
>> 
>> @Hans: since dbs are sharded and views are built per db, view indexing is
>> also concurrent.
>> 
>> - benoît
>> 
>> 
>> 
>>> On 13 December 2012 16:36, Hans J Schroeder <[email protected]> wrote:
>>>> 
>>>> On Dec 13, 2012, at 5:06 PM, Benoit Chesneau <[email protected]> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> 
>>>>> This morning I was rereading a lot of database fundamentals and was
>>>>> asking myself how we could increase the number of concurrent writes.
>>>>> 
>>>>> These days the theory is that this will be solved by sharding each
>>>>> database into multiple database files and merging the results of the
>>>>> queries. Since the databases will be sharded, writes to the same db
>>>>> will be concurrent. A map of the shards will be kept aside. All of this
>>>>> is thanks to the introduction of BigCouch.
>>>>> 
>>>>> The question I have is: why don't we already do that, i.e. balance data
>>>>> across different files within one db? For example, the db folder could be
>>>>> 
>>>>> database/XY.couch
>>>>> 
>>>>> where XY are the first letters of an id or content hash or any
>>>>> consistent hashing method.
>>>>> 
>>>>> I am asking because I am wondering how backup will work when CouchDB
>>>>> is used as a single node. How do we back up only one db without having
>>>>> to query for the mapping and such? How do we keep it simple?
>>>>> 
>>>>> Related to that, why did BigCouch use that design? Why map shards in a
>>>>> db database instead of having some kind of natural balancing on the fs,
>>>>> with a consistent hashing algorithm used to balance across different
>>>>> machines/VMs as well?
>>>>> 
>>>>> 
>>>>> - benoît
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> That's like "horizontal partitioning" in conventional databases, and I
>>>> think it's a great idea. Having a writer process for each partition will
>>>> make it scale.
>>>> 
>>>> Does BigCouch have anything for the view files too, or are they just
>>>> sharding the backing files?
>>>> 
>>>> - Hans
>>> 

Thanks for the clarification about the views. 

It's all about what we want to have. For concurrent writes, a simple shuffling
of documents across files, as Benoit has described, would be an efficient
solution. For configurable clusters, a mapping store of some kind is needed.

- Hans
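
The two approaches contrasted in this thread can be sketched roughly as
follows. This is only an illustration, not how CouchDB or BigCouch implement
it: the function names, the SHA-1 choice, the two-character prefix, and the
SHARD_MAP layout are all assumptions made up for the example.

```python
import hashlib
from bisect import bisect_right


def hash_prefix(doc_id, prefix_len=2):
    """Return the first hex characters of a hash of the document id."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
    return digest[:prefix_len]


def shard_path(doc_id, db_dir="database", prefix_len=2):
    """Map-free placement: the file name is derived from the id alone,
    in the spirit of the database/XY.couch layout proposed above."""
    return f"{db_dir}/{hash_prefix(doc_id, prefix_len)}.couch"


# Hypothetical layout document: start of a hash range -> nodes holding it.
# Something like this is needed once shard counts, replica counts, or
# shard locations can differ per database or change over time.
SHARD_MAP = {
    "00": ["node1", "node2"],
    "80": ["node2", "node3"],
}


def nodes_for(doc_id, shard_map=SHARD_MAP):
    """Map-based placement: look up the range that contains the hash."""
    starts = sorted(shard_map)
    key = hash_prefix(doc_id)
    # Lowercase hex strings sort in numeric order, so bisect finds the range.
    idx = bisect_right(starts, key) - 1
    return shard_map[starts[idx]]
```

With the first function, backing up one db means copying every file in its
folder, and no lookup is required. The second shows why a stored mapping
becomes necessary as soon as the layout is configurable.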
