Hi Niels,

I'm a solutions engineer at Couchbase, based in the UK.
I'll reach out to you directly so that we can work with you on getting a good 
design that's well suited to the Couchbase architecture.

I just wanted to highlight straight away that although Couchbase provides these 
locking mechanisms which can be used to provide safe concurrent access, they 
are not providing a 'transaction' in the sense you might be familiar with from 
an RDBMS perspective.

Even with the locking in place, updates are atomic on a per document basis only.

Choosing the right data model will be key to making this work well for you, and 
it's something we can look at in conjunction with the team here.

Cheers,
Tom

Solutions Engineer, UK/EMEA
Couchbase





Hello all,

As I am new to Couchbase I've been making some mistakes while trying to migrate 
a system from MongoDB to Couchbase, particularly with regards to the design of 
views/indexes. I hope someone can help.
This is quite a long post, sorry for that; I wanted to present a reasonably 
whole picture of the context, problem and my own considerations thus far.

System context
--------------
The problem domain consists of just two objects: an Account and Transaction. In 
relational terms an Account has 0..N Transactions. The system handles up to a 
thousand of inserts and retrievals per minute and should have the potential to 
grow to a capacity of tens of thousands of inserts and retrievals per minute. I 
do not know exact numbers but my guess is the inserts/retrieval-ratio is fairly 
even (50/50).

Current state
--------------
Currently the system is implemented in Oracle DB and I've been prototyping a 
MongoDB version the last couple of weeks. This prototype is successfull and has 
received positive feedback. However, the company I work for has selected 
Couchbase due to its mobile data synchronization. From what I've read Couchbase 
should be able to scale to a level that would satisfy my use case as well. As a 
result I've been trying to migrate the MongoDB prototype to Couchbase thereby 
limiting the number of database types that we need to support within the 
company.

Document design
--------------
Simplified, a transaction has five interesting properties (I use abbreviations 
to make them better recognizable):
- TX_ID: a unique identifier defined by the client system.
- TX_NR: a transaction number which is is a counter starting from 1 and 
incremented by 1 for each new transaction _per_ account (i.e. the first 
transaction of any new account will have TX_NR = 1).
- TX_MUT: a balance mutation value.
- TX_ACC_ID: the account identifier (see below)
- TX_BAL: the (historical) account balance at the time the transaction was 
processed.

An account has two interesting properties;
- ACC_ID: a unique identifier that is defined by the client system. I selected 
this as the document key.
- ACC_BAL: the balance contains the available funds for the account. In MongoDB 
I cannot do a Transaction insert and Account update atomically within one 
transaction (and MongoDB support no locking out-of-the-box) so in the current 
MongoDB prototype this ACC_BAL property is virtual and in reality retrieved by 
fetching the TX_BAL from the last transaction).

The documents contain quite a bit more data but I don't think that matters for 
the purpose of this discussion.

Problem
--------------
When I tried to apply my MongoDB design to Couchbase I got strong suspicions I 
wasn't following best practices, particularly with regard to views. Views are 
eventual consistent by default, I suspect for performance reasons.
In my use case multiple transactions should be processable in batch, 
asynchronously or synchronously. So concurrency issues are a concern (and have 
caused us plently of problems even in a RDBMS (Oracle DB) due to faulty 
application implementation). Also, the room for data integrity issues is almost 
non-existent; the data should be correct.

Data access patterns
--------------
First scenario: retrieving a (paginated) subset of an account transaction list
In the design of MongoDB I used the TX_ACC_ID reference within the Transaction 
as the sharding key and as a result all queries in normal use cases would go to 
a single shard. In Couchbase I have no control over where my data ends up and 
retrieving a list of transactions (a common use case) I'd need to do a 
multi-get I believe. Given the issues with couchbase views (scatter/gather 
queries, eventual consistent, cumbersome to update as part of automated CI (and 
perhaps not very performant?)) I prefer not to use them. My thought was to make 
document keys predictable and not use TX_ID as part of the object key but 
instead use "ACC_ID + TX_NR" as the Transaction document key. This allows me to 
do a fetch of say 10 records and skipping the first 30 without additional 
lookups.

However, as part of the response I also need to return the total number of 
Transactions within the account. Because I can do locking in Couchbase (I'm 
considering pessimistic locking here) I can maintain a value such 
ACC_LAST_TX_NR within the Account document and update it each time when 
inserting a new transaction.

Second scenario: inserting a transaction
In MongoDB I've limited the insert Transaction to creating a new document and 
that's it. Several indexes make sure the document is available in a strong 
consistent manner. To do the actual insert I lookup the TX_BAL and TX_NR from 
the last known transaction, retrieve the Account, increment TX_NR and insert 
the new transaction. If in the meantime another Transaction was persisted to 
MongoDB a unique index on TX_NR will prevent inserting the new transaction 
(this unique index basically acts as an optimistic lock).
In Couchbase the query "get last transaction for specified account" would 
require a view that is strongly consistent. That is problematic. However, 
because Couchbase supports locking I can update the balance within the Account 
like I used to do in Oracle DB (the MongoDB approach was more of a workaround 
anyway). Inserting a transaction in Couchbase would involve three steps:
1. Retrieve Account and take lock
2. Insert Transaction with TX_NR = ACC_LAST_TX_NR + 1
3. Update the Account with the new ACC_BAL and set ACC_LAST_TX_NR + 1

Questions
--------------
1. When inserting a new transaction, how do I guarantee the TX_ID is unique? 
The TX_ID is determined by the client system and prevents duplicate transaction 
processing. The document key is already used by "ACC_ID + TX_NR" to do easy 
lookups. Couchbase does not support unique indexes, nor do I suspect updating a 
view for this purpose is sufficiently performant.
2. I could not find a resource describing how updates work, and I do know some 
databases handle them quite poorly in certain circumstances (MongoDB reserves 
some additional space, but if you grow your document it needs to be moved). 
Some say you're better of deleting the old document and inserting a new one is 
better in some situations. While MongoDB is insert-only, Couchbase would become 
insert and update heavy. Has this a significant impact on how it will behave?
3. I've been using Mongoose (an ODM) with Node.js for my MongoDB integration. 
For Couchbase I didn't find a mature and well-documented ODM thus far and I am 
considering rolling my own minimalistic ODM untill I can replace it with 
something else. Given the future with N1QL I don't want to "buy in" into any 
ODM that may be ill supported (a common theme for Node.js modules... *sigh*). 
If anyone has any tips on this, much appreciated.

One solution I was considering for my first problem was maintaining a map of 
"TX_NR => TX_ID" within the Account and effectively maintain my own index in 
each Account. The proposed insert TX scenario retrieves the Account in step 
(1); checking for duplicates would be a breeze and in step (3) I can update the 
Account index. This approach does introduce some scalability concers. currently 
Accounts may have several hundreds of transactions and may even grow to 
thousands or even tens of thousands of transactions in the future. Putting all 
those values in a single Account document doesn't feel right (lots of data 
transfer, processing, and potentially huge documents).

Sorry for the long post. Currently the prototype in MongoDB works and performs 
well, but having to support just one type of database would be much simpler 
(and compared to Couchbase the MongDB infrastructure is quite complex).

As I am new to Couchbase I might be making quite a few wrong assumptions about 
Couchbase.

I hope someone can help.
Regards,
Niels
[cid:C07986C1-89B5-4638-A696-492DE61FC38E]
Always Scalable, Always On NoSQL

________________________________

        Tom Green
Solutions Engineer
Phone: +447590 57 1818
Skype: tomgreen000
Email: [email protected]


-- 
You received this message because you are subscribed to the Google Groups 
"Couchbase" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

<<inline: DCA66614-471A-4275-A8CA-9691E1DE059C[5].png>>

Reply via email to