Hello all,

As I am new to Couchbase I've been making some mistakes while trying to 
migrate a system from MongoDB to Couchbase, particularly with regards to 
the design of views/indexes. I hope someone can help.
This is quite a long post, sorry for that; I wanted to present a reasonably 
whole picture of the context, problem and my own considerations thus far.

System context
--------------
The problem domain consists of just two objects: an Account and 
Transaction. In relational terms an Account has 0..N Transactions. The 
system handles up to a thousand of inserts and retrievals per minute and 
should have the potential to grow to a capacity of tens of thousands of 
inserts and retrievals per minute. I do not know exact numbers but my guess 
is the inserts/retrieval-ratio is fairly even (50/50).

Current state
--------------
Currently the system is implemented in Oracle DB and I've been prototyping 
a MongoDB version the last couple of weeks. This prototype is successfull 
and has received positive feedback. However, the company I work for has 
selected Couchbase due to its mobile data synchronization. From what I've 
read Couchbase should be able to scale to a level that would satisfy my use 
case as well. As a result I've been trying to migrate the MongoDB prototype 
to Couchbase thereby limiting the number of database types that we need to 
support within the company.

Document design
--------------
Simplified, a transaction has five interesting properties (I use 
abbreviations to make them better recognizable):
- TX_ID: a unique identifier defined by the client system. 
- TX_NR: a transaction number which is is a counter starting from 1 and 
incremented by 1 for each new transaction _per_ account (i.e. the first 
transaction of any new account will have TX_NR = 1).
- TX_MUT: a balance mutation value.
- TX_ACC_ID: the account identifier (see below)
- TX_BAL: the (historical) account balance at the time the transaction was 
processed.

An account has two interesting properties;
- ACC_ID: a unique identifier that is defined by the client system. I 
selected this as the document key.
- ACC_BAL: the balance contains the available funds for the account. In 
MongoDB I cannot do a Transaction insert and Account update atomically 
within one transaction (and MongoDB support no locking out-of-the-box) so 
in the current MongoDB prototype this ACC_BAL property is virtual and in 
reality retrieved by fetching the TX_BAL from the last transaction).

The documents contain quite a bit more data but I don't think that matters 
for the purpose of this discussion.

Problem
--------------
When I tried to apply my MongoDB design to Couchbase I got strong 
suspicions I wasn't following best practices, particularly with regard to 
views. Views are eventual consistent by default, I suspect for performance 
reasons. 
In my use case multiple transactions should be processable in batch, 
asynchronously or synchronously. So concurrency issues are a concern (and 
have caused us plently of problems even in a RDBMS (Oracle DB) due to 
faulty application implementation). Also, the room for data integrity 
issues is almost non-existent; the data should be correct.

Data access patterns
--------------
First scenario: retrieving a (paginated) subset of an account transaction 
list
In the design of MongoDB I used the TX_ACC_ID reference within the 
Transaction as the sharding key and as a result all queries in normal use 
cases would go to a single shard. In Couchbase I have no control over where 
my data ends up and retrieving a list of transactions (a common use case) 
I'd need to do a multi-get I believe. Given the issues with couchbase views 
(scatter/gather queries, eventual consistent, cumbersome to update as part 
of automated CI (and perhaps not very performant?)) I prefer not to use 
them. My thought was to make document keys predictable and not use TX_ID as 
part of the object key but instead use "ACC_ID + TX_NR" as the Transaction 
document key. This allows me to do a fetch of say 10 records and skipping 
the first 30 without additional lookups.

However, as part of the response I also need to return the total number of 
Transactions within the account. Because I can do locking in Couchbase (I'm 
considering pessimistic locking here) I can maintain a value such 
ACC_LAST_TX_NR within the Account document and update it each time when 
inserting a new transaction.

Second scenario: inserting a transaction
In MongoDB I've limited the insert Transaction to creating a new document 
and that's it. Several indexes make sure the document is available in a 
strong consistent manner. To do the actual insert I lookup the TX_BAL and 
TX_NR from the last known transaction, retrieve the Account, increment 
TX_NR and insert the new transaction. If in the meantime another 
Transaction was persisted to MongoDB a unique index on TX_NR will prevent 
inserting the new transaction (this unique index basically acts as an 
optimistic lock).
In Couchbase the query "get last transaction for specified account" would 
require a view that is strongly consistent. That is problematic. However, 
because Couchbase supports locking I can update the balance within the 
Account like I used to do in Oracle DB (the MongoDB approach was more of a 
workaround anyway). Inserting a transaction in Couchbase would involve 
three steps:
1. Retrieve Account and take lock
2. Insert Transaction with TX_NR = ACC_LAST_TX_NR + 1
3. Update the Account with the new ACC_BAL and set ACC_LAST_TX_NR + 1

Questions
--------------
1. When inserting a new transaction, how do I guarantee the TX_ID is 
unique? The TX_ID is determined by the client system and prevents duplicate 
transaction processing. The document key is already used by "ACC_ID + 
TX_NR" to do easy lookups. Couchbase does not support unique indexes, nor 
do I suspect updating a view for this purpose is sufficiently performant.
2. I could not find a resource describing how updates work, and I do know 
some databases handle them quite poorly in certain circumstances (MongoDB 
reserves some additional space, but if you grow your document it needs to 
be moved). Some say you're better of deleting the old document and 
inserting a new one is better in some situations. While MongoDB is 
insert-only, Couchbase would become insert and update heavy. Has this a 
significant impact on how it will behave?
3. I've been using Mongoose (an ODM) with Node.js for my MongoDB 
integration. For Couchbase I didn't find a mature and well-documented ODM 
thus far and I am considering rolling my own minimalistic ODM untill I can 
replace it with something else. Given the future with N1QL I don't want to 
"buy in" into any ODM that may be ill supported (a common theme for Node.js 
modules... *sigh*). If anyone has any tips on this, much appreciated.

One solution I was considering for my first problem was maintaining a map 
of "TX_NR => TX_ID" within the Account and effectively maintain my own 
index in each Account. The proposed insert TX scenario retrieves the 
Account in step (1); checking for duplicates would be a breeze and in step 
(3) I can update the Account index. This approach does introduce some 
scalability concers. currently Accounts may have several hundreds of 
transactions and may even grow to thousands or even tens of thousands of 
transactions in the future. Putting all those values in a single Account 
document doesn't feel right (lots of data transfer, processing, and 
potentially huge documents).

Sorry for the long post. Currently the prototype in MongoDB works and 
performs well, but having to support just one type of database would be 
much simpler (and compared to Couchbase the MongDB infrastructure is quite 
complex).

As I am new to Couchbase I might be making quite a few wrong assumptions 
about Couchbase. 

I hope someone can help.
Regards,
Niels

-- 
You received this message because you are subscribed to the Google Groups 
"Couchbase" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to