Hi Niels, I'm a solutions engineer at Couchbase, based in the UK. I'll reach out to you directly so that we can work with you on getting a good design that's well suited to the Couchbase architecture.
I just wanted to highlight straight away that although Couchbase provides these
locking mechanisms which can be used to provide safe concurrent access, they
are not providing a 'transaction' in the sense you might be familiar with from
an RDBMS perspective.
Even with the locking in place, updates are atomic on a per document basis only.
Choosing the right data model will be key to making this work well for you, and
it's something we can look at in conjunction with the team here.
Cheers,
Tom
Solutions Engineer, UK/EMEA
Couchbase
Hello all,
As I am new to Couchbase I've been making some mistakes while trying to migrate
a system from MongoDB to Couchbase, particularly with regards to the design of
views/indexes. I hope someone can help.
This is quite a long post, sorry for that; I wanted to present a reasonably
whole picture of the context, problem and my own considerations thus far.
System context
--------------
The problem domain consists of just two objects: an Account and Transaction. In
relational terms an Account has 0..N Transactions. The system handles up to a
thousand of inserts and retrievals per minute and should have the potential to
grow to a capacity of tens of thousands of inserts and retrievals per minute. I
do not know exact numbers but my guess is the inserts/retrieval-ratio is fairly
even (50/50).
Current state
--------------
Currently the system is implemented in Oracle DB and I've been prototyping a
MongoDB version the last couple of weeks. This prototype is successfull and has
received positive feedback. However, the company I work for has selected
Couchbase due to its mobile data synchronization. From what I've read Couchbase
should be able to scale to a level that would satisfy my use case as well. As a
result I've been trying to migrate the MongoDB prototype to Couchbase thereby
limiting the number of database types that we need to support within the
company.
Document design
--------------
Simplified, a transaction has five interesting properties (I use abbreviations
to make them better recognizable):
- TX_ID: a unique identifier defined by the client system.
- TX_NR: a transaction number which is is a counter starting from 1 and
incremented by 1 for each new transaction _per_ account (i.e. the first
transaction of any new account will have TX_NR = 1).
- TX_MUT: a balance mutation value.
- TX_ACC_ID: the account identifier (see below)
- TX_BAL: the (historical) account balance at the time the transaction was
processed.
An account has two interesting properties;
- ACC_ID: a unique identifier that is defined by the client system. I selected
this as the document key.
- ACC_BAL: the balance contains the available funds for the account. In MongoDB
I cannot do a Transaction insert and Account update atomically within one
transaction (and MongoDB support no locking out-of-the-box) so in the current
MongoDB prototype this ACC_BAL property is virtual and in reality retrieved by
fetching the TX_BAL from the last transaction).
The documents contain quite a bit more data but I don't think that matters for
the purpose of this discussion.
Problem
--------------
When I tried to apply my MongoDB design to Couchbase I got strong suspicions I
wasn't following best practices, particularly with regard to views. Views are
eventual consistent by default, I suspect for performance reasons.
In my use case multiple transactions should be processable in batch,
asynchronously or synchronously. So concurrency issues are a concern (and have
caused us plently of problems even in a RDBMS (Oracle DB) due to faulty
application implementation). Also, the room for data integrity issues is almost
non-existent; the data should be correct.
Data access patterns
--------------
First scenario: retrieving a (paginated) subset of an account transaction list
In the design of MongoDB I used the TX_ACC_ID reference within the Transaction
as the sharding key and as a result all queries in normal use cases would go to
a single shard. In Couchbase I have no control over where my data ends up and
retrieving a list of transactions (a common use case) I'd need to do a
multi-get I believe. Given the issues with couchbase views (scatter/gather
queries, eventual consistent, cumbersome to update as part of automated CI (and
perhaps not very performant?)) I prefer not to use them. My thought was to make
document keys predictable and not use TX_ID as part of the object key but
instead use "ACC_ID + TX_NR" as the Transaction document key. This allows me to
do a fetch of say 10 records and skipping the first 30 without additional
lookups.
However, as part of the response I also need to return the total number of
Transactions within the account. Because I can do locking in Couchbase (I'm
considering pessimistic locking here) I can maintain a value such
ACC_LAST_TX_NR within the Account document and update it each time when
inserting a new transaction.
Second scenario: inserting a transaction
In MongoDB I've limited the insert Transaction to creating a new document and
that's it. Several indexes make sure the document is available in a strong
consistent manner. To do the actual insert I lookup the TX_BAL and TX_NR from
the last known transaction, retrieve the Account, increment TX_NR and insert
the new transaction. If in the meantime another Transaction was persisted to
MongoDB a unique index on TX_NR will prevent inserting the new transaction
(this unique index basically acts as an optimistic lock).
In Couchbase the query "get last transaction for specified account" would
require a view that is strongly consistent. That is problematic. However,
because Couchbase supports locking I can update the balance within the Account
like I used to do in Oracle DB (the MongoDB approach was more of a workaround
anyway). Inserting a transaction in Couchbase would involve three steps:
1. Retrieve Account and take lock
2. Insert Transaction with TX_NR = ACC_LAST_TX_NR + 1
3. Update the Account with the new ACC_BAL and set ACC_LAST_TX_NR + 1
Questions
--------------
1. When inserting a new transaction, how do I guarantee the TX_ID is unique?
The TX_ID is determined by the client system and prevents duplicate transaction
processing. The document key is already used by "ACC_ID + TX_NR" to do easy
lookups. Couchbase does not support unique indexes, nor do I suspect updating a
view for this purpose is sufficiently performant.
2. I could not find a resource describing how updates work, and I do know some
databases handle them quite poorly in certain circumstances (MongoDB reserves
some additional space, but if you grow your document it needs to be moved).
Some say you're better of deleting the old document and inserting a new one is
better in some situations. While MongoDB is insert-only, Couchbase would become
insert and update heavy. Has this a significant impact on how it will behave?
3. I've been using Mongoose (an ODM) with Node.js for my MongoDB integration.
For Couchbase I didn't find a mature and well-documented ODM thus far and I am
considering rolling my own minimalistic ODM untill I can replace it with
something else. Given the future with N1QL I don't want to "buy in" into any
ODM that may be ill supported (a common theme for Node.js modules... *sigh*).
If anyone has any tips on this, much appreciated.
One solution I was considering for my first problem was maintaining a map of
"TX_NR => TX_ID" within the Account and effectively maintain my own index in
each Account. The proposed insert TX scenario retrieves the Account in step
(1); checking for duplicates would be a breeze and in step (3) I can update the
Account index. This approach does introduce some scalability concers. currently
Accounts may have several hundreds of transactions and may even grow to
thousands or even tens of thousands of transactions in the future. Putting all
those values in a single Account document doesn't feel right (lots of data
transfer, processing, and potentially huge documents).
Sorry for the long post. Currently the prototype in MongoDB works and performs
well, but having to support just one type of database would be much simpler
(and compared to Couchbase the MongDB infrastructure is quite complex).
As I am new to Couchbase I might be making quite a few wrong assumptions about
Couchbase.
I hope someone can help.
Regards,
Niels
[cid:C07986C1-89B5-4638-A696-492DE61FC38E]
Always Scalable, Always On NoSQL
________________________________
Tom Green
Solutions Engineer
Phone: +447590 57 1818
Skype: tomgreen000
Email: [email protected]
--
You received this message because you are subscribed to the Google Groups
"Couchbase" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.
<<inline: DCA66614-471A-4275-A8CA-9691E1DE059C[5].png>>
