Hi everyone, I want to introduce our open-source project Themis, which 
implements cross-row/cross-table transactions on HBase.

Themis follows Google's Percolator 
algorithm (http://research.google.com/pubs/pub36726.html), which provides 
ACID-compliant transactions with snapshot isolation. Cross-row transactions are 
built on HBase's single-row atomicity and do not rely on a central 
transaction server, which allows Themis to scale linearly.
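To make "cross-row transactions built on single-row atomicity" concrete, here is a toy, in-memory sketch of Percolator-style two-phase commit in Java. It is not Themis code: the `PercolatorSketch` class, its `data`/`write`/lock fields and its method names are hypothetical, each row's Java monitor stands in for HBase's single-row atomicity, and timestamps are passed in by hand instead of coming from a timestamp server. Crash recovery (readers cleaning up stale locks through the primary row) is omitted.

```java
import java.util.*;

// Toy Percolator-style store (hypothetical, not Themis code). Each Row is modified only
// while holding its own monitor, standing in for HBase's single-row atomicity.
public class PercolatorSketch {
    static final class Row {
        final TreeMap<Long, String> data = new TreeMap<>(); // startTs -> staged value
        final TreeMap<Long, Long> write = new TreeMap<>();  // commitTs -> startTs of the committed value
        Long lockTs = null;        // startTs of the transaction holding the lock, if any
        String lockPrimary = null; // primary row; a real reader would consult it to clean up crashed txns
    }

    private final Map<String, Row> rows = new HashMap<>();

    private synchronized Row row(String key) {
        return rows.computeIfAbsent(key, k -> new Row());
    }

    // Phase 1: lock one row and stage its value; fails on a write-write conflict.
    boolean prewrite(String key, String value, long startTs, String primary) {
        Row r = row(key);
        synchronized (r) {
            Map.Entry<Long, Long> last = r.write.lastEntry();
            if (last != null && last.getKey() >= startTs) return false; // committed after we started
            if (r.lockTs != null) return false;                          // another txn holds the lock
            r.data.put(startTs, value);
            r.lockTs = startTs;
            r.lockPrimary = primary;
            return true;
        }
    }

    // Phase 2: replace the lock with a write record that makes the staged value visible.
    boolean commit(String key, long startTs, long commitTs) {
        Row r = row(key);
        synchronized (r) {
            if (r.lockTs == null || r.lockTs != startTs) return false; // lock was lost
            r.write.put(commitTs, startTs);
            r.lockTs = null;
            r.lockPrimary = null;
            return true;
        }
    }

    // Undo only the locks and staged values that belong to this transaction.
    void rollback(Collection<String> keys, long startTs) {
        for (String k : keys) {
            Row r = row(k);
            synchronized (r) {
                if (r.lockTs != null && r.lockTs == startTs) {
                    r.data.remove(startTs);
                    r.lockTs = null;
                    r.lockPrimary = null;
                }
            }
        }
    }

    // Snapshot read: the newest value committed at or before startTs.
    String get(String key, long startTs) {
        Row r = row(key);
        synchronized (r) {
            if (r.lockTs != null && r.lockTs <= startTs) // read-write conflict: a commit may be in flight
                throw new IllegalStateException("row " + key + " is locked; retry later");
            Map.Entry<Long, Long> w = r.write.floorEntry(startTs);
            return w == null ? null : r.data.get(w.getValue());
        }
    }

    // Cross-row transaction: prewrite every row, commit the primary (the commit point), then the rest.
    boolean transact(Map<String, String> writes, long startTs, long commitTs) {
        List<String> keys = new ArrayList<>(writes.keySet());
        String primary = keys.get(0);
        for (String k : keys) {
            if (!prewrite(k, writes.get(k), startTs, primary)) {
                rollback(keys, startTs);
                return false;
            }
        }
        if (!commit(primary, startTs, commitTs)) {
            rollback(keys, startTs);
            return false;
        }
        // Once the primary is committed the txn counts as committed; Percolator lets readers
        // roll leftover secondaries forward, here we simply commit them in order.
        for (String k : keys.subList(1, keys.size())) commit(k, startTs, commitTs);
        return true;
    }

    public static void main(String[] args) {
        PercolatorSketch s = new PercolatorSketch();
        Map<String, String> w = new LinkedHashMap<>();
        w.put("alice", "90");
        w.put("bob", "110");
        System.out.println(s.transact(w, 10, 11));                   // true
        System.out.println(s.get("bob", 12));                         // 110
        System.out.println(s.get("bob", 10));                         // null: commit at 11 is invisible at ts 10
        System.out.println(s.transact(Map.of("alice", "0"), 9, 13)); // false: write-write conflict
    }
}
```

Committing the primary row is the single atomic step that decides the outcome of the whole transaction; every other step touches one row at a time, which is why no central transaction server is needed.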

Themis depends on a timestamp server that provides globally strictly increasing 
timestamps to define the order of transactions; these timestamps are used to resolve 
write-write and read-write conflicts. The timestamp server is lightweight 
and can achieve high throughput (500,000+ QPS), and Themis batches 
timestamp requests from multiple transactions into a single RPC, so the timestamp 
server does not become the bottleneck of the system even when processing billions 
of transactions every day.
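The request batching can be sketched as follows, assuming a hypothetical oracle call `allocate(count)` that reserves a contiguous range of timestamps in one round trip (Chronos's real client API may look different). Concurrent callers enqueue requests; a single background thread drains whatever is pending and serves the whole batch with one call, handing out consecutive values from the reserved range.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Client-side timestamp batching sketch. Oracle is a hypothetical stand-in for a
// timestamp server such as Chronos; allocate() is not Chronos's actual API.
public class TimestampBatcher {
    static final class Oracle {
        private final AtomicLong next = new AtomicLong(1);
        final AtomicInteger rpcCount = new AtomicInteger();

        // One call = one simulated RPC that reserves `count` consecutive timestamps
        // and returns the first of them.
        long allocate(int count) {
            rpcCount.incrementAndGet();
            return next.getAndAdd(count);
        }
    }

    private final Oracle oracle;
    private final BlockingQueue<CompletableFuture<Long>> pending = new LinkedBlockingQueue<>();

    TimestampBatcher(Oracle oracle) {
        this.oracle = oracle;
        Thread worker = new Thread(this::drainLoop, "ts-batcher");
        worker.setDaemon(true); // lets the JVM exit without an explicit shutdown
        worker.start();
    }

    // Called by each transaction; blocks until the batch containing this request is served.
    long getTimestamp() throws InterruptedException, ExecutionException {
        CompletableFuture<Long> request = new CompletableFuture<>();
        pending.put(request);
        return request.get();
    }

    private void drainLoop() {
        List<CompletableFuture<Long>> batch = new ArrayList<>();
        try {
            while (true) {
                batch.add(pending.take());                  // wait for at least one request
                pending.drainTo(batch);                     // then take everything else already queued
                long first = oracle.allocate(batch.size()); // a single RPC for the whole batch
                for (int i = 0; i < batch.size(); i++) batch.get(i).complete(first + i);
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        Oracle oracle = new Oracle();
        TimestampBatcher batcher = new TimestampBatcher(oracle);
        ExecutorService pool = Executors.newFixedThreadPool(32);
        Callable<Long> task = batcher::getTimestamp;
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < 1000; i++) results.add(pool.submit(task));
        Set<Long> seen = new HashSet<>();
        for (Future<Long> r : results) seen.add(r.get());
        pool.shutdown();
        System.out.println("distinct timestamps: " + seen.size());    // 1000
        System.out.println("oracle calls: " + oracle.rpcCount.get()); // at most 1000; fewer when requests overlap
    }
}
```

Every request that arrives while an oracle call is in flight simply joins the next batch, so under load the number of round trips to the timestamp server grows much more slowly than the number of transactions.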

Although Themis could be implemented entirely on the client side, we adopt the 
HBase coprocessor framework to achieve higher performance. Themis includes a 
client-side library that provides transaction APIs, such as 
themisPut/themisGet/themisScan/themisDelete, and a coprocessor library loaded 
on the regionservers. Therefore, Themis can be used without changing the code or 
logic of HBase.

We have been validating the correctness of Themis for a few months with an 
AccountTransfer simulation program, which concurrently runs cross-row 
transactions that transfer money among different accounts (each account is a 
row in HBase) and verifies that the total amount of money across all accounts 
does not change during the simulation. We have also run Themis in our production 
environment.
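The invariant that the AccountTransfer program checks can be illustrated with a small stand-in. In this hypothetical sketch, in-memory `long` balances play the role of HBase rows and ordered per-account locks play the role of Themis transactions; what carries over is the harness itself: many concurrent transfers, plus repeated consistent snapshots of the total that must always equal the starting sum.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Stand-in for the AccountTransfer check (hypothetical): balances[] plays the role of
// HBase rows and ordered per-account locks play the role of Themis transactions.
public class AccountTransferCheck {
    static final int ACCOUNTS = 100;
    static final long INITIAL = 1_000;

    final long[] balances = new long[ACCOUNTS];
    final ReentrantLock[] locks = new ReentrantLock[ACCOUNTS];

    AccountTransferCheck() {
        for (int i = 0; i < ACCOUNTS; i++) {
            balances[i] = INITIAL;
            locks[i] = new ReentrantLock();
        }
    }

    // One "cross-row transaction": move amount between two accounts, or do nothing.
    boolean transfer(int from, int to, long amount) {
        if (from == to) return false;
        ReentrantLock first = locks[Math.min(from, to)];  // fixed lock order avoids deadlock
        ReentrantLock second = locks[Math.max(from, to)];
        first.lock();
        second.lock();
        try {
            if (balances[from] < amount) return false;    // no overdrafts
            balances[from] -= amount;
            balances[to] += amount;
            return true;
        } finally {
            second.unlock();
            first.unlock();
        }
    }

    // Consistent snapshot of the total: hold every lock (same order) while summing.
    long total() {
        for (ReentrantLock l : locks) l.lock();
        try {
            long sum = 0;
            for (long b : balances) sum += b;
            return sum;
        } finally {
            for (int i = ACCOUNTS - 1; i >= 0; i--) locks[i].unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AccountTransferCheck sim = new AccountTransferCheck();
        long expected = sim.total();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            pool.execute(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < 10_000; i++)
                    sim.transfer(rnd.nextInt(ACCOUNTS), rnd.nextInt(ACCOUNTS), rnd.nextLong(1, 50));
            });
        }
        for (int check = 0; check < 5; check++)           // verify while transfers are running
            if (sim.total() != expected) throw new IllegalStateException("total changed mid-run");
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(sim.total() == expected ? "OK: total preserved" : "FAIL: total changed"); // prints "OK: total preserved"
    }
}
```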

We tested the performance of Themis and obtained results comparable to Percolator's. The 
single-column transaction is the worst case for Themis compared with HBase; 
the results are:
1) For reads, the performance of Themis is 90% of HBase's;
2) For writes, the performance of Themis is 23% of HBase's.
Write performance drops significantly because Themis uses a two-phase commit protocol 
to guarantee the ACID properties of transactions. For multi-row writes, we improve 
performance by issuing all writes of the prewrite phase in parallel. For single-row 
writes, we are optimizing the two-phase commit protocol to achieve better performance 
and will update the results when they are ready. Detailed performance results can be 
found on GitHub.
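The parallel prewrite optimization can be sketched under stated assumptions: `RowStore` is a hypothetical interface standing in for per-row RPCs to regionservers, `InMemoryStore` is a toy implementation that only checks lock conflicts and sleeps to mimic a round trip, and `commitTxn` is not Themis's actual API. All prewrites are submitted to an `ExecutorService` at once, so with enough worker threads the prewrite phase for N rows costs roughly one round trip instead of N; if any prewrite fails, every lock taken by this transaction is rolled back.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class ParallelPrewrite {
    // Hypothetical per-row operations; in Themis these would be RPCs handled on regionservers.
    interface RowStore {
        boolean prewrite(String row, String value, long startTs, String primary);
        boolean commit(String row, long startTs, long commitTs);
        void rollback(String row, long startTs);
    }

    static boolean commitTxn(RowStore store, ExecutorService pool, Map<String, String> writes,
                             long startTs, LongSupplier nextTimestamp) throws InterruptedException {
        String primary = writes.keySet().iterator().next();

        // Phase 1: submit every prewrite at once instead of one row after another.
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (Map.Entry<String, String> e : writes.entrySet())
            tasks.add(() -> store.prewrite(e.getKey(), e.getValue(), startTs, primary));
        boolean ok = true;
        for (Future<Boolean> f : pool.invokeAll(tasks)) { // invokeAll waits for all tasks
            try {
                ok &= f.get();
            } catch (ExecutionException ex) {
                ok = false;
            }
        }
        if (!ok) {                                        // any conflict aborts the transaction
            for (String row : writes.keySet()) store.rollback(row, startTs);
            return false;
        }

        // Phase 2: committing the primary row is the commit point; secondaries follow.
        long commitTs = nextTimestamp.getAsLong();
        if (!store.commit(primary, startTs, commitTs)) {
            for (String row : writes.keySet()) store.rollback(row, startTs);
            return false;
        }
        for (String row : writes.keySet())
            if (!row.equals(primary)) store.commit(row, startTs, commitTs);
        return true;
    }

    // Toy store: one lock per row, no multi-versioning; sleep() mimics an RPC round trip.
    static final class InMemoryStore implements RowStore {
        final ConcurrentHashMap<String, Long> locks = new ConcurrentHashMap<>();
        final ConcurrentHashMap<String, String> staged = new ConcurrentHashMap<>();
        final ConcurrentHashMap<String, String> committed = new ConcurrentHashMap<>();

        public boolean prewrite(String row, String value, long startTs, String primary) {
            try { Thread.sleep(20); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            if (locks.putIfAbsent(row, startTs) != null) return false; // locked by another txn
            staged.put(row, value);
            return true;
        }

        public boolean commit(String row, long startTs, long commitTs) {
            if (!locks.remove(row, startTs)) return false;            // lock is no longer ours
            committed.put(row, staged.remove(row));
            return true;
        }

        public void rollback(String row, long startTs) {
            if (locks.remove(row, startTs)) staged.remove(row);       // only undo our own lock
        }
    }

    public static void main(String[] args) throws InterruptedException {
        InMemoryStore store = new InMemoryStore();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicLong oracle = new AtomicLong(100);                      // stand-in timestamp source
        Map<String, String> writes = new LinkedHashMap<>();
        for (int i = 0; i < 8; i++) writes.put("row" + i, "v" + i);

        long t0 = System.nanoTime();
        boolean ok = commitTxn(store, pool, writes, oracle.incrementAndGet(), oracle::incrementAndGet);
        long ms = (System.nanoTime() - t0) / 1_000_000;
        pool.shutdown();
        // With 8 threads the eight ~20 ms prewrites overlap instead of adding up to ~160 ms.
        System.out.println(ok + ", committed=" + store.committed.size() + ", elapsed ~" + ms + " ms");
    }
}
```

The commit phase is left sequential here because only the primary commit has to finish before the transaction is decided; secondaries could also be committed asynchronously.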

Resources for Themis:
1. Themis GitHub: https://github.com/XiaoMi/themis/. The source code, 
performance test results and user guide can be found here.
2. Themis JIRA: https://issues.apache.org/jira/browse/HBASE-10999
3. Chronos GitHub: https://github.com/XiaoMi/chronos. Chronos is our 
open-source, highly available, high-performance timestamp server that provides 
globally strictly increasing timestamps for Themis.

If you find Themis interesting, please leave us a comment on the mailing list, 
JIRA or GitHub.

Best
cuijianwei
