Micael Capitão created TEPHRA-232:
-------------------------------------

             Summary: Transaction metadata sent on each put is too big
                 Key: TEPHRA-232
                 URL: https://issues.apache.org/jira/browse/TEPHRA-232
             Project: Tephra
          Issue Type: Bug
         Environment: HBase 1.2.0-cdh5.11
CentOS 7.3
4x machines
Bandwidth between machines 1Gbps
            Reporter: Micael Capitão
            Assignee: Poorna Chandra
            Priority: Critical


I've been testing Tephra 0.11.0 (and more recently 0.12.0) for a project that 
may need transactions on top of HBase, and I find its performance very poor, 
for instance for a bulk load. Let's not discuss why I am doing a bulk load 
with transactions.

In my use case I am generating batches of ~10000 elements and inserting them 
with the *put(List<Put> puts)* method. There are no concurrent writers or 
readers.
Without transactions, each batch put takes ~0.5s. With 
*TransactionAwareHTable* it takes ~12s.
In both cases the network bandwidth is fully utilised.

I've traced the performance killer to the 
*addToOperation(OperationWithAttributes op, Transaction tx)* method of 
TransactionAwareHTable.

I've created a TransactionAwareHTableFix with the *addToOperation(txPut, tx)* 
call commented out, used it in my code, and each batch then took ~0.5s.
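To illustrate why a single attribute call dominates the batch cost, here is a stdlib-only simulation. `SimulatedPut`, `attachTx`, `newBatch` and the attribute key `"tx"` are hypothetical stand-ins, not HBase or Tephra APIs. The only behaviour taken from the report is that every put in the batch carries the serialized transaction; the sketch assumes, as the reported numbers suggest, that each put's attributes are serialized separately when the batch is sent, so the encoded transaction is counted once per put even though all puts share one array in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an HBase Put; only the attribute map matters here.
class SimulatedPut {
    private final Map<String, byte[]> attributes = new HashMap<>();

    void setAttribute(String name, byte[] value) {
        attributes.put(name, value);
    }

    // Bytes this put contributes for its attributes, assuming each put's
    // attributes are serialized on their own (key + value, framing ignored).
    long attributeBytes() {
        long total = 0;
        for (Map.Entry<String, byte[]> e : attributes.entrySet()) {
            total += e.getKey().length() + e.getValue().length;
        }
        return total;
    }
}

class PerPutTxOverhead {
    static final String TX_ATTRIBUTE = "tx"; // placeholder, not Tephra's real key

    // Mimics the reported behaviour: every put gets the encoded transaction.
    static void attachTx(List<SimulatedPut> puts, byte[] encodedTx) {
        for (SimulatedPut put : puts) {
            put.setAttribute(TX_ATTRIBUTE, encodedTx);
        }
    }

    static List<SimulatedPut> newBatch(int size) {
        List<SimulatedPut> puts = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            puts.add(new SimulatedPut());
        }
        return puts;
    }

    static long totalAttributeBytes(List<SimulatedPut> puts) {
        long total = 0;
        for (SimulatedPut put : puts) {
            total += put.attributeBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] encodedTx = new byte[104_171]; // serialized size reported above
        List<SimulatedPut> withTx = newBatch(10_000);
        attachTx(withTx, encodedTx);
        List<SimulatedPut> withoutTx = newBatch(10_000); // like the "Fix" class

        System.out.println("with tx attribute:    " + totalAttributeBytes(withTx));
        System.out.println("without tx attribute: " + totalAttributeBytes(withoutTx));
    }
}
```

With the reported sizes, the first line prints 1041730000 bytes (10000 × (2 + 104171)) and the second prints 0, mirroring the difference between TransactionAwareHTable and the patched class.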

Then I checked what the *addToOperation* method does and found that the 
issue lies in the serialization of the Transaction object. The serialized 
Transaction object is 104171 bytes long. Since this happens for each put, my 
batch of ~10000 elements carries ~1GB of serialized transactions 
(10000 × 104171 bytes ≈ 1.04 × 10^9 bytes, about 0.97 GiB). That explains 
12s vs 0.5s per batch, given that the network is saturated in both cases.
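A back-of-the-envelope check of these numbers: the sketch below multiplies the reported serialized size by the batch size and computes the minimum time needed just to push those bytes over a 1 Gbps link. It ignores protocol overhead and the row data itself, so it is only a lower bound under that simplifying assumption; the class and method names are illustrative.

```java
// Back-of-the-envelope check of the transaction overhead per batch.
class TxOverheadEstimate {
    static final long TX_BYTES_PER_PUT = 104_171L;   // reported serialized size
    static final long PUTS_PER_BATCH = 10_000L;      // reported batch size
    static final double LINK_BITS_PER_SECOND = 1e9;  // 1 Gbps link

    // Serialized transaction bytes carried by one batch.
    static long txBytesPerBatch() {
        return TX_BYTES_PER_PUT * PUTS_PER_BATCH;
    }

    // Lower bound on the time to send the given bytes over the link,
    // ignoring protocol overhead and the row data itself.
    static double minTransferSeconds(long bytes) {
        return bytes * 8.0 / LINK_BITS_PER_SECOND;
    }

    public static void main(String[] args) {
        long bytes = txBytesPerBatch();
        System.out.printf("tx metadata per batch: %d bytes (%.2f GiB)%n",
                bytes, bytes / (double) (1L << 30));
        System.out.printf("minimum transfer time at 1 Gbps: %.1f s%n",
                minTransferSeconds(bytes));
    }
}
```

This prints 1041710000 bytes (0.97 GiB) and a floor of about 8.3 s, which is consistent with the ~12s observed per batch once the row data and protocol overhead are added on top.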

It seems that the transaction metadata, despite being sent to HBase, is not 
stored: the final table size is the same with or without transactions.

Is this encoding and sending of metadata with every put expected behaviour? 
It makes Tephra unusable, at least with only 1Gbps of bandwidth.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
