[
https://issues.apache.org/jira/browse/TEPHRA-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326493#comment-16326493
]
Andreas Neumann commented on TEPHRA-247:
----------------------------------------
I do see that it could be possible to work around the region split using a post
split hook - but I still don't feel comfortable with the approach. The issue we
are trying to solve is that when the invalid list gets large - and so does the
transaction object - then we encode, transmit and decode this large object with
every get() performed in this transaction.
A very important case is a small transaction - say a transaction that performs
a single get or scan, followed by a put, and then commits. Today, this requires
sending the transaction only once: for the read operation, and it only gets
sent to one region, or only the regions involved in the scan. The proposed
design requires that we send the transaction to every region when the
transaction starts. That appears to add overhead rather than reduce it.
I feel that if we want to reduce overhead, we have multiple angles to look at
this:
* reduce the cost of encoding, transmitting and decoding the tx. This could
involve:
** using a more efficient (faster) or more compact (smaller) codec
** caching the encoded transaction on the client side after it was encoded for
the first time
** caching the decoded transaction in region servers after it has been
decoded for the first time
* avoid decoding the tx altogether, by using a codec that does not require
decoding. That is, instead of binary search in an array of tx ids, some
encoding that allows searching directly on the binary representation.
* avoid transmitting the invalid list. A possibility is to rely on the
existing TransactionStateCache, which has knowledge about the invalid
transactions in the last snapshot. That could allow us to only transmit the
invalid transactions added since the last snapshot.
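To make two of the options above concrete (caching the encoded form on the
client, and searching the invalid list without decoding it), here is a rough
sketch. It is not Tephra's codec or Transaction class: the class name
InvalidListSketch, its methods, and the fixed-width layout (each tx id as an
8-byte big-endian long, sorted ascending) are all assumptions made only for
illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the invalid list of a transaction.
// Assumed wire format: sorted tx ids, each written as an 8-byte big-endian long.
final class InvalidListSketch {
    private final long[] sortedInvalids; // must be sorted ascending
    private byte[] cachedEncoding;       // filled on the first encode() call
    int encodeCount = 0;                 // exposed only to show how often we really encode

    InvalidListSketch(long[] sortedInvalids) {
        this.sortedInvalids = sortedInvalids.clone();
    }

    // Client side: encode once, then reuse the same bytes for every get().
    // The cached array is returned as-is, so callers must not modify it.
    byte[] encode() {
        if (cachedEncoding == null) {
            ByteBuffer buf = ByteBuffer.allocate(sortedInvalids.length * Long.BYTES);
            for (long id : sortedInvalids) {
                buf.putLong(id);
            }
            cachedEncoding = buf.array();
            encodeCount++;
        }
        return cachedEncoding;
    }

    // Server side: binary search directly on the encoded bytes.
    // No long[] is materialized; only the probed entries are read.
    static boolean isInvalid(byte[] encoded, long txId) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        int lo = 0;
        int hi = encoded.length / Long.BYTES - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long candidate = buf.getLong(mid * Long.BYTES); // absolute read, no position change
            if (candidate < txId) {
                lo = mid + 1;
            } else if (candidate > txId) {
                hi = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }
}
```

With a fixed-width layout like this, a lookup reads O(log n) entries from
the byte array, whereas decoding the full list first is O(n) per request. A
more compact variable-length encoding would shrink the payload but would not
allow this kind of direct indexing, so the two goals pull in different
directions.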
By the way, there is similar overhead in the communication between Transaction
Manager and the client when the transaction is created. That could be another
area of improvement.
Thoughts?
> Avoid encoding the transaction multiple times
> ---------------------------------------------
>
> Key: TEPHRA-247
> URL: https://issues.apache.org/jira/browse/TEPHRA-247
> Project: Tephra
> Issue Type: Improvement
> Components: core, manager
> Affects Versions: 0.12.0-incubating
> Reporter: Andreas Neumann
> Assignee: Andreas Neumann
> Priority: Major
> Attachments: design.jpg
>
>
> Currently, the same transaction object is encoded again and again for every
> Get performed in HBase. It would be better to cache the encoded transaction
> for the duration of the transaction and reuse it.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)