[jira] [Commented] (PHOENIX-2788) Make transactions pluggable in Phoenix

James Taylor (JIRA) Mon, 21 Mar 2016 18:17:41 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205519#comment-15205519
 ]


James Taylor commented on PHOENIX-2788:
---------------------------------------

I agree with [~ghelmling] that there's quite a bit more needed on the Phoenix 
side to make transactions pluggable beyond the HBase API changes. The more 
similar the approaches taken by the transaction libraries, the more likely it is 
that they can be made pluggable. I think it's possible between Tephra and Omid (both 
implementations of snapshot isolation), but I don't think it would extend well 
to the percolator-like approach taken by the XiaoMi folks for Themis.

To get a broad idea of how a transaction layer could be made pluggable, you can 
look at the components of the Tephra architecture. The way Tephra plugged into 
HBase helped tremendously in being able to integrate it with Phoenix.
- TransactionAwareHTable. This is a wrapper on HTable that delegates to the 
regular HTable, but attaches metadata to operations (used on the server-side) 
to make them transactional.
- Transaction Manager. This doles out transaction IDs and handles conflict 
detection. It also provides a means of getting the in-flight and invalid 
transaction IDs.
- Transaction Coprocessor. This handles attaching the visibility filter to 
filter invalid and inflight transactions, setting the cell timestamp to the 
transactionID, and converting deletes to the appropriate transaction-specific 
delete markers (see below for more on this).
- Transaction Janitor. Handles cleaning up invalid data on flush or compaction.
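
As a rough illustration of how the visibility filter could use the in-flight and invalid lists from the transaction manager, here is a minimal sketch in Python. It is not Tephra's code or API: the Snapshot type, its fields (including read_pointer and own_id), and the function names are hypothetical, and cells are modeled as simple (transaction ID, value) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical view of transaction state handed to a reader."""
    read_pointer: int                   # highest committed transaction ID the reader may see
    in_flight: frozenset = frozenset()  # IDs of transactions still in progress
    invalid: frozenset = frozenset()    # IDs of failed transactions not yet cleaned up
    own_id: int = -1                    # the reader's own transaction ID, if any


def is_visible(cell_tx_id: int, snap: Snapshot) -> bool:
    """Decide whether a cell written by cell_tx_id is visible under snap."""
    if cell_tx_id == snap.own_id:
        return True   # a client sees its own uncommitted writes
    if cell_tx_id in snap.invalid or cell_tx_id in snap.in_flight:
        return False  # failed or still-running transactions are filtered out
    return cell_tx_id <= snap.read_pointer


def visible_cells(cells, snap: Snapshot):
    """Filter (tx_id, value) pairs, keeping only the visible ones."""
    return [(tx_id, value) for tx_id, value in cells if is_visible(tx_id, snap)]
```

In this sketch the cell's transaction ID plays the role of the cell timestamp, so the same check works for any library that agrees on what is stored there.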

Some key interfaces and classes in Tephra are Transaction, TransactionContext, 
TransactionClient, and TransactionAware (see 
https://github.com/caskdata/tephra#client-apis for some good docs).

From the Phoenix requirements standpoint, here are the detailed ways (and 
reasons) we leverage the various interfaces and components of Tephra to 
provide a reasonable solution. There could be alternate ways of Tephra 
implementing these in HBase, I'm sure, but this is from the standpoint of how 
they're implemented today in Tephra, Phoenix, and HBase, so hopefully this 
gives an idea of the functionality that would need to become pluggable:
* *Enabling a client to see their own uncommitted writes*. This typically means 
that mutations to data (including deletes) are written to HBase, but filtered 
from scans of other clients until the commit is performed. This implies that 
you need a way to undo these changes if the commit fails or is manually rolled 
back. This is where we could use HBASE-11292. The alternative is to have your 
own family and cell delete markers (which end up just being Puts) so that they 
can be undone (which is what is done in Tephra). Without HBASE-11292, different 
transaction libraries would need to agree on what constitutes a delete marker 
to have a good interop story. There would also be a fair amount of duplicated 
effort, with each library implementing its own delete markers.
* *Query all versions of uncommitted data*. This was required for secondary 
index support, a driving reason for needing transactions, to enable table 
updates and the corresponding secondary index updates to be transactionally 
consistent. In order to be able to undo the index updates when a rollback 
occurs, we needed to be able to see all versions of mutations that were made in 
that transaction.
* *Getting inflight transaction IDs*. This was needed to handle adding a 
secondary index to a table that's taking writes, as it provided a means of 
ensuring that no writes to the table are missed when creating the secondary 
index. Tephra does this by providing a few utility methods that enable 
read/write fences to be placed.
* *Transaction checkpointing*. Common in SQL implementations, there's a command 
that reads from a table and directly writes to the same table (UPSERT SELECT). 
In order for a client to still see their own uncommitted data, but not see the 
writes of that statement while in progress (or you can get into an infinite 
loop), you need a way of having multiple transaction IDs associated with a 
single transaction. In this way, you can see uncommitted data, but not see 
writes occurring for a given statement.
* *Cell timestamp that represents transaction ID*. Having transaction IDs 
represented in the Cell timestamp enables a consistent means of filtering based 
on transaction ID (a requirement for snapshot isolation). Because HBase only 
stores millisecond granularity in the Cell timestamp, Tephra has to multiply 
the timestamp by a million to get enough granularity for unique transaction IDs 
(and support more than one transaction per millisecond). This is where 
HBASE-8927 would help. The alternative is that transaction libraries agree on 
multiplying timestamps by a million.
* *Cell timestamp that corresponds to wall clock time*. Not every Phoenix use 
case would have this requirement, but the use cases at SFDC do. We rely on the 
Cell timestamp to correspond to (or be derivable to) wall clock time. This also 
allows TTL to be supported which is important for many use cases (inside and 
outside SFDC) and enables an existing table to be altered to become 
transactional.
* *Disable conflict detection*. Phoenix allows a table to be declared as 
immutable. In this case, it's important to be able to turn off conflict 
detection so that you're not hit with this check at commit time. In our perf 
testing, there's very little overhead of enabling transactions on immutable 
tables (a very common use case for Phoenix).
* *Cleanup invalid data*. Tephra has the concept of an invalid list which 
stores all the transaction IDs of failed transactions that were unable to be 
undone by the client. Though Tephra has a means of manually clearing this list, 
there's no automated means of pruning this list (it's a tricky bookkeeping 
problem, as you can only clear it after a major compaction, but the server-side 
doesn't know when this is the case). This puts a pretty high operational burden 
on a production system, as allowing this list to grow unbounded has the 
potential to degrade query performance. This is an item that we're hoping to 
get a fix from Tephra for in the next release.
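
To make the two cell timestamp items above concrete, here is a minimal sketch of the arithmetic in Python. It only assumes what is stated above (a millisecond timestamp multiplied by a million); the constant name, the function names, and the generator class are hypothetical and are not Tephra's actual implementation.

```python
# Multiplier from the description above: one million ID slots per millisecond.
# The name TX_PER_MS is made up for this sketch.
TX_PER_MS = 1_000_000


def wall_clock_ms_of(tx_id: int) -> int:
    """Recover the wall clock millisecond that a transaction ID was derived from."""
    return tx_id // TX_PER_MS


class TxIdGenerator:
    """Hypothetical allocator of unique, increasing, time-based transaction IDs."""

    def __init__(self) -> None:
        self._last = 0

    def next_id(self, now_ms: int) -> int:
        # Start from the scaled wall clock time, but never repeat or go backwards,
        # so several transactions can begin within the same millisecond.
        candidate = max(now_ms * TX_PER_MS, self._last + 1)
        self._last = candidate
        return candidate
```

Because the ID stays derivable to wall clock time by integer division, features such as TTL can in principle still be computed from the cell timestamp, as long as every library agrees on the same multiplier.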

Here are a few additional nice-to-have items:
* *Read-only clients*. Not strictly necessary, but an important optimization. 
In Phoenix (and in most SQL implementations), you can set a connection as 
read-only. In this case, you don't need to track transaction IDs you dole out 
as in flight, since there's no possibility of data being written and thus no 
need to filter transactions with that ID. Having support for this will help in 
allowing more simultaneous clients.
* *Partitioned transaction manager*. Not strictly necessary, and some 
implementations may not need it. However, with a global transaction 
manager, the question always comes up of this becoming a bottleneck. One way 
around this is to have multiple transaction managers partitioned in some manner 
in which transactions would not occur across multiple partitions. With a 
multi-tenant system, this may be a feasible solution.
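
One way to picture the partitioned transaction manager idea for a multi-tenant system is sketched below in Python. Everything here is an assumption made for illustration: the class, the choice of hashing a tenant ID to pick a partition, and the per-partition counter standing in for a real transaction manager. It only holds up if no transaction ever spans tenants in different partitions.

```python
import zlib


class PartitionedTxManagers:
    """Hypothetical router that sends each tenant to one of N transaction managers."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("need at least one partition")
        # A plain counter per partition stands in for a real transaction manager.
        self._counters = [0] * num_partitions

    def partition_for(self, tenant_id: str) -> int:
        # crc32 is stable across processes, unlike Python's built-in hash() for str.
        return zlib.crc32(tenant_id.encode("utf-8")) % len(self._counters)

    def begin(self, tenant_id: str):
        """Start a transaction for a tenant; returns (partition, partition-local ID)."""
        partition = self.partition_for(tenant_id)
        self._counters[partition] += 1
        return partition, self._counters[partition]
```

The IDs are only ordered within a partition, which is why this approach depends on transactions never crossing partitions.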

> Make transactions pluggable in Phoenix
> --------------------------------------
>
>                 Key: PHOENIX-2788
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2788
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> Given that now there's another transactional library for transactions over 
> HBase in Omid that will likely be entering the incubator soon, we should 
> investigate what it'll take to make our transaction support pluggable. Omid 
> may not be that difficult to plug in, given that its basic approach (snapshot 
> isolation) is similar to Tephra's (but of course the devil's in the details).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
