[ https://issues.apache.org/jira/browse/PHOENIX-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205519#comment-15205519 ]
James Taylor commented on PHOENIX-2788:
---------------------------------------

I agree with [~ghelmling] that there's quite a bit more needed on the Phoenix side to make transactions pluggable beyond the HBase API changes. The more similar the approaches that transaction libraries have taken, the more likely it is that they can be made pluggable. I think it's possible between Tephra and Omid (both implementations of snapshot isolation), but I don't think it would extend well to the Percolator-like approach taken by the XiaoMi folks for Themis.

To get a broad idea of how a transaction layer could be made pluggable, you can look at the components of the Tephra architecture. The way Tephra plugged into HBase helped tremendously in being able to integrate it with Phoenix.

- TransactionAwareHTable. This is a wrapper on HTable that delegates to the regular HTable, but attaches metadata to operations (used on the server-side) to make them transactional.
- Transaction Manager. This doles out transaction IDs and handles conflict detection. It also provides a means of getting the in-flight and invalid transaction IDs.
- Transaction Coprocessor. This handles attaching the visibility filter to filter invalid and in-flight transactions, setting the cell timestamp to the transaction ID, and converting deletes to the appropriate transaction-specific delete markers (see below for more on this).
- Transaction Janitor. Handles cleaning up invalid data on flush or compaction.

Some key interfaces and classes in Tephra are Transaction, TransactionContext, TransactionClient, and TransactionAware (see https://github.com/caskdata/tephra#client-apis for some good docs).

From the Phoenix requirements standpoint, here are the detailed ways (and reasons) we leverage the various interfaces and components of Tephra to provide a reasonable solution.
There could be alternate ways for Tephra to implement these in HBase, I'm sure, but this is from the standpoint of how they're implemented today in Tephra, Phoenix, and HBase, so hopefully this gives an idea of the functionality that would need to become pluggable:

* *Enabling a client to see their own uncommitted writes*. This typically means that mutations to data (including deletes) are written to HBase, but filtered from the scans of other clients until the commit is performed. This implies that you need a way to undo these changes if the commit fails or is manually rolled back. This is where we could use HBASE-11292. The alternative is to have your own family and cell delete markers (which end up just being Puts) so that they can be undone (which is what is done in Tephra). Without HBASE-11292, different transaction libraries would need to agree on what constitutes a delete marker to have a good interop story. There'd also be a fair amount of duplicated work in each library implementing its own delete markers.
* *Query all versions of uncommitted data*. This was required for secondary index support, a driving reason for needing transactions, to enable table updates and the corresponding secondary index updates to be transactionally consistent. In order to be able to undo the index updates when a rollback occurs, we needed to be able to see all versions of the mutations that were made in that transaction.
* *Getting inflight transaction IDs*. This was needed to handle adding a secondary index to a table that's taking writes, as it provided a means of ensuring that no writes to the table are missed when creating the secondary index. Tephra enables this by providing a few utility methods that allow read/write fences to be placed.
* *Transaction checkpointing*. SQL implementations commonly have a command that reads from a table and writes directly to the same table (UPSERT SELECT).
In order for a client to still see their own uncommitted data, but not see the writes of that statement while it is in progress (or you can get into an infinite loop), you need a way of having multiple transaction IDs associated with a single transaction. In this way, you can see uncommitted data, but not see the writes occurring for a given statement.
* *Cell timestamp that represents transaction ID*. Having transaction IDs represented in the Cell timestamp enables a consistent means of filtering based on transaction ID (a requirement for snapshot isolation). Because HBase only stores millisecond granularity in the Cell timestamp, Tephra has to multiply the timestamp by a million to get enough granularity for unique transaction IDs (and support more than one transaction per millisecond). This is where HBASE-8927 would help. The alternative is that transaction libraries agree on multiplying timestamps by a million.
* *Cell timestamp that corresponds to wall clock time*. Not every Phoenix use case would have this requirement, but the use cases at SFDC do. We rely on the Cell timestamp to correspond to wall clock time (or to be convertible to it). This also allows TTL to be supported, which is important for many use cases (inside and outside SFDC), and enables an existing table to be altered to become transactional.
* *Disable conflict detection*. Phoenix allows a table to be declared as immutable. In this case, it's important to be able to turn off conflict detection so that you're not hit with this check at commit time. In our perf testing, there's very little overhead from enabling transactions on immutable tables (a very common use case for Phoenix).
* *Clean up invalid data*. Tephra has the concept of an invalid list, which stores the transaction IDs of all failed transactions that the client was unable to undo.
Though Tephra has a means of manually clearing this list, there's no automated means of pruning it (it's a tricky bookkeeping problem, as you can only clear it after a major compaction, but the server-side doesn't know when this is the case). This puts a pretty high operational burden on a production system, as allowing this list to grow unbounded has the potential to degrade query performance. This is an item that we're hoping to get a fix from Tephra for in the next release.

Here are a few additional nice-to-have items:

* *Read-only clients*. Not strictly necessary, but an important optimization. In Phoenix (and in most SQL implementations), you can set a connection as read-only. In this case, you don't need to track the transaction IDs you dole out as in-flight, since there's no possibility of data being written and thus no need to filter transactions with that ID. Having support for this will help in allowing more simultaneous clients.
* *Partitioned transaction manager*. Not strictly necessary, as some implementations may not need it. However, with a global transaction manager, the question always comes up of this becoming a bottleneck. One way around this is to have multiple transaction managers partitioned in some manner such that transactions would not occur across multiple partitions. With a multi-tenant system, this may be a feasible solution.

> Make transactions pluggable in Phoenix
> --------------------------------------
>
> Key: PHOENIX-2788
> URL: https://issues.apache.org/jira/browse/PHOENIX-2788
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> Given that now there's another transactional library for transactions over
> HBase in Omid that will likely be entering the incubator soon, we should
> investigate what it'll take to make our transaction support pluggable.
> Omid may not be that difficult to plug in, given that its basic approach (snapshot
> isolation) is similar to Tephra's (but of course the devil's in the details).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
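
As a rough illustration of several requirements listed in the comment above (transaction IDs carried in the cell timestamp, filtering of in-flight and invalid transactions, a client seeing its own uncommitted writes, and checkpointing), here is a minimal Python sketch. It is not Tephra's or Phoenix's API: the names `tx_id_from_millis`, `millis_from_tx_id`, `Snapshot`, `checkpoint`, and `is_visible` are hypothetical, and the per-millisecond sequence number is a simplifying assumption about how IDs within one millisecond are kept unique. Only the factor of one million comes from the comment itself.

```python
from dataclasses import dataclass, replace

# Factor mentioned in the comment: millisecond timestamps are multiplied by a
# million so that many transactions can start within the same millisecond.
MAX_TX_PER_MS = 1_000_000


def tx_id_from_millis(millis, seq=0):
    """Build a transaction ID from wall-clock millis plus a sequence number.

    The sequence number is an assumption made for this sketch.
    """
    if not 0 <= seq < MAX_TX_PER_MS:
        raise ValueError("seq must fit within one millisecond's ID range")
    return millis * MAX_TX_PER_MS + seq


def millis_from_tx_id(tx_id):
    """Recover wall-clock millis from a transaction ID (useful for TTL)."""
    return tx_id // MAX_TX_PER_MS


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical client-side view of one transaction under snapshot isolation."""
    read_pointer: int          # newest committed transaction ID visible to us
    in_flight: frozenset       # IDs of other transactions still in progress
    invalid: frozenset         # IDs of failed transactions not yet cleaned up
    write_pointer: int         # ID stamped on the writes we are making now
    earlier_write_pointers: frozenset = frozenset()  # IDs used before checkpoints

    def checkpoint(self, new_write_pointer):
        """Start stamping writes with a new ID while staying in the same transaction."""
        return replace(
            self,
            write_pointer=new_write_pointer,
            earlier_write_pointers=self.earlier_write_pointers | {self.write_pointer},
        )

    def is_visible(self, cell_ts, exclude_current=False):
        """Decide whether a cell with timestamp cell_ts should be returned by a scan."""
        if cell_ts == self.write_pointer:
            # Writes of the running statement; hidden for UPSERT SELECT style reads.
            return not exclude_current
        if cell_ts in self.earlier_write_pointers:
            return True  # our own uncommitted writes from before the checkpoint
        if cell_ts in self.in_flight or cell_ts in self.invalid:
            return False  # other clients' uncommitted or failed writes
        return cell_ts <= self.read_pointer
```

A statement that reads and writes the same table would call `checkpoint` before it starts and scan with `exclude_current=True`, so it sees the transaction's earlier writes but not the rows it is producing itself.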