[
https://issues.apache.org/jira/browse/PHOENIX-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205519#comment-15205519
]
James Taylor commented on PHOENIX-2788:
---------------------------------------
I agree with [~ghelmling] that there's quite a bit more needed on the Phoenix
side to make transactions pluggable beyond the HBase API changes. The more
similar the approaches that the transaction libraries have taken, the more
likely it is that they can be made pluggable. I think it's possible between
Tephra and Omid (both implementations of snapshot isolation), but I don't think
it would extend well to the Percolator-like approach taken by the XiaoMi folks
for Themis.
To get a broad idea of how a transaction layer could be made pluggable, you can
look at the components of the Tephra architecture. The way Tephra plugged into
HBase helped tremendously in being able to integrate it with Phoenix.
- TransactionAwareHTable. This is a wrapper on HTable that delegates to the
regular HTable, but attaches metadata to operations (used on the server-side)
to make them transactional.
- Transaction Manager. This doles out transaction IDs and handles conflict
detection. It also provides a means of getting the in-flight and invalid
transaction IDs.
- Transaction Coprocessor. This attaches the visibility filter that filters out
invalid and in-flight transactions, sets the cell timestamp to the transaction
ID, and converts deletes to the appropriate transaction-specific delete markers
(see below for more on this).
- Transaction Janitor. Handles cleaning up invalid data on flush or compaction.
Some key interfaces and classes in Tephra are Transaction, TransactionContext,
TransactionClient, and TransactionAware (see
https://github.com/caskdata/tephra#client-apis for some good docs).
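As a rough illustration of how these components fit together, here is a minimal
Java sketch of the visibility check that a server-side filter could apply to
each cell. The class and field names (TxSnapshot, readPointer, writePointer)
are hypothetical and chosen for this example; they are not Tephra's actual API.
It assumes the transaction manager hands the client a snapshot containing the
in-flight and invalid transaction IDs, and that cell timestamps hold
transaction IDs.

```java
import java.util.Set;

/** Hypothetical snapshot handed out by a transaction manager (not Tephra's Transaction class). */
final class TxSnapshot {
    final long readPointer;    // highest transaction ID treated as committed when we started
    final long writePointer;   // this transaction's own ID, used as the cell timestamp for writes
    final Set<Long> inFlight;  // IDs of transactions still running when we started
    final Set<Long> invalid;   // IDs of failed transactions whose writes were never undone

    TxSnapshot(long readPointer, long writePointer, Set<Long> inFlight, Set<Long> invalid) {
        this.readPointer = readPointer;
        this.writePointer = writePointer;
        this.inFlight = inFlight;
        this.invalid = invalid;
    }

    /** Returns true if a cell stamped with cellTxId should be returned to this transaction. */
    boolean isVisible(long cellTxId) {
        if (cellTxId == writePointer) {
            return true;   // a client sees its own uncommitted writes
        }
        if (cellTxId > readPointer) {
            return false;  // written by a transaction that started after our snapshot
        }
        // Committed before our snapshot unless it was still running or has failed.
        return !inFlight.contains(cellTxId) && !invalid.contains(cellTxId);
    }
}
```

In this sketch the TransactionAwareHTable role would be to ship the snapshot
with each operation, and the coprocessor role would be to call something like
isVisible() on every cell it scans.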
From the Phoenix requirements standpoint, here are the detailed ways (and
reasons) we leverage the various interfaces and components of Tephra to
provide a reasonable solution. There could be alternate ways of Tephra
implementing these in HBase, I'm sure, but this is from the standpoint of how
they're implemented today in Tephra, Phoenix, and HBase, so hopefully this
gives an idea of the functionality that would need to become pluggable:
* *Enabling a client to see their own uncommitted writes*. This typically means
that mutations to data (including deletes) are written to HBase, but filtered
from scans of other clients until the commit is performed. This implies that
you need a way to undo these changes if the commit fails or is manually rolled
back. This is where we could use HBASE-11292. The alternative is to have your
own family and cell delete markers (which end up just being Puts) so that they
can be undone (which is what is done in Tephra). Without HBASE-11292, different
transaction libraries would need to agree on what constitutes a delete marker
to have a good interop story. There'd also be a fair amount of duplicated
effort, since each library would be implementing its own delete markers.
* *Query all versions of uncommitted data*. This was required for secondary
index support, a driving reason for needing transactions, to enable table
updates and the corresponding secondary index updates to be transactionally
consistent. In order to be able to undo the index updates when a rollback
occurs, we needed to be able to see all versions of mutations that were made in
that transaction.
* *Getting inflight transaction IDs*. This was needed to handle adding a
secondary index to a table that's taking writes, as it provided a means of
ensuring that no writes to the table are missed when creating the secondary
index. Tephra does this by providing a few utility methods that enable
read/write fences to be placed.
* *Transaction checkpointing*. Common in SQL implementations, there's a command
that reads from a table and directly writes to the same table (UPSERT SELECT).
In order for a client to still see their own uncommitted data, but not see the
writes of that statement while in progress (or you can get into an infinite
loop), you need a way of having multiple transaction IDs associated with a
single transaction. In this way, you can see uncommitted data, but not see
writes occurring for a given statement.
* *Cell timestamp that represents transaction ID*. Having transaction IDs
represented in the Cell timestamp enables a consistent means of filtering based
on transaction ID (a requirement for snapshot isolation). Because HBase only
stores millisecond granularity in the Cell timestamp, Tephra has to multiply
the timestamp by a million to get enough granularity for unique transaction IDs
(and support more than one transaction per millisecond). This is where
HBASE-8927 would help. The alternative is that transaction libraries agree on
multiplying timestamps by a million.
* *Cell timestamp that corresponds to wall clock time*. Not every Phoenix use
case would have this requirement, but the use cases at SFDC do. We rely on the
Cell timestamp to correspond to (or be convertible to) wall clock time. This also
allows TTL to be supported, which is important for many use cases (inside and
outside SFDC) and enables an existing table to be altered to become
transactional.
* *Disable conflict detection*. Phoenix allows a table to be declared as
immutable. In this case, it's important to be able to turn off conflict
detection so that you're not hit with this check at commit time. In our perf
testing, there's very little overhead from enabling transactions on immutable
tables (a very common use case for Phoenix).
* *Cleanup invalid data*. Tephra has the concept of an invalid list which
stores all the transaction IDs of failed transactions that were unable to be
undone by the client. Though Tephra has a means of manually clearing this list,
there's no automated means of pruning this list (it's a tricky bookkeeping
problem, as you can only clear it after a major compaction, but the server-side
doesn't know when this is the case). This puts a pretty high operational burden
on a production system, as allowing this list to grow unbounded has the
potential to degrade query performance. This is an item that we're hoping to
get a fix from Tephra for in the next release.
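Two of the requirements above, transaction IDs stored in the cell timestamp and
transaction checkpointing, can be sketched in a few lines of Java. This is only
an illustration under stated assumptions: the names TxIds and CheckpointedTx
are hypothetical, and packing a per-millisecond sequence number into the low
digits is one plausible way to use the extra range gained by multiplying by a
million, not necessarily how Tephra allocates IDs.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helpers for transaction IDs that double as cell timestamps. */
final class TxIds {
    static final long MAX_TX_PER_MS = 1_000_000L;

    /** Builds a transaction ID from wall clock millis plus a per-millisecond sequence number. */
    static long newTxId(long wallClockMillis, long seq) {
        if (seq < 0 || seq >= MAX_TX_PER_MS) {
            throw new IllegalArgumentException("seq out of range: " + seq);
        }
        return wallClockMillis * MAX_TX_PER_MS + seq;
    }

    /** Recovers wall clock millis (needed for TTL, for example) from a transaction ID. */
    static long toWallClockMillis(long txId) {
        return txId / MAX_TX_PER_MS;
    }
}

/** One logical transaction that owns several write pointers, one per checkpoint. */
final class CheckpointedTx {
    private final Set<Long> earlierPointers = new HashSet<Long>();
    private long currentPointer;

    CheckpointedTx(long firstPointer) {
        this.currentPointer = firstPointer;
    }

    /** Called before a statement such as UPSERT SELECT: later writes are stamped with newPointer. */
    void checkpoint(long newPointer) {
        earlierPointers.add(currentPointer);
        currentPointer = newPointer;
    }

    long currentWritePointer() {
        return currentPointer;
    }

    /** True only for this transaction's writes made before the current checkpoint. */
    boolean seesOwnWrite(long cellTxId) {
        return earlierPointers.contains(cellTxId);
    }
}
```

With this shape, the SELECT half of an UPSERT SELECT still reads the
transaction's earlier uncommitted data, but skips cells stamped with the current
write pointer, which is what avoids the infinite loop.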
Here are a few additional nice-to-have items:
* *Read-only clients*. Not strictly necessary, but an important optimization.
In Phoenix (and in most SQL implementations), you can set a connection as
read-only. In this case, you don't need to track transaction IDs you dole out
as in flight, since there's no possibility of data being written and thus no
need to filter transactions with that ID. Having support for this will help in
allowing more simultaneous clients.
* *Partitioned transaction manager*. Not strictly necessary, as some
implementations may not need it. However, with a global transaction manager,
the question always comes up of whether it will become a bottleneck. One way
around this is to have multiple transaction managers partitioned in some manner
in which transactions would not occur across multiple partitions. With a
multi-tenant system, this may be a feasible solution.
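A minimal Java sketch of both nice-to-haves, assuming transactions never span
tenants: each tenant is routed to one partition by hashing its ID, and read-only
transactions are never added to the in-flight set. The class name
PartitionedTxManagers and its methods are hypothetical and only meant to show
the shape of the idea.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical router that keeps one in-flight set per transaction manager partition. */
final class PartitionedTxManagers {
    private final List<Set<Long>> inFlightByPartition = new ArrayList<Set<Long>>();

    PartitionedTxManagers(int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("need at least one partition");
        }
        for (int i = 0; i < partitions; i++) {
            inFlightByPartition.add(new HashSet<Long>());
        }
    }

    /** Maps a tenant to a partition; floorMod keeps the result non-negative. */
    int partitionFor(String tenantId) {
        return Math.floorMod(tenantId.hashCode(), inFlightByPartition.size());
    }

    /** Read-only transactions cannot write, so there is nothing to filter and nothing to track. */
    void begin(String tenantId, long txId, boolean readOnly) {
        if (!readOnly) {
            inFlightByPartition.get(partitionFor(tenantId)).add(txId);
        }
    }

    /** Removes the transaction from its partition's in-flight set on commit or rollback. */
    void finish(String tenantId, long txId) {
        inFlightByPartition.get(partitionFor(tenantId)).remove(txId);
    }

    int inFlightCount(String tenantId) {
        return inFlightByPartition.get(partitionFor(tenantId)).size();
    }
}
```

A plain hash is the simplest partitioning choice here; a real deployment would
likely need a stable, explicit tenant-to-partition mapping so that partitions
can be added without reshuffling tenants.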
> Make transactions pluggable in Phoenix
> --------------------------------------
>
> Key: PHOENIX-2788
> URL: https://issues.apache.org/jira/browse/PHOENIX-2788
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> Given that now there's another transactional library for transactions over
> HBase in Omid that will likely be entering the incubator soon, we should
> investigate what it'll take to make our transaction support pluggable. Omid
> may not be that difficult to plug in, given that its basic approach (snapshot
> isolation) is similar to Tephra's (but of course the devil's in the details).