[
https://issues.apache.org/jira/browse/PHOENIX-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349686#comment-14349686
]
Gary Helmling commented on PHOENIX-1674:
----------------------------------------
{quote}
1. What are the options for the scope of the snapshot isolation?
a) single snapshot isolation across all statements executed in a txn
{quote}
Tephra does not currently support nested transactions (there is a JIRA
https://issues.cask.co/browse/TEPHRA-39). So the only option for the moment is
for all statements within a transaction to commit together. Even with nested
transactions, I think you would still want the same snapshot to be retained
across all the nested transactions. If each nested transaction used its own
snapshot, then it sounds like you would be weakening the transactional
guarantee to "read committed", which would allow non-repeatable reads.
How do you think this would be expressed in Phoenix? Would you add support for
START TRANSACTION / BEGIN / COMMIT statements? Absent those, each statement
runs in its own transaction?
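To illustrate why retaining a single snapshot matters, here is a toy model in plain Java (nothing Tephra-specific; the version map and IDs are invented): a statement reading under a fixed snapshot cannot observe a commit that lands mid-transaction, while taking a fresh snapshot per statement would expose it as a non-repeatable read.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy versioned store: commit timestamp -> value. A snapshot read sees the
// newest version committed at or before its snapshot ID.
class SnapshotDemo {
    static final TreeMap<Long, String> versions = new TreeMap<>();

    static String readAt(long snapshotId) {
        Map.Entry<Long, String> e = versions.floorEntry(snapshotId);
        return e == null ? null : e.getValue();
    }

    static String[] demo() {
        versions.put(10L, "v1");            // committed before our txn starts
        long snapshot = 15L;                // snapshot taken once, at txn start

        String first = readAt(snapshot);    // statement 1 sees "v1"
        versions.put(20L, "v2");            // a concurrent txn commits mid-way
        String second = readAt(snapshot);   // statement 2 still sees "v1"

        // A per-statement ("read committed") snapshot would now see "v2",
        // i.e. a non-repeatable read:
        String fresh = readAt(25L);
        return new String[] { first, second, fresh };
    }
}
```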
{quote}
2. What type of SQL isolation level is supported by Tephra?
a) write/write conflicts are clearly detected
b) what about a read of a piece of data that is then subsequently changed by
another txn before ours is committed?
c) *How would you define Tephra's behavior in terms of ANSI standard SQL
isolation levels?*
{quote}
As you note, snapshot isolation is subject to write skew: one tx updates bonus
while another updates comp, and at the end the constraint on comp has been
violated. Using row-level conflict detection would avoid this in some scenarios
(where both updates are to the same row) by turning it into a write-write
conflict.
But, yes, snapshot isolation does not provide full serializability, though I
think Jim Gray does a better job explaining it than me.
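A minimal worked example of that write-skew scenario (the numbers and the bonus + comp <= 100 constraint are invented for illustration): each transaction validates the constraint against the same snapshot, neither writes a cell the other touched, so cell-level conflict detection lets both commit, and the combined result breaks the invariant.

```java
// Toy illustration of write skew under snapshot isolation (values invented).
// Invariant the application wants to preserve: bonus + comp <= LIMIT.
class WriteSkewDemo {
    static final int LIMIT = 100;

    static int demo() {
        int bonus = 30, comp = 30;  // committed state; both txns snapshot this

        // Txn A reads its snapshot, verifies the constraint, raises bonus.
        int newBonus = 60;
        boolean aOk = newBonus + comp <= LIMIT;   // 60 + 30 = 90: passes

        // Txn B reads the SAME snapshot, verifies, raises comp.
        int newComp = 60;
        boolean bOk = bonus + newComp <= LIMIT;   // 30 + 60 = 90: passes

        // Neither txn wrote a cell the other read or wrote, so conflict
        // detection sees no write-write conflict and both commit.
        return (aOk && bOk) ? newBonus + newComp : -1;  // 120 > LIMIT: skewed
    }
}
```

Row-level conflict detection would save us here only if bonus and comp live in the same row, turning the two commits into a detectable write-write conflict.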
{quote}
3. How would row deletion be handled?
{quote}
We could add a special row delete marker to Tephra, stored in a special column,
then transparently add that column to each read operation so we can check for a
row delete and handle the cell processing accordingly. Theoretically this
could be done in either Tephra or Phoenix, but I think it would make more sense
in Tephra as that is what is imposing the delete handling requirements. Yes,
this could be a performance hit if the column family storing the row delete
marker column would not otherwise have to be consulted on a read. We could do
the same as HBase and translate the row delete to a separate column family
delete marker per family, but we would still lose the optimization of the HBase
delete marker bloom filters.
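A sketch of the read-side check (the marker column name is hypothetical, and this is a toy stand-in for what the Tephra filter would do, not its API): every Get/Scan would additionally fetch the marker column, and the row is dropped when a delete marker is visible to the reader's snapshot.

```java
import java.util.Map;

// Toy model of the proposed scheme. A transactional row delete writes a
// marker cell into a reserved column (name below is hypothetical); reads
// fetch that column too and hide rows deleted as of the reader's snapshot.
class RowDeleteDemo {
    static final String DELETE_MARKER = "f:__tephra_delete__"; // hypothetical

    // cells: column -> commit txID of its newest version for this row
    static boolean rowVisible(Map<String, Long> cells, long snapshotId) {
        Long marker = cells.get(DELETE_MARKER);
        // Hide the row if a delete marker was committed at or before our
        // snapshot; a marker from a later transaction is not yet visible.
        return marker == null || marker > snapshotId;
    }
}
```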
Ultimately, I think it's best to ensure that HBase has the feature we need so
that we don't have to implement our own delete logic. I think this "only"
requires two issues:
* HBASE-11292 - add an "undelete" operation
* HBASE-13094 - filter that evaluates before delete markers
Though each of those may be a little hairy. Once those are in place, the
Tephra handling becomes unnecessary, but it still seems like we may need the
Tephra change as a stopgap measure. See
https://issues.cask.co/browse/TEPHRA-70
{quote}
4. Timestamp handling
a) Does Tephra manage setting the Cell timestamp transparently when using
TransactionAwareHTable and the coprocessor?
{quote}
Yes, TransactionAwareHTable sets the transaction ID as the timestamp for all
writes.
{quote}
b) Would the Phoenix coprocessors coexist well with the Tephra coprocessors?
It'd likely be more efficient to insert the Tephra filter after our skip scan
filter (which would prevent the Tephra filter from being run on a large
percentage of rows). Our skip scan filter handles point lookups. *Would it be
possible for Phoenix to order the Filters itself?*
{quote}
Our coprocessor constructs a FilterList (with MUST_PASS_ALL) combining
TransactionVisibilityFilter with any other previously set filter. I think
TransactionVisibilityFilter still needs to come first so that no data from
invalid transactions is visible and that delete handling works correctly. I'm
not sure the filter APIs really allow us to enforce that though. But assuming
that is the case, looking at FilterList, and assuming that the Phoenix filter
uses SEEK_NEXT_USING_HINT as the filterKeyValue() return code, it looks to me
like we would still seek correctly from your skip scan filter. So I don't
think we would lose any efficiency here.
We do need to work through the ordering for coprocessors and filters in more
detail to confirm that everything will work correctly, though.
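As a sanity check on that FilterList reasoning, here is a toy model of MUST_PASS_ALL evaluation (not the real HBase Filter API; the return codes merely mimic Filter.ReturnCode, and real FilterList semantics are more nuanced). With the visibility filter first, cells from invalid transactions are skipped before the skip scan filter ever runs, while for visible cells the skip scan filter's SEEK_NEXT_USING_HINT still propagates, so seeking is preserved:

```java
import java.util.List;
import java.util.function.Function;

// Toy MUST_PASS_ALL: evaluate filters in order; first non-INCLUDE verdict wins.
class FilterOrderDemo {
    enum ReturnCode { INCLUDE, SKIP, SEEK_NEXT_USING_HINT }

    static ReturnCode filterCell(List<Function<Long, ReturnCode>> filters,
                                 long cellTxId) {
        for (Function<Long, ReturnCode> f : filters) {
            ReturnCode rc = f.apply(cellTxId);
            if (rc != ReturnCode.INCLUDE) return rc;
        }
        return ReturnCode.INCLUDE;
    }

    static ReturnCode demo(long cellTxId, boolean txValid, boolean keyWanted) {
        // Stand-in for TransactionVisibilityFilter: runs first, hides cells
        // from invalid or in-progress transactions.
        Function<Long, ReturnCode> visibility =
            tx -> txValid ? ReturnCode.INCLUDE : ReturnCode.SKIP;
        // Stand-in for the Phoenix skip scan filter: seeks past unwanted keys.
        Function<Long, ReturnCode> skipScan =
            tx -> keyWanted ? ReturnCode.INCLUDE
                            : ReturnCode.SEEK_NEXT_USING_HINT;
        return filterCell(List.of(visibility, skipScan), cellTxId);
    }
}
```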
{quote}
c) The Phoenix client "locks" the timestamp by asking the RS hosting the
SYSTEM.CATALOG table for the "current server time" when it checks if the schema
for the table being queried is up-to-date.
{quote}
I'd like to better understand how this is being used in Phoenix. Could the
query to the SYSTEM.CATALOG table just be done in the same transaction so it's
using the same snapshot for the schema state? Or is the locking here actually
a mutex on other clients?
{quote}
5. Reading own writes
Currently Phoenix cannot read its own writes. With Tephra in place we could
overcome this by:
{quote}
Tephra will naturally support this as long as the data from the current
transaction has been written to HBase (the time range for reads is set to end
at txID + 1, with writes from in-progress or invalid transactions filtered
out). There
was a recent bug where this was handled in CDAP but not in Tephra on its own,
but that has been fixed.
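A toy version of the visibility rule just described (an invented method, not Tephra's code): a reader in transaction txID sees writes with timestamps below txID + 1, filters out in-progress and invalid transactions, and always sees its own writes even though its own ID is still in the in-progress set.

```java
import java.util.Set;

// Toy read-own-writes visibility check, mirroring the description above.
class ReadOwnWritesDemo {
    static boolean visible(long writeTxId, long readerTxId,
                           Set<Long> inProgress, Set<Long> invalid) {
        if (writeTxId >= readerTxId + 1) return false;  // outside time range
        if (writeTxId == readerTxId) return true;       // our own write
        return !inProgress.contains(writeTxId) && !invalid.contains(writeTxId);
    }
}
```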
Thanks for all the great questions so far. We've got some good discussion
points already!
> Snapshot isolation transaction support through Tephra
> -----------------------------------------------------
>
> Key: PHOENIX-1674
> URL: https://issues.apache.org/jira/browse/PHOENIX-1674
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
>
> Tephra (http://tephra.io/ and https://github.com/caskdata/tephra) is one
> option for getting transaction support in Phoenix. Let's use this JIRA to
> discuss the way in which this could be integrated along with the pros and
> cons.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)