[
https://issues.apache.org/jira/browse/PHOENIX-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348277#comment-14348277
]
James Taylor commented on PHOENIX-1674:
---------------------------------------
Here are some random questions and comments about Tephra and how we might
integrate it into Phoenix, [~ghelmling]:
1. What are the options for the scope of the snapshot isolation?
a) single snapshot isolation across all statements executed in a txn
- transaction ID retrieved on first statement execution on connection.
- commits after txn ID allocated by other txns would not be seen until after
commit/rollback
b) nested: snapshot per statement with conflict resolution spanning multiple
txn IDs.
- could each stmt be executed with its own txn ID? You'd see committed data
when a stmt is run after another client had committed the data.
- conflict resolver could be told all row keys updated across all statements
(across multiple txn IDs).
- **Is this type of snapshot isolation possible in Tephra?**
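To make the difference between options (a) and (b) concrete, here is a toy Java sketch. It is not Tephra or Phoenix code: the versioned map, the logical clock, and every name in it (SnapshotScopeSketch, commit, snapshot, read) are made up for illustration, and it only models which committed versions a read would see.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a versioned store; not a Tephra or Phoenix API.
final class SnapshotScopeSketch {
    // row key -> (commit timestamp -> value)
    private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();
    private long clock = 0;

    // Commit a value under a new timestamp and return that timestamp.
    long commit(String key, String value) {
        long ts = ++clock;
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
        return ts;
    }

    // Take a snapshot: everything committed so far is visible to it.
    long snapshot() {
        return clock;
    }

    // Return the newest version committed at or before the snapshot.
    String read(String key, long snapshotTs) {
        TreeMap<Long, String> versions = rows.get(key);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> entry = versions.floorEntry(snapshotTs);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        SnapshotScopeSketch store = new SnapshotScopeSketch();
        store.commit("row1", "v1");

        long txnSnapshot = store.snapshot(); // taken on first statement
        store.commit("row1", "v2");          // another client commits

        // Option (a): every statement reuses the txn's snapshot.
        System.out.println(store.read("row1", txnSnapshot));      // prints v1
        // Option (b): each statement takes a fresh snapshot.
        System.out.println(store.read("row1", store.snapshot())); // prints v2
    }
}
```

Under option (b) the conflict check would still have to cover the row keys written by all statements, which this sketch does not attempt.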
2. What type of SQL isolation level is supported by Tephra?
a) write/write conflicts are clearly detected
b) what about a read of a piece of data that is then subsequently changed by
another txn before ours is committed?
c) **How would you define Tephra's behavior in terms of ANSI standard SQL
isolation levels?**
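The distinction between (a) and (b) can be illustrated with a generic snapshot isolation model. The sketch below is an assumption about how write-set-only conflict detection typically behaves, not a description of Tephra's implementation; ConflictSketch, Txn, begin, and commit are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of write-set conflict detection; not Tephra code.
final class ConflictSketch {
    // row key -> timestamp of the last committed write to that key
    private final Map<String, Long> lastCommit = new HashMap<>();
    private long clock = 0;

    final class Txn {
        final long startTs;
        final Set<String> readSet = new HashSet<>();  // recorded but never checked
        final Set<String> writeSet = new HashSet<>();

        Txn(long startTs) { this.startTs = startTs; }
        void read(String key) { readSet.add(key); }
        void write(String key) { writeSet.add(key); }
    }

    Txn begin() {
        return new Txn(clock);
    }

    // Fail if any key in the write set was committed after the txn started.
    boolean commit(Txn txn) {
        for (String key : txn.writeSet) {
            Long ts = lastCommit.get(key);
            if (ts != null && ts > txn.startTs) {
                return false;
            }
        }
        long commitTs = ++clock;
        for (String key : txn.writeSet) {
            lastCommit.put(key, commitTs);
        }
        return true;
    }

    public static void main(String[] args) {
        // Case (a): two txns write the same row; the second commit fails.
        ConflictSketch ww = new ConflictSketch();
        Txn t1 = ww.begin();
        Txn t2 = ww.begin();
        t1.write("row1");
        t2.write("row1");
        System.out.println(ww.commit(t1)); // prints true
        System.out.println(ww.commit(t2)); // prints false

        // Case (b): t3 reads row1, t4 changes row1 and commits first.
        ConflictSketch rw = new ConflictSketch();
        Txn t3 = rw.begin();
        Txn t4 = rw.begin();
        t3.read("row1");
        t3.write("row2");
        t4.write("row1");
        System.out.println(rw.commit(t4)); // prints true
        System.out.println(rw.commit(t3)); // prints true: the stale read is not detected
    }
}
```

In this model case (b) commits cleanly, which is why the behavior would fall short of SERIALIZABLE unless reads are also tracked.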
3. How would row deletion be handled?
a) Support undelete of family delete marker at HBase level.
- nothing necessary for Phoenix
b) Phoenix defines its own delete marker. This would be a known cf:cq and only
be set in this single KV for a row (Phoenix already has an "empty" key value
defined for every row to allow for rows with only row key columns). This option
is less ideal because it limits the ability of non-Phoenix clients to scan
over the data and it would confuse other HBase features such as
minVersion/maxVersion.
- Tephra would need to support a different mechanism to determine if a row
is deleted (as an empty value for Phoenix means either a null value or may just
be our empty key value for schemas in which all columns are contained in the
row key). Either a pluggable approach or perhaps a Cell tag (but note this
would be only on a single known cf:cq in a row). **Is this feasible?**
- Phoenix would need to change the Scan projection logic to always project
the empty key value cf:cq so we'd know if a row was deleted or not (which would
be a perf hit when this cf is not required for the scan).
- Phoenix has existing logic that would need to be tweaked when the empty
key value needs to be moved (if all columns in the cf are dropped). Somewhat of
a corner case.
4. Timestamp handling
a) Does Tephra manage setting the Cell timestamp transparently when using
TransactionAwareHTable and the coprocessor?
b) Would the Phoenix coprocessors coexist well with the Tephra coprocessors?
It'd likely be more efficient to insert the Tephra filter *after* our skip scan
filter (which would prevent the Tephra filter from being run on a large
percentage of rows). Our skip scan filter handles point lookups. **Would it be
possible for Phoenix to order the Filters itself?**
c) The Phoenix client "locks" the timestamp by asking the RS hosting the
SYSTEM.CATALOG table for the "current server time" when it checks if the schema
for the table being queried is up-to-date.
- Instead, we could use the txn ID that we have for the current open txn.
We'd still need the original RPC we do to verify that we have the latest
metadata (which is somewhat unfortunate).
- UPSERT VALUES and DELETE would do the same instead of using
LATEST_TIMESTAMP
d) Another alternative might be to set the CURRENT_SCN on the connection to
match the txn ID, as that would essentially force the timestamps of mutations
to be set to the txn ID.
- An issue may arise in reading own writes (see below), as with a CURRENT_SCN,
Phoenix would not see its own writes.
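The point about filter ordering in this item can be shown with a toy filter chain. This sketch uses plain java.util.function.Predicate rather than HBase filters; the two predicates are stand-ins (a cheap row match playing the role of the skip scan filter, and a counter playing the role of a costlier transaction visibility check), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical toy model of filter ordering; not HBase, Phoenix, or Tephra code.
final class FilterOrderSketch {
    // Returns how many times the costly filter ran over the given rows.
    static int countTxnFilterCalls(List<Integer> rows, boolean skipScanFirst) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for the skip scan filter: keeps one row in ten.
        Predicate<Integer> skipScan = row -> row % 10 == 0;
        // Stand-in for the transaction filter: counts each invocation.
        Predicate<Integer> txnFilter = row -> {
            calls.incrementAndGet();
            return true;
        };
        // Predicate.and short-circuits, so the second filter only sees survivors.
        Predicate<Integer> chain = skipScanFirst
                ? skipScan.and(txnFilter)
                : txnFilter.and(skipScan);
        for (Integer row : rows) {
            chain.test(row);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rows.add(i);
        }
        System.out.println(countTxnFilterCalls(rows, true));  // prints 10
        System.out.println(countTxnFilterCalls(rows, false)); // prints 100
    }
}
```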
5. Reading own writes
Currently Phoenix cannot read its own writes. With Tephra in place we could
overcome this by:
- submitting any pending txn data
- passing our txnID and making sure Tephra doesn't filter based on this (just
remove it from the list for the filter?)
- reading at txnID+1.
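The last two steps can be sketched as a visibility check. This is a toy model under stated assumptions, not Tephra's filter: it assumes cell timestamps equal the writing txn ID, that the read bound is exclusive (so reading at txnID+1 includes cells written at txnID), and that in-progress txns are hidden through an exclude list. OwnWritesSketch, isVisible, and excludeList are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of read-your-own-writes visibility; not Tephra code.
final class OwnWritesSketch {
    // A cell is visible if it is below the exclusive bound and its writer is not excluded.
    static boolean isVisible(long cellTxnId, long maxTsExclusive, Set<Long> excluded) {
        return cellTxnId < maxTsExclusive && !excluded.contains(cellTxnId);
    }

    // Build the exclude list from in-progress txns, minus our own txn ID.
    static Set<Long> excludeList(Set<Long> inProgress, long ownTxnId) {
        Set<Long> excluded = new HashSet<>(inProgress);
        excluded.remove(ownTxnId);
        return excluded;
    }

    public static void main(String[] args) {
        long ownTxnId = 42L;
        Set<Long> inProgress = new HashSet<>(Arrays.asList(40L, 42L));
        Set<Long> excluded = excludeList(inProgress, ownTxnId);
        long readBound = ownTxnId + 1; // "reading at txnID+1"

        System.out.println(isVisible(42L, readBound, excluded)); // prints true: our own write
        System.out.println(isVisible(40L, readBound, excluded)); // prints false: other in-progress txn
        System.out.println(isVisible(39L, readBound, excluded)); // prints true: committed earlier
        System.out.println(isVisible(43L, readBound, excluded)); // prints false: started after us
    }
}
```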
> Snapshot isolation transaction support through Tephra
> -----------------------------------------------------
>
> Key: PHOENIX-1674
> URL: https://issues.apache.org/jira/browse/PHOENIX-1674
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
>
> Tephra (http://tephra.io/ and https://github.com/caskdata/tephra) is one
> option for getting transaction support in Phoenix. Let's use this JIRA to
> discuss the way in which this could be integrated along with the pros and
> cons.